WormBase-Caltech Weekly Calls November 2012

From WormBaseWiki

November 1, 2012

SPELL data

  • Including new RNA-seq datasets
  • Comparing different RNA-seq datasets


Dedicated ~2 weeks for issues

  • Dedicated time during release cycle to fix bugs and address GitHub issues


Intermine

  • JD Wong started at WormBase this week; will focus on building WormBase instance of Intermine
  • JD previously developed the FlyBase Intermine instance
  • Once in place, we can build pre-canned queries that we can embed in WormBase pages (maybe in response to help desk inquiries)
    • E.g. Display meta information for a gene; "How many genes show this phenotype?"
  • We have resources provisioned for building and hosting
  • Low hanging fruit: access to ontologies, gene models, phenotypes, homology, orthology (things already made for Intermine)
  • Focus first on central dogma stuff


SAB Meeting

  • January 28th, 29th, 2013
  • At Caltech


Concise description template

  • Curator version versus a community (simpler) version
  • 1) Transfer info from elegans to other species
  • 2) Updates
  • Easier to script when there is little information; harder when lots of info


Pathways

  • Karen spoke to Alex Pico
  • There could be a tag for "WormBase-approved" pathways
  • Would be good to have worm community also contribute; "Community Pathways" widget
  • Conflict resolution? There could be many contributions to the same pathway
  • Is there a way to pull up conflicting versions of a pathway?
  • Open system: any registered user can modify any pathway


Community Annotation

  • Possibilities?
  • Concise descriptions, pathways, what else?
  • Expression patterns?
  • Tables/templates of suggested data/types that authors could/should submit for a paper
  • Annotation Apps? iPhone, Android, etc.
    • Controlled vocabulary, virtual worm, AMIGO ontology browser, etc.
  • Annotation on tablets, other mobile devices? Read only?
  • Web app versus native app
  • Open Badges for micro-attribution, incentives
  • ORCID for unique person IDs
  • Possible Ontology Annotator version for community?
  • Templates/forms would need to be very straightforward and simple to use
  • Maybe display a form alongside web display to inform users of how the two relate
  • Training users: tutorials, workshops (2013 IWM?)
  • Daily updates to website could be more satisfying for users


November 8, 2012

Construction

  • Burning smell frequently
  • Now cigarette smoke smell (maybe completely unrelated to construction, of course)


Curation Status Form

  • http://mangolassi.caltech.edu/~postgres/cgi-bin/curation_status.cgi
  • Curators cannot indicate both positive and negative for a data type per paper; conflict must be resolved by curators
  • One free text comment field (and one "pre-canned" comment field) per paper per data type
  • To what extent would a curator like to see all data for several papers at once? Likely only need to see one paper and one data type at a time
  • The form provides a (unique) way for curators to "validate" a paper as "negative" for a data type
  • It is important to decouple validation from curation, so we can capture both
  • Ideal if each paper could be independently categorized according to flagging status, validation status, and curation status
  • Papers can be identified as "Not Validated", "Validated Negative", "Validated Positive - Not Curated", or "Validated Positive - Curated"
  • We will make adjustments to the form to accommodate this
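
A minimal sketch of how the four paper categories could be derived from separate validation and curation flags. The names `Status` and `paper_status` are hypothetical and not taken from the actual form; the sketch also assumes, for simplicity, that the curation flag is only consulted once a paper has been validated positive.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical names for the four categories discussed above.
    NOT_VALIDATED = 1
    VALIDATED_NEGATIVE = 2
    POSITIVE_NOT_CURATED = 3
    POSITIVE_CURATED = 4


def paper_status(validated: Optional[bool], curated: bool = False) -> Status:
    """Categorize one paper for one data type.

    validated: None if no curator has validated the paper yet,
               True for a positive validation, False for a negative one.
    curated:   True if data of this type has been curated from the paper.
    """
    if validated is None:
        return Status.NOT_VALIDATED
    if not validated:
        return Status.VALIDATED_NEGATIVE
    # Validation and curation are stored separately, so both are captured.
    return Status.POSITIVE_CURATED if curated else Status.POSITIVE_NOT_CURATED
```

Keeping the two inputs separate mirrors the point about decoupling validation from curation; how the form should treat a paper that is curated but not yet validated is left open here.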


Dead Genes

  • How do we/should we handle objects (e.g. interactions) that now refer to dead genes?
  • Would be best if we could keep the original gene reference and have a downstream automated process (at dump, ACE build, or web display?) convert it to the new ID (if there are merges) or add a remark indicating a dead gene
  • Will discuss more next week
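
A rough sketch of what such a downstream step might look like, assuming two lookups are available: a set of dead gene IDs and a mapping from each merged gene to its successor. The function name `resolve_gene`, both lookups, and the example IDs are hypothetical placeholders; where the step would run (dump, ACE build, or display) is not decided here.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_gene(
    gene_id: str,
    dead_genes: Set[str],
    merged_into: Dict[str, str],
) -> Tuple[str, Optional[str]]:
    """Return (gene ID to emit, optional remark) for a stored gene reference.

    The stored reference itself is never modified; only the emitted value changes.
    """
    if gene_id not in dead_genes:
        return gene_id, None  # live gene: emit unchanged

    # Follow the chain of merges until a live gene or a dead end is reached.
    current = gene_id
    seen = set()
    while current in dead_genes and current in merged_into and current not in seen:
        seen.add(current)
        current = merged_into[current]

    if current not in dead_genes:
        return current, f"Merged from {gene_id}"
    # Dead with no live successor: keep the original ID and add a remark.
    return gene_id, "Dead gene"


# Placeholder IDs for illustration only.
dead = {"WBGene00000002", "WBGene00000003"}
merges = {"WBGene00000002": "WBGene00000001"}
for gene in ("WBGene00000001", "WBGene00000002", "WBGene00000003"):
    print(gene, "->", resolve_gene(gene, dead, merges))
```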


November 15, 2012

Scripts from Mary Ann

  • Script for recognizing alleles for Textpresso
  • Mary Ann sent to James and Raymond
  • James has been working on it


AQL/WQL Queries

  • HTML vs plain text output
  • HTML output takes much longer than plain text and often times out
  • Not optimal to give users AQL queries; will be better once Intermine is up and running
  • User data requests are use-cases for Intermine
  • Problem with web browsers timing out before AQL query can finish: this is a browser issue


Curation Statistics Tool

  • Do curators want to see unions or intersections of specific flagging methods? Yes
  • Will need to develop a visualization aside from the summary table
  • Chris and Juancarlos will discuss offline
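
To illustrate the union/intersection question, here is a small sketch using Python sets. The method names and paper IDs are made-up examples, and `combine` is a hypothetical helper, not part of the existing tool.

```python
from typing import Dict, Iterable, Set

# Papers flagged for one data type, keyed by flagging method (illustrative data).
flagged: Dict[str, Set[str]] = {
    "SVM": {"paper1", "paper2", "paper3"},
    "Textpresso": {"paper2", "paper3", "paper4"},
    "Author first pass": {"paper3", "paper5"},
}


def combine(flags: Dict[str, Set[str]], methods: Iterable[str], mode: str = "union") -> Set[str]:
    """Return papers flagged by any (union) or all (intersection) of the chosen methods."""
    sets = [flags[m] for m in methods]
    if not sets:
        return set()
    if mode == "union":
        return set().union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode}")


print(sorted(combine(flagged, ["SVM", "Textpresso"], "intersection")))
print(sorted(combine(flagged, flagged.keys(), "union")))
```

A summary table can report the size of each result, while a separate visualization could show how the methods overlap.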


Human Disease tags for ?Gene model

  • Using common vocabulary
  • Human_disease_info tag with subset tags: OMIM_ortholog, Potential_model_for, Experimental_model_for, and Human_disease_relevance
?Gene
        Human_disease_info     OMIM_ortholog             ?Accession_number      #Evidence
                               Potential_model_for       ?DO_term      XREF  Gene    #Evidence
                               Experimental_model_for    ?DO_term      XREF  Gene    #Evidence
                               Human_disease_relevance   ?Text       #Evidence
  • Change "?Accession_number" above to full database reference "?Database, ?Database_field, ?Accession_number"
  • And add tags to #Evidence hash?
?Evidence
            Inferred_from_sequence_orthology     ?Accession_number
            Inferred_from_mutant_phenotype       ?Variation
            Inferred_from_genetic_interaction    ?Accession_number
  • Evidence_code tag instead of using #Evidence hash?


Automated, template-based draft concise descriptions

  • Kimberly and Ranjana had been working on a form before grant writing
  • Idea is to use prompts to guide curators/community to write concise description sentences
  • Have tested on some complex examples
  • Once form is developed will send out to community
  • If easy enough to use, we could make a student assignment for credit? - Gary S


November 30, 2012

Curation Status Form

  • Should we stick with "concatenated" SVM or "document" SVM?
    • General trend is that "concatenated" SVM performs better
    • We will go with "concatenated" SVM
  • Once we clean up and document the code, we can move to Tazendra
  • Curators can start tracking curation and flagging in earnest once we've switched to Tazendra
  • Textpresso pipelines for curation and flagging?
  • We can integrate other pipelines later
  • Need to think about how to incorporate Mary Ann's curation pipeline


Anatomy Data Types in Author/Curation First Pass Forms

  • Adapt to handle anatomy data types that Raymond actually curates
  • Best to get rid of (or hide?) original 4 anatomy data types and replace with (5?) new ones
  • No need to add the new types to the Curation Status Form


Next data upload will be December 20th


Model Changes for Parasites

  • Mary Ann speaking with parasite nematode labs
  • Need to capture drug resistance/susceptibility in Strains or Phenotypes
  • How to change data models to accommodate
  • Put everything in the Remark field (of #Phenotype_info of Phenotype wrt Strain)?
  • What are the use cases for this data? Are we to act as basic/core repository for this info?
  • What info should be indexed/modeled in tags? What would be queried, filtered, sorted?
  • Phenotypes may/will need to account for host-parasite interactions; capture experimental model host? Natural/isolation host?
  • One-time-use strains, or propagate-able?


AMIGO 2 Ontology Browser

  • Raymond has AMIGO 2 running on our machines
  • Not currently running off of our (worm) database/ontologies
  • Currently troubleshooting installation/database problems
  • AMIGO 2 reads OWL files; need to convert OBO to OWL? Not necessary
  • Is the problem with the structure of the ontology(ies)? Test with phenotype ontology?
  • Apache SOLR used for AMIGO; need to understand SOLR; then, may not need AMIGO