Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
[[WormBase-Caltech_Weekly_Calls_March_2012|March]]
 
  
 
[[WormBase-Caltech_Weekly_Calls_April_2012|April]]
 
 
 
 
 
 
== April 12, 2012 ==
 
 
 
RNAi OA
 
*OA almost ready to go live
 
*Testing now with test curation
 
*Should go live next week for official curation
 
 
 
 
 
New Website
 
*Most problems are being fixed in a timely manner
 
*Curators can now edit links and add custom widgets
 
*Issues (tracked on GitHub) being dealt with quickly
 
 
 
 
 
BioCurator Meeting
 
*Good meeting, bigger than before
 
*Common themes: data standards, and how to educate users about database materials and how to use them (and think critically)
 
*How can MODs work better with journals and PubMed to solve the 'triage' problem?
 
**Streamlining the paper acquisition/curation process
 
**MODs should ask NLM to take the burden of retrieving PDFs
 
**Get lawyers involved to make PDFs available?
 
**Publishers tend to be lax on text mining rules, maybe will evolve into an easier process
 
*Maybe write a grant for research project as a proof-of-principle that triage can be done in an effective/efficient manner
 
*May ask ISB (Int Society Biocurators) for help with this
 
*Sequence and protein curation: tools, databases (topic-specific; pathways, cancer, etc.)
 
*GeneWiki for human gene annotation
 
**One page for each gene; already have ~10,000 articles
 
**~Dozen editors, credibility of authors checked (?)
 
**Reasonably satisfied with coverage of human disease genes
 
*Whole-genome sequencing of individuals
 
**Newly identified genetic disorder
 
**VAST instead of BLAST
 
*Tool to identify primers from papers and map them to the genome automatically
 
*InterMine discussed
 
**Comparable to WormMart
 
**Object-oriented database
 
**Performs similarly to WormBase
 
**Many pre-canned queries
 
**Advanced search Query-builder available
 
**MODs switched over to InterMine from BioMart
 
**WormMart - Will Spooner tried to provide queries that are more natural
 
**We can work to build an interface on top of InterMine, etc.
 
**Todd has made progress with getting InterMine for WormBase
 
*Lots of specialized talks, which reduced productivity (compared to the BioCreative meeting)
 
*Curators explaining their curation pipeline
 
*Textpresso still popular ;)
 
**Six out of seven MODs using Textpresso
 
**Discussed text mining in particular applications (e.g. CCC)
 
**Textpresso only tool using full-text for mining
 
**Pete from FlyBase: SVM results are deteriorating (similar to WormBase)
 
***Start training from scratch; hopefully get better recall/precision numbers
 
*Natural language processing on figure legends/captions
 
**Tries to find text in the body that relates to figure
 
**Possible collaboration with Textpresso
 
*NLP research group in Germany
 
**'Actor', 'agent', etc. and relationships (RDF triples)
 
*Doug Howe (ZFIN): zebrafish corpus is small enough that it doesn't need Textpresso
 
*Julio Collado-Vides: Textpresso for E. coli fell apart, but he is trying to get it going again
 
 
 
 
 
Paul will meet someone from Elsevier
 
*Image curation/ rights issues
 
 
 
 
 
Genetic Interaction ontology
 
*SGD on board with ontology so far; performing trial curation
 
*FlyBase interested in using as well; will meet with Chris and Rose in May to discuss
 
 
 
 
 
 
 
== April 19, 2012 ==
 
 
 
Interaction object displays on WormBase website
 
*Chris and Maher will sort out on GitHub
 
*Chris will map data from old tags to new tags and suggest display changes for new data types where necessary
 
*One issue to deal with is complex objects that carry multiple Interaction_types but were intended to be separate objects
 
 
 
 
 
Interaction model and intragenic suppression
 
*We need to make some modifications to the new Interaction model if we want to accommodate intragenic suppression (or other intragenic) events
 
*Proposed change is to:
 
#Make each allele a separate object
 
#Move the Variation (and Transgene) tag out of the Interactor_info hash and into the main Interaction model under the Interactor tag
 
#Add a Cis_intragenic_suppression and a Trans_intragenic_suppression tag under the Interaction_type tag (perhaps also intragenic_enhancement?)
 
*With these changes:
 
**Each variation (and transgene) can be listed as an interactor with Interactor_info indicating Affected, Effector, or Non_directional
 
**Genes associated with intragenic, interacting variations will display (in Cytoscape view) as interacting with themselves via a Genetic Interaction
 
**Mary Ann can then indicate/curate the flanking sequences for each allele
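
The proposed change can be sketched in Python as follows. This is a minimal illustration, not the ACEDB model itself: the class names, the `gene_edges` helper, and the example allele and gene names are all hypothetical. It shows how listing each variation as its own interactor (with a role of Affected, Effector, or Non_directional) makes an intragenic interaction collapse to a gene interacting with itself at the gene level.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Interactor:
    variation: str  # allele name (hypothetical example values below)
    gene: str       # gene that the allele belongs to
    role: str       # "Affected", "Effector", or "Non_directional"


@dataclass
class Interaction:
    interaction_id: str
    interaction_type: str
    interactors: List[Interactor]


def gene_edges(interaction: Interaction) -> Set[Tuple[str, str]]:
    """Collapse variation-level interactors to (effector gene, affected gene) edges."""
    effectors = [i for i in interaction.interactors if i.role == "Effector"]
    affected = [i for i in interaction.interactors if i.role == "Affected"]
    return {(e.gene, a.gene) for e in effectors for a in affected}


# Two alleles of the same (made-up) gene, one suppressing the other.
example = Interaction(
    interaction_id="example_interaction",
    interaction_type="Cis_intragenic_suppression",
    interactors=[
        Interactor("allele_1", "gene-1", "Affected"),
        Interactor("allele_2", "gene-1", "Effector"),
    ],
)

edges = gene_edges(example)
```

Here `edges` contains a single self-edge for `gene-1`, which is the situation a network view would draw as a gene interacting with itself.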
 
 
 
 
 
Life_stage objects still dump as names, not IDs
 
*This is because ACEDB only handles names, not IDs
 
*Daniela is in charge of this class; we can discuss with her when she's back
 
*We likely want to change to a system where we use only IDs in .ACE objects
 
 
 
 
 
URL Constructors for GSA markup
 
*Todd has taken care of much of the issue of URL construction for GSA marked-up papers
 
*Karen will send Todd examples of Anatomy_term/Anatomy_name links that need to be checked
 
*GSA papers will need to be rechecked to ensure that all links are working
 
 
 
 
 
Network outages
 
*Various office network ports are non-functional as of yesterday
 
*IMSS/Network admins aware of issue and working on it
 
 
 
 
 
Interaction and Gene_regulation objects for next upload
 
*Conversion scripts will need to be run again to convert objects to new model format
 
*Chris will look into whether or not the mapping files (needed to update Gene_regulation objects) will need to be updated for the newest data
 
*Xiaodong will dump Gene_regulation objects out of the OA using the old dumping script
 
 
 
 
 
 
 
== April 26, 2012 ==
 
 
 
 
 
Meeting with Elsevier rep
 
*Elsevier getting more open to text-mining
 
*People build apps and then put them on the ScienceDirect site (e.g. TAIR app)
 
*Wanted a couple sentences on what we want from text-mining
 
*GO consortium would like text-mining for triage of new papers
 
*'Climate is better now'
 
 
 
 
 
Yeast-two-hybrid data issues
 
*Lots of redundancies, bogus objects, many objects per bait/target (Sequences, CDSs, genes, etc.)
 
*Provenance of data isn't clear
 
*Should mv PCR products be mapped each build to genes?
 
*May want to start from scratch and collect YH data from Vidal and Walhout labs
 
*Check if BioGRID is curating this data already
 
 
 
 
 
Next WormBase grant due in 6 months
 
*30 pages
 
*Need to figure out what we want to do in next 5 years; how we want to organize
 
*Combine SAB meeting and grant writing?
 
*New page types (e.g. Process pages) have lagged behind because of the website update
 
*What new content can reasonably/realistically get online?
 
 
 
 
 
Curation wish-list on Wiki (Ranjana)
 
*Many papers on new topics coming out
 
*Drug-screening, drug interaction
 
*Infection, parasitism
 
 
 
 
 
Anatomy links from WormAtlas broken
 
*Links need to be fixed/cleaned up
 
*Going forward, may need some sort of DOI system (stable links indefinitely)
 
*An issue of GSA markup as well
 
*Published links will never change; will need to accommodate
 
 
 
 
 
Ontology searches
 
*Trying to adapt AmiGO to use our .OBO files
 
*National Center for Biomedical Ontology uses Protégé instead of OBO-Edit
 
*Consider adopting Protégé/OWL files? Conversion could be trivial
 
*Parent-child relationships file for C. elegans cell lineage; need to accommodate indeterminacy
 
*Use synonym assignment to handle different possible outcomes/identities?
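
As a rough illustration of what a parent-child relationships file could be derived from, the sketch below pulls `is_a` links out of `[Term]` stanzas in OBO-format text. It is a simplified reader written for this note, not AmiGO or OBO-Edit code; the function name `parse_is_a` and the `EX:` term IDs are made up, and it assumes the lineage is expressed with `is_a` (a real lineage ontology may use a different relationship). It does not address the indeterminacy question raised above.

```python
from collections import defaultdict


def parse_is_a(obo_text):
    """Return {child_id: [parent_id, ...]} from the [Term] stanzas of OBO text."""
    parents = defaultdict(list)
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id and line.startswith("is_a:"):
            # Drop the trailing "! name" comment that OBO allows after the ID.
            parent = line[len("is_a:"):].split("!")[0].strip()
            parents[current_id].append(parent)
    return dict(parents)


# Made-up two-term sample, for illustration only.
SAMPLE = """\
[Term]
id: EX:0000001
name: root cell

[Term]
id: EX:0000002
name: daughter cell
is_a: EX:0000001 ! root cell
"""

relations = parse_is_a(SAMPLE)
```

The resulting dictionary maps each child term to its list of parents, which could then be written out one pair per line.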
 
 
 
 
 
Elbrus has reached capacity limit
 
*Broke RNAi curation pipeline
 
*Useful bits of code on elbrus
 
**Data submission forms (RNAi data)
 
**Microarray query tool (broken/toss)
 
*Should put (working) code on GitHub repository
 
 
 
 
 
User datamining demands
 
*We need to accommodate users' requests for data
 
*Fix WormMart/incorporate InterMine
 
*Bring back Batch Gene query
 
*Custom query building (by curators) based on user requests?
 
*Look at help desk e-mails and determine what users want
 
*Pre-canned queries?
 
*AcePerl scripts could perform batch gene queries
 
  
  

Revision as of 17:21, 3 May 2012

2009 Meetings

2011 Meetings


2012 Meetings

January

February

March

April


May 3, 2012

Curator Timestamps

  • Determining what data was provided directly by curator vs. what was populated automatically (e.g. mapping scripts)
  • Older data provided by curators who are no longer here will be problematic
  • We should archive all data-processing scripts in GitHub
  • Scripts can be made to create a unique timestamp that identifies that script after the fact
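
One way a script could stamp the data it writes is sketched below. This is only an assumption about how such a stamp might look: the `script_stamp` function, the stamp layout (UTC time, script name, short hash of the script's source), and the script name `map_alleles` are all hypothetical and not an existing WormBase convention.

```python
import hashlib
from datetime import datetime, timezone


def script_stamp(script_name, script_source, when=None):
    """Build a stamp that records when data was written and by which script version."""
    when = when or datetime.now(timezone.utc)
    # A short hash of the source distinguishes different versions of the same script.
    digest = hashlib.sha1(script_source.encode("utf-8")).hexdigest()[:8]
    return f"{when.strftime('%Y-%m-%d_%H:%M:%S')}_{script_name}_{digest}"


# Fixed time so the example is reproducible; a real script would omit `when`.
fixed = datetime(2012, 5, 3, 17, 21, 0, tzinfo=timezone.utc)
stamp = script_stamp("map_alleles", "print('hello')", fixed)
```

Because the hash depends on the script's contents, archiving the scripts in GitHub would let a stamp be traced back to the exact version that produced the data.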


Interaction model change

  • Pulling Variation, Transgene, Antibody, and Expr_pattern tags out of the Interactor_info hash and into the main model
  • This was originally to be able to capture intragenic interactions
  • The problem is the inherent disconnect between an interactor entity (e.g. gene) and these objects
  • Making this change would force a post-curation mechanism that ties these entities together for intuitive data display
    • Such a linking mechanism may be error prone, faulty, and potentially a headache for the web team
  • Is there a better way to handle this type of data?
  • Chris will discuss with Todd to see how much of a problem this would pose to the web team