Difference between revisions of "WormBase-Caltech Weekly Calls"

Revision as of 17:18, 3 May 2012

2009 Meetings

2011 Meetings

2012 Meetings

January

February

March

April 12, 2012

RNAi OA

OA almost ready to go live
Testing now with test curation
Should go live next week for official curation

New Website

Most problems are being fixed in a timely manner
Curators can now edit links and add custom widgets
Issues (tracked on GitHub) being dealt with quickly

BioCurator Meeting

Good meeting, bigger than before
Common themes: data standards, and how to educate users about database materials and how to use them (and think critically about them)
How can MODs work better with journals and PubMed to solve the 'triage' problem?
- Streamlining the paper acquisition/curation process
- MODs should ask NLM to take the burden of retrieving PDFs
- Get lawyers involved to make PDFs available?
- Publishers tend to be lax on text mining rules, maybe will evolve into an easier process
Maybe write a grant for research project as a proof-of-principle that triage can be done in an effective/efficient manner
May ask ISB (International Society for Biocuration) for help with this
Sequence and protein curation: tools, databases (topic-specific; pathways, cancer, etc.)
GeneWiki for human gene annotation
- One page for each gene; already have ~10,000 articles
- ~Dozen editors, credibility of authors checked (?)
- Reasonably satisfied with coverage of human disease genes
Whole-genome sequencing of individuals
- Newly identified genetic disorder
- VAST instead of BLAST
Tool to identify primers from papers and map them to the genome automatically
Intermine discussed
- Comparable to WormMart
- Object-oriented database
- Performs similarly to WormBase
- Many pre-canned queries
- Advanced search Query-builder available
- MODs switched over to Intermine from BioMart
- WormMart - Will Spooner tried to provide queries that are more natural
- We can work to build an interface on top of Intermine, etc.
- Todd has made progress with getting Intermine for WormBase
Lots of specialized talks, which reduced productivity (compared to the BioCreative meeting)
Curators explaining their curation pipeline
Textpresso still popular ;)
- Six out of seven MODs using Textpresso
- Discussed text mining in particular applications (e.g. CCC)
- Textpresso only tool using full-text for mining
- Pete from FlyBase: SVM results are deteriorating (similar to WormBase)
  - Start training from scratch; hopefully get better recall/precision numbers
Natural language processing on figure legends/captions
- Tries to find text in the body that relates to figure
- Possible collaboration with Textpresso
NLP research group in Germany
- 'Actor', 'agent', etc. and relationships (RDF triples)
Doug Howe (ZFIN): the zebrafish corpus is small enough that it doesn't need Textpresso
Julio Collado-Vides: Textpresso for E. coli fell apart, but they are trying to get it back together

Paul will meet someone from Elsevier

Image curation/rights issues

Genetic Interaction ontology

SGD on board with ontology so far; performing trial curation
FlyBase interested in using as well; will meet with Chris and Rose in May to discuss

April 19, 2012

Interaction object displays on WormBase website

Chris and Maher will sort out on GitHub
Chris will map data from old tags to new tags and suggest display changes for new data types where necessary
One issue to deal with is complex objects that have multiple Interaction_types (and were intended to be separate objects)

Interaction model and intragenic suppression

We need to make some modifications to the new Interaction model if we want to accommodate intragenic suppression (or other intragenic) events
Proposed change is to:

Make each allele a separate object
Move the Variation (and Transgene) tag out of the Interactor_info hash and into the main Interaction model under the Interactor tag
Add a Cis_intragenic_suppression and a Trans_intragenic_suppression tag under the Interaction_type tag (perhaps also intragenic_enhancement?)

With these changes:
- Each variation (and transgene) can be listed as an interactor with Interactor_info indicating Affected, Effector, or Non_directional
- Genes associated with intragenic, interacting variations will display (in Cytoscape view) as interacting with themselves via a Genetic Interaction
- Mary Ann can then indicate/curate the flanking sequences for each allele
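
The proposed structure can be sketched roughly as follows. This is only an illustration in Python, not the ACeDB model itself: the class names, field names, and the allele/gene names are all hypothetical. It assumes each variation is listed as its own interactor with one of the three roles from the notes, and shows why two interacting variations in the same gene would display as a gene interacting with itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # The three Interactor_info values mentioned in the notes
    AFFECTED = "Affected"
    EFFECTOR = "Effector"
    NON_DIRECTIONAL = "Non_directional"


@dataclass(frozen=True)
class Interactor:
    # Hypothetical record for one interactor entry in an Interaction object
    kind: str       # e.g. "Gene", "Variation", "Transgene"
    name: str       # object name (the values used below are made up)
    role: Role
    gene: str = ""  # gene the variation belongs to (illustrative link only)


@dataclass
class Interaction:
    interaction_type: str
    interactors: list = field(default_factory=list)

    def gene_edges(self):
        """Return (gene, gene) pairs for display.

        An intragenic event yields a self-edge, because both
        variations map back to the same gene.
        """
        genes = [i.gene or i.name for i in self.interactors]
        return [(a, b) for idx, a in enumerate(genes) for b in genes[idx + 1:]]


# Made-up example: two alleles of the same (hypothetical) gene
example = Interaction(
    interaction_type="Cis_intragenic_suppression",
    interactors=[
        Interactor("Variation", "allele-1", Role.AFFECTED, gene="gene-A"),
        Interactor("Variation", "allele-2", Role.EFFECTOR, gene="gene-A"),
    ],
)
print(example.gene_edges())
```

The `gene` field here stands in for whatever mechanism would tie a variation back to its gene; how that link is actually made is an open question in these notes.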

Life_stage objects still dump as names, not IDs

This is because ACEDB only handles names, not IDs
Daniela is in charge of this class; we can discuss with her when she's back
We likely want to change to a system where we use only IDs in .ACE objects

URL Constructors for GSA markup

Todd has taken care of much of the issue of URL construction for GSA marked-up papers
Karen will send Todd examples of Anatomy_term/Anatomy_name links that need to be checked
GSA papers will need to be rechecked to ensure that all links are working

Network outages

Various office network ports are non-functional as of yesterday
IMSS/Network admins aware of issue and working on it

Interaction and Gene_regulation objects for next upload

Conversion scripts will need to be run again to convert objects to new model format
Chris will look into whether or not the mapping files (needed to update Gene_regulation objects) will need to be updated for the newest data
Xiaodong will dump Gene_regulation objects out of the OA using the old dumping script

April 26, 2012

Meeting with Elsevier rep

Elsevier getting more open to text-mining
People build apps and then put them on the ScienceDirect site (e.g. TAIR app)
Wanted a couple of sentences on what we want from text-mining
GO consortium would like text-mining for triage of new papers
'Climate is better now'

Yeast-two-hybrid data issues

Lots of redundancies, bogus objects, many objects per bait/target (Sequences, CDSs, genes, etc.)
Provenance of data isn't clear
Should mv PCR products be mapped each build to genes?
May want to start from scratch and collect YH data from Vidal and Walhout labs
Check if BioGrid is curating this data already

Next WormBase grant due in 6 months

30 pages
Need to figure out what we want to do in next 5 years; how we want to organize
Combine SAB meeting and grant writing?
New page types (e.g. Process pages) have lagged behind because of the website update
What new content can we reasonably/realistically get online?

Curation wish-list on Wiki (Ranjana)

Many papers on new topics coming out
Drug-screening, drug interaction
Infection, parasitism

Anatomy links from Worm Atlas broken

Links need to be fixed/cleaned up
Going forward, may need some sort of DOI system (stable links indefinitely)
An issue of GSA markup as well
Published links will never change; will need to accommodate

Ontology searches

Trying to adapt AmiGO to use our .OBO files
National Center for Biomedical Ontology uses Protégé instead of OBO-Edit
Consider adopting Protégé/OWL files? Conversion could be trivial
Parent-child relationships file for C. elegans cell lineage; need to accommodate indeterminacy
Use synonym assignment to handle different possible outcomes/identities?

Elbrus has reached capacity limit

Broke RNAi curation pipeline
Useful bits of code on elbrus
- Data submission forms (RNAi data)
- Microarray query tool (broken/toss)
Should put (working) code on GitHub repository

User datamining demands

We need to accommodate users' requests for data
Fix WormMart/incorporate Intermine
Bring back Batch Gene query
Custom query building (by curators) based on user requests?
Look at help desk e-mails and determine what users want
Pre-canned queries?
AcePerl scripts could perform batch gene queries
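
As a rough idea of what a batch gene query does, here is a minimal sketch in Python (rather than AcePerl). It assumes the gene data is already available as a lookup table; the function name, the table layout, and the records are all hypothetical placeholders, and a real implementation would query the database instead.

```python
def batch_gene_query(names, gene_table):
    """Look up many gene names at once; report hits and misses separately.

    `gene_table` is assumed to map lower-cased gene names to records.
    """
    found, missing = {}, []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in pasted input
        record = gene_table.get(name.lower())
        if record is None:
            missing.append(name)
        else:
            found[name] = record
    return found, missing


# Placeholder table; the IDs and descriptions are invented for illustration
GENE_TABLE = {
    "gene-a": {"id": "EXAMPLE_ID_1", "description": "placeholder description"},
    "gene-b": {"id": "EXAMPLE_ID_2", "description": "placeholder description"},
}

found, missing = batch_gene_query(["gene-A", " gene-b ", "", "gene-z"], GENE_TABLE)
print(sorted(found), missing)
```

Returning the unmatched names separately lets a user see which entries in a pasted list were not recognized, instead of silently dropping them.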

May 3, 2012

Curator Timestamps

Determining what data was provided directly by curator vs. what was populated automatically (e.g. mapping scripts)
Older data provided by curators who are no longer here will be problematic
We should archive all data-processing scripts in GitHub
Scripts can be made to create a unique timestamp that identifies that script after the fact
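
One way a script could stamp the data it writes is sketched below in Python. This is an assumption-laden illustration, not an agreed format: the function name and the tag layout (date, time, script name, short hash of the script source) are hypothetical. The hash is included so that a later change to the script produces a different tag.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def script_timestamp(script_name: str, script_source: str,
                     now: Optional[datetime] = None) -> str:
    """Build a tag of the form <date>_<time>_<script name>_<short hash>."""
    now = now or datetime.now(timezone.utc)
    # A short digest of the script text distinguishes script versions
    digest = hashlib.sha1(script_source.encode("utf-8")).hexdigest()[:8]
    return f"{now:%Y-%m-%d_%H:%M:%S}_{script_name}_{digest}"


# Hypothetical script name and source text, fixed time for reproducibility
tag = script_timestamp("map_objects.py", "print('mapping')",
                       now=datetime(2012, 5, 3, 17, 18, 0))
print(tag)
```

If the scripts are archived in GitHub as suggested, a commit identifier could serve the same purpose as the hash used here.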

Interaction model change

Pulling Variation, Transgene, Antibody, and Expr_pattern tags out of the Interactor_info hash and into the main model
This was originally to be able to capture intragenic interactions
The problem is the inherent disconnect between an interactor entity (e.g. gene) and these objects
Making this change would force a post-curation mechanism that ties these entities together for intuitive data display
- Such a linking mechanism may be error prone, faulty, and potentially a headache for the web team
Is there a better way to handle this type of data?
Chris will discuss with Todd to see how much of a problem this would pose to the web team
