Difference between revisions of "WormBase-Caltech Weekly Calls"
From WormBaseWiki
Revision as of 17:18, 3 May 2012
2012 Meetings
April 12, 2012
RNAi OA
- OA almost ready to go live
- Testing now with test curation
- Should go live next week for official curation
New Website
- Most problems are being fixed in a timely manner
- Curators can now edit links and add custom widgets
- Issues (tracked on GitHub) being dealt with quickly
BioCurator Meeting
- Good meeting, bigger than before
- Common themes: data standards, how to educate users about database materials and how to use them (and think critically)
- How can MODs work better with journals and PubMed to solve the 'triage' problem?
- Streamlining the paper acquisition/curation process
- MODs should ask NLM to take the burden of retrieving PDFs
- Get lawyers involved to make PDFs available?
- Publishers tend to be lax on text mining rules, maybe will evolve into an easier process
- Maybe write a grant for research project as a proof-of-principle that triage can be done in an effective/efficient manner
- May ask ISB (International Society for Biocuration) for help with this
- Sequence and protein curation: tools, databases (topic-specific; pathways, cancer, etc.)
- GeneWiki for human gene annotation
- One page for each gene; already have ~10,000 articles
- ~Dozen editors, credibility of authors checked (?)
- Reasonably satisfied with coverage of human disease genes
- Whole-genome sequencing of individuals
- Newly identified genetic disorder
- VAST instead of BLAST
- Tool to identify primers from papers and map them to the genome automatically
- InterMine discussed
- Comparable to WormMart
- Object-oriented database
- Performs similarly to WormBase
- Many pre-canned queries
- Advanced search Query-builder available
- MODs switched over to InterMine from BioMart
- WormMart - Will Spooner tried to provide queries that are more natural
- We can work to build an interface on top of InterMine, etc.
- Todd has made progress with getting InterMine for WormBase
- Lots of specialized talks, which reduced productivity (compared to BioCreative meeting)
- Curators explaining their curation pipeline
- Textpresso still popular ;)
- Six out of seven MODs using Textpresso
- Discussed text mining in particular applications (e.g. CCC)
- Textpresso only tool using full-text for mining
- Pete from FlyBase: SVM results are deteriorating (similar to WormBase)
- Start training from scratch; hopefully get better recall/precision numbers
- Natural language processing on figure legends/captions
- Tries to find text in the body that relates to figure
- Possible collaboration with Textpresso
- NLP research group in Germany
- 'Actor', 'agent' etc. and relationships (RDF triples)
- Doug Howe (ZFIN), zebrafish corpus small enough, doesn't need Textpresso
- Julio Collado-Vides, Textpresso for E. coli fell apart, but trying to get back together
Paul will meet someone from Elsevier
- Image curation/ rights issues
Genetic Interaction ontology
- SGD on board with ontology so far; performing trial curation
- FlyBase interested in using as well; will meet with Chris and Rose in May to discuss
April 19, 2012
Interaction object displays on WormBase website
- Chris and Maher will sort out on GitHub
- Chris will map data from old tags to new tags and suggest display changes for new data types where necessary
- One issue to deal with is complex objects that have multiple Interaction_types (and were intended to be separate objects)
Interaction model and intragenic suppression
- We need to make some modifications to the new Interaction model if we want to accommodate intragenic suppression (or other intragenic) events
- Proposed change is to:
- Make each allele a separate object
- Move the Variation (and Transgene) tag out of the Interactor_info hash and into the main Interaction model under the Interactor tag
- Add a Cis_intragenic_suppression and a Trans_intragenic_suppression tag under the Interaction_type tag (perhaps also intragenic_enhancement?)
- With these changes:
- Each variation (and transgene) can be listed as an interactor with Interactor_info indicating Affected, Effector, or Non_directional
- Genes associated with intragenic, interacting variations will display (in Cytoscape view) as interacting with themselves via a Genetic Interaction
- Mary Ann can then indicate/curate the flanking sequences for each allele
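A minimal sketch of the proposed arrangement, written in Python rather than as an ACEDB model. It assumes each variation is listed directly as an interactor and that a lookup from variation to gene exists; the names var_a, var_b, var_c, gene-1, gene-2 and the lookup table itself are hypothetical and only illustrate how an intragenic interaction would resolve to a gene interacting with itself:

```python
from dataclasses import dataclass

# Hypothetical lookup from variation to its gene (illustrative names only).
VARIATION_TO_GENE = {"var_a": "gene-1", "var_b": "gene-1", "var_c": "gene-2"}


@dataclass(frozen=True)
class Interactor:
    variation: str  # listed under Interactor in the main model, not inside the hash
    role: str       # Interactor_info: Affected, Effector, or Non_directional


@dataclass(frozen=True)
class Interaction:
    interaction_type: str  # e.g. Cis_intragenic_suppression
    interactors: tuple


def interacting_genes(interaction):
    """Map each variation interactor to its gene, in the order listed."""
    return [VARIATION_TO_GENE[i.variation] for i in interaction.interactors]


def is_self_interaction(interaction):
    """True when every interacting variation belongs to the same gene."""
    return len(set(interacting_genes(interaction))) == 1


example = Interaction(
    "Cis_intragenic_suppression",
    (Interactor("var_a", "Effector"), Interactor("var_b", "Affected")),
)
```

Here `is_self_interaction(example)` is true because both variations map to gene-1, which is the case a network view would draw as a gene interacting with itself.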
Life_stage objects still dump as names, not IDs
- This is because ACEDB only handles names, not IDs
- Daniela is in charge of this class; we can discuss with her when she's back
- We likely want to change to a system where we use only IDs in .ACE objects
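One way an ID-only dump could look, as a rough Python sketch. The name-to-ID table, the placeholder IDs, and the Public_name tag are all assumptions made for illustration and are not taken from the actual Life_stage model or dumping script:

```python
# Hypothetical name-to-ID table; these IDs are placeholders, not real accessions.
LIFE_STAGE_IDS = {"embryo": "LS:0000001", "adult": "LS:0000002"}


def dump_life_stage(name, name_tag="Public_name"):
    """Render one .ace-style entry keyed by ID, keeping the readable name as a tag.

    The tag name is an assumption; the real model may use a different tag.
    """
    stage_id = LIFE_STAGE_IDS.get(name)
    if stage_id is None:
        # Fail loudly rather than silently dumping a bare name.
        raise KeyError(f"no ID assigned for life stage {name!r}")
    return f'Life_stage : "{stage_id}"\n{name_tag}\t"{name}"\n'
```

Raising on an unmapped name is a deliberate choice in this sketch, so that a dump never mixes names and IDs.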
URL Constructors for GSA markup
- Todd has taken care of much of the issue of URL construction for GSA marked-up papers
- Karen will send Todd examples of Anatomy_term/Anatomy_name links that need to be checked
- GSA papers will need to be rechecked to ensure that all links are working
Network outages
- Various office network ports are non-functional as of yesterday
- IMSS/Network admins aware of issue and working on it
Interaction and Gene_regulation objects for next upload
- Conversion scripts will need to be run again to convert objects to new model format
- Chris will look into whether or not the mapping files (needed to update Gene_regulation objects) will need to be updated for the newest data
- Xiaodong will dump Gene_regulation objects out of the OA using the old dumping script
April 26, 2012
Meeting with Elsevier rep
- Elsevier getting more open to text-mining
- People build apps and then put them on the ScienceDirect site (e.g. TAIR app)
- Wanted a couple sentences on what we want from text-mining
- GO consortium would like text-mining for triage of new papers
- 'Climate is better now'
Yeast-two-hybrid data issues
- Lots of redundancies, bogus objects, many objects per bait/target (Sequences, CDSs, genes, etc.)
- Provenance of data isn't clear
- Should mv PCR products be mapped each build to genes?
- May want to start from scratch and collect YH data from Vidal and Walhout labs
- Check if BioGRID is curating this data already
Next WormBase grant due in 6 months
- 30 pages
- Need to figure out what we want to do in next 5 years; how we want to organize
- Combine SAB meeting and grant writing?
- New page types lagged behind due to updating of web site: e.g. Process pages
- What new content can reasonably/realistically get online?
Curation wish-list on Wiki (Ranjana)
- Many papers on new topics coming out
- Drug-screening, drug interaction
- Infection, parasitism
Anatomy links from WormAtlas broken
- Links need to be fixed/cleaned up
- Going forward, may need some sort of DOI system (stable links indefinitely)
- An issue of GSA markup as well
- Published links will never change; will need to accommodate
Ontology searches
- Trying to adapt AmiGO to use our .OBO files
- National Center for Biomedical Ontology uses Protégé instead of OBO-Edit
- Consider adopting Protégé/OWL files? Conversion could be trivial
- Parent-child relationships file for C. elegans cell lineage; need to accommodate indeterminacy
- Use synonym assignment to handle different possible outcomes/identities?
Elbrus has reached capacity limit
- Broke RNAi curation pipeline
- Useful bits of code on elbrus
- Data submission forms (RNAi data)
- Microarray query tool (broken/toss)
- Should put (working) code on GitHub repository
User datamining demands
- We need to accommodate users' requests for data
- Fix WormMart/incorporate InterMine
- Bring back Batch Gene query
- Custom query building (by curators) based on user requests?
- Look at help desk e-mails and determine what users want
- Pre-canned queries?
- AcePerl scripts could perform batch gene queries
May 3, 2012
Curator Timestamps
- Determining what data was provided directly by curator vs. what was populated automatically (e.g. mapping scripts)
- Older data provided by curators who are no longer here will be problematic
- We should archive all data-processing scripts in GitHub
- Scripts can be made to create a unique timestamp that identifies that script after the fact
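One possible form for such a script-identifying timestamp, sketched in Python. The stamp layout, the function name `script_stamp`, and the use of a short content hash are assumptions for illustration, not an agreed format:

```python
import hashlib
from datetime import datetime, timezone


def script_stamp(script_name, script_source, now=None):
    """Build a stamp of the form <UTC date_time>_<script name>_<short content hash>.

    The hash is taken over the script's source text, so the stamp can later be
    matched to the exact version of the script archived in the repository.
    """
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha1(script_source.encode("utf-8")).hexdigest()[:8]
    return f"{now.strftime('%Y-%m-%d_%H:%M:%S')}_{script_name}_{digest}"
```

A script could write this stamp wherever a curator timestamp would otherwise go, which would separate script-populated data from data entered directly by a curator.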
Interaction model change
- Pulling Variation, Transgene, Antibody, and Expr_pattern tags out of the Interactor_info hash and into the main model
- This was originally to be able to capture intragenic interactions
- The problem is the inherent disconnect between an interactor entity (e.g. gene) and these objects
- Making this change would force a post-curation mechanism that ties these entities together for intuitive data display
- Such a linking mechanism may be error-prone, faulty, and potentially a headache for the web team
- Is there a better way to handle this type of data?
- Chris will discuss with Todd to see how much of a problem this would pose to the web team