Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
[[WormBase-Caltech_Weekly_Calls_September_2011|September]]
 
[[WormBase-Caltech_Weekly_Calls_October_2011|October]]
  
 
== October 6, 2011 ==
 
 
SVM
 
*SVM Manuscript in process of resubmission
 
*Messy procedures in place; needs to be cleaned up
 
*Yuling able to get pipeline running from beginning to the end
 
*May take longer than previously expected to get up and running
 
*FlyBase has code running for SVM
 
*Yuling should contact FlyBase to make sure they're OK, once he has WormBase SVM working
 
*SGD would like to incorporate SVM into their database
 
*Yuri would work on SGD SVM, ultimately
 
 
 
Physical interaction model
 
*Close to done
 
*Once model approved, work on:
 
**Converting BioGRID file to the ACE model
 
**Convert existing YH data into new model
 
 
 
Image curation
 
*Need to establish legal process for acquiring images
 
*Contact Caltech Office of the General Counsel for approval
 
 
 
WORM Paper
 
*Karen working on it
 
*Need a clearer/new plan
 
 
 
Expression pattern display
 
* Showing composite images with the option to see greater detail
 
* Separate out detail images according to Qualifier tags (Certain, Uncertain, Partial)
 
* Render oblique angles of cells/tissues to show 3D features/morphology of anatomy objects and, possibly, animations
 
* "Real" images should always take precedence over "virtual" images
 
* May talk to John Murray or Bill Mohler for confocal images of embryos
 
 
 
Worm Method (Worm Book)
 
*Any new web pages for C. elegans researchers, send to Raymond
 
*URL-finder for neuroscience (Yuling)
 
 
 
Textpresso
 
*Working on Fly Textpresso site (and others)
 
*Rose Oughtred would like Wnt Textpresso tools
 
*PDF tools (URL extractor)
 
*SVM re-working
 
*Arun working on GSA editor
 
 
 
Grants
 
* Need to think about next grant in a couple of months (winter); due in October 2012? 30 pages, short?
 
* Focus on writing papers (anatomy function, WORM, virtual worm, etc.)
 
* Focus on human-relevance
 
* Need Gantt charts and quarterly statements of progress
 
* Stats
 
** Where are we with each data type
 
** Status of website
 
** Automation
 
** What is different in the field?
 
** How is data changing?
 
 
 
== October 13, 2011 ==
 
 
WORM Publication
 
*Looks good overall
 
*Change "library" of papers to "index"
 
*Putative deadline of November 15th
 
*Put in more details about the new website?
 
*Figures? Screenshots from new website? Gene page features?
 
*Save other (detailed) content for future papers
 
 
 
Paper pipeline
 
*Need to fix some PubMed IDs
 
*Over 900 articles in Postgres do not have a PubMed ID; Why? How to fix?
 
*Could some aspects of the pipeline be accomplished by a non-PhD?
 
*Supplemental hires? Students?
 
*What tasks would need to be done?
 
*Can these tasks be scripted/automated? Community input?
 
 
 
SVM
 
*Almost all scripts working
 
*Output of SVM slightly different from Ruihua's results
 
*Need to resolve the differences
 
*Data inconsistencies
 
*Need to spot check the papers to see what the major problems might be
 
*Yuling ~90% confident in his results
 
*Get feedback from relevant/interested curator(s)
 
*One-day CPU/computation time (~30 hours for 70 papers with 9 models)
 
 
 
NAR Publication Accepted
 
 
 
Genetic Interaction Curation
 
*Sharing curated genetic interactions with BioGRID
 
*What tools will each database use and how will we share data?
 
**Both BioGRID and WormBase will use the IMS tool at BioGRID for physical interactions
 
**WormBase will use the Ontology Annotator for genetic interactions
 
**BioGRID may still use IMS for genetic interactions, but the format will have to be parsed to populate Postgres
 
*BioGRID-curated genetic interactions can be flagged as such for later review
 
 
 
Physical Interaction model
 
*Kimberly and Paul Davis discussing
 
*XREF questions
 
**Paul would like to minimize the number of XREFs in the model
 
**Which XREFs can we remove?
 
 
Elsevier Linking
 
*Establishing pipeline for linking ScienceDirect papers to WormBase and vice versa
 
 
 
 
== October 20, 2011 ==
 
 
Data Submission Working Group
 
*One representative from each site
 
*Raymond from Caltech, with Juancarlos
 
*Have a discussion for every type
 
**How are we doing?
 
**Are we up to date?
 
**Do we want unpublished data?
 
**Quality control
 
**Data type priority?
 
*Groups/people that have a lot of data of a particular type; acquire their data
 
**How can we facilitate data submission from these groups/individuals?
 
**Ultimately, we would like to train these users/submitters to use curator tools
 
*Form-filling as part of publication process?
 
*What data types are we missing? Wiki Pages
 
**qPCR?
 
**Nanostring?
 
**Single molecule studies, absolute quantities?
 
**SAGE?
 
**3C (Chromosome Conformation Capture), 4C, 5C
 
**Metabolome?
 
**Pathways/Processes?
 
**Drug/disease interactions?
 
**Infections?
 
**Examples of C. elegans as a model?
 
*Expression data
 
**How do we annotate a presence/absence call on expression of a given gene?
 
**RPKM cutoff?
 
**Number of molecules?
 
**Case-by-case thresholds
 
*Disease vs Process pages?
 
**100 diseases modeled in ''C. elegans''?
 
**Flat-file capture and display of disease-relevance information (as opposed to ontology-based)
 
*"Subject" vs. "Process" vs. "Disease"
 
**Examples of "Subject": Sewage treatment, Biofilm formation, etc.
 
**Examples of "Disease": Alzheimer's, Huntington's, Cancer, etc.
 
**Examples of "Process": Development, Innate immunity, Organ formation, etc.
 
**User experience?
 
*User-submitted data inconsistency/discrepancy flag?
 
 
 
WORM Paper
 
*Still requesting any major comments or content changes
 
*"Final" version available next week
 
 
 
''C. elegans'' Research Resources Chapter (WormBook)
 
*Yahoo-like Yellow pages of useful URLs with brief descriptions of each site
 
*URLs being pulled from papers by Yuling
 
 
 
 
== October 27, 2011 ==
 
 
 
NAR Proofs are in
 
*Karen will send around
 
*Going back out tomorrow; everyone check
 
*Add Arun as author
 
 
 
Predicted Gene Interactions
 
*Wei wei contacted Paul
 
*Who is the contact person? Xiaodong can be point person
 
*Add URLs to Gene Orienteer and Interolog database on gene pages
 
*Issue is with syncing the databases and updating
 
 
 
SVM
 
*Yuling finished running SVM pipeline; can make it run every two weeks
 
*Need to learn how to best train the models; do we need to retrain?
 
*Gary Williams sent e-mail; incorrect paper IDs on some SVM results
 
**Gene structure correction papers have been incorrect
 
**May need to retrain the gene structure correction SVM
 
**Need someone at Caltech to work with Yuling to retrain the model
 
*Kimberly will look through e-mail archive about the issue
 
*We could really improve the training set, as original set may have been inadequate
 
*What data types do we retrain: Expression pattern? Phenotype?
 
*Improve training by including verified positive papers from last two years (since last training)
 
*In the past, needed 400 papers to train; would be best to lower requirement
 
*Antibody and transgene SVM works well; maybe incorporate for Expression pattern SVM?
 
*Use pattern matching to correct SVM results or vice versa?
 
*Good to make progress on Gene Structure correction papers
 
*Keeping track of sentences/paragraphs; performing SVM on paragraphs (vs papers)?
 
 
 
Collecting URLs from papers (Raymond)
 
*For interesting/relevant URLs for C. elegans
 
*Journal URLs are problematic
 
*Yuling runs scripts on Textpresso corpus to collect URLs
 
*PubMed XML of journals (Kimberly)
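
The URL collection described above could be sketched roughly as follows. This is a minimal illustration only, not the script Yuling runs: the regular expression, the `JOURNAL_HOSTS` list, and the function names are all assumptions, and filtering out publisher hosts is just one possible reading of "journal URLs are problematic".

```python
import re

# Hypothetical pattern: http(s) URLs and bare "www." URLs in plain text.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()\[\]]+", re.IGNORECASE)

# Hypothetical list of journal/publisher hosts to exclude.
JOURNAL_HOSTS = ("sciencedirect.com", "nature.com")


def extract_urls(text):
    """Return unique URLs found in text, in order of first appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def is_journal_url(url):
    """True if the URL points at one of the (assumed) journal hosts."""
    return any(host in url.lower() for host in JOURNAL_HOSTS)


def resource_urls(text):
    """URLs from text with journal/publisher links filtered out."""
    return [u for u in extract_urls(text) if not is_journal_url(u)]


sample = (
    "Strains are listed at http://www.example.org/strains. "
    "See also www.example.org/tools, and "
    "http://www.sciencedirect.com/article/1."
)
found = resource_urls(sample)
```

A real run would loop over the full text of each paper in the corpus and would probably need manual review of the results.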
 
 
 
WORM Publication
 
*Final version available
 
*Review for corrections
 
*To submit early next week (before Wednesday)
 
  
  

Revision as of 19:13, 3 November 2011

2009 Meetings


2011 Meetings

February

March

April

May

June

July

August

September

October



November 3, 2011

Gene ID mapping problems (as discussed in site-wide call)

  • Does CIT need to worry about consistency of gene IDs over database builds?
  • Hyper-linking entities in papers could be an issue
  • Acquire an ID for every gene for every new genome?
  • When marking up a paper, separate genes into two types: stable vs. unstable
    • Stable genes map to stable ID
    • Unstable genes map with a version number
  • Should WormBase communicate/work with DCC personnel for data wrangling during the post-DCC era?
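
One way to picture the stable vs. unstable proposal above is the sketch below. It is only an illustration under assumptions: the `GeneRef` class, the `.v<N>` label format, and the placeholder identifiers are hypothetical and not an agreed WormBase scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneRef:
    """A gene mention in a marked-up paper (hypothetical structure)."""
    gene_id: str
    version: Optional[int] = None  # set only for unstable genes

    def label(self):
        """Stable genes use the bare ID; unstable genes carry a version."""
        if self.version is None:
            return self.gene_id
        return f"{self.gene_id}.v{self.version}"


def make_ref(gene_id, stable, current_version):
    """Build a reference, attaching the version only when the gene is unstable."""
    return GeneRef(gene_id) if stable else GeneRef(gene_id, current_version)


# Placeholder identifiers, not real gene IDs.
stable_ref = make_ref("GENE0001", stable=True, current_version=3)
unstable_ref = make_ref("GENE0002", stable=False, current_version=3)
```

Here `stable_ref.label()` is the bare ID, while `unstable_ref.label()` records which version of the gene the paper referred to.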


Updating Predicted Gene Interactions

  • Wei wei asking about how to update with WormBase
  • WormBase predicted interaction objects can be updated simply via link to Gene Orienteer
  • If we want a live interaction browser, we would need to pull the data into the build process
  • Do we want a single object per pair of genes? One object per instance of evidence (per paper)?
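
The last question above (one object per gene pair vs. one object per instance of evidence) can be made concrete with a small sketch. The record layout, function names, and gene/paper labels are assumptions for illustration and do not reflect the actual WormBase interaction model.

```python
from collections import defaultdict


def pair_key(gene_a, gene_b):
    """Order-independent key so (A, B) and (B, A) are the same pair."""
    return tuple(sorted((gene_a, gene_b)))


def one_object_per_pair(records):
    """Option 1: a single object per gene pair, listing all supporting papers."""
    grouped = defaultdict(list)
    for gene_a, gene_b, paper in records:
        grouped[pair_key(gene_a, gene_b)].append(paper)
    return dict(grouped)


def one_object_per_evidence(records):
    """Option 2: a separate object for every (pair, paper) instance."""
    return [
        {"genes": pair_key(gene_a, gene_b), "paper": paper}
        for gene_a, gene_b, paper in records
    ]


# Placeholder records: (gene, gene, supporting paper).
records = [
    ("geneA", "geneB", "paper1"),
    ("geneB", "geneA", "paper2"),
    ("geneA", "geneC", "paper1"),
]
by_pair = one_object_per_pair(records)
by_evidence = one_object_per_evidence(records)
```

With these three records, the first option yields two objects (one of them citing two papers) and the second yields three. The per-pair form is more compact, while the per-evidence form keeps each paper's support separately addressable.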