Difference between revisions of "WormBase-Caltech Weekly Calls"
From WormBaseWiki
Revision as of 18:51, 27 October 2011
2011 Meetings
October 6, 2011
SVM
- SVM Manuscript in process of resubmission
- Messy procedures in place; needs to be cleaned up
- Yuling able to get pipeline running from beginning to end
- May take longer than previously expected to get up and running
- FlyBase has code running for SVM
- Yuling should contact FlyBase to make sure they're OK, once he has WormBase SVM working
- SGD would like to incorporate SVM into their database
- Yuri would work on SGD SVM, ultimately
Physical interaction model
- Close to done
- Once model approved, work on:
- Converting BioGRID file to the ACE model
- Convert existing YH data into new model
Image curation
- Need to establish legal process for acquiring images
- Contact Caltech Office of the General Counsel for approval
WORM Paper
- Karen working on it
- Need a clearer/new plan
Expression pattern display
- Showing composite images with the option to see greater detail
- Separate out detail images according to Qualifier tags (Certain, Uncertain, Partial)
- Render oblique angles of cells/tissues to show 3D features/morphology of anatomy objects and, possibly, animations
- "Real" images should always take precedence over "virtual" images
- May talk to John Murray or Bill Mohler for confocal images of embryos
Worm Method (Worm Book)
- Any new web pages for C. elegans researchers, send to Raymond
- URL-finder for neuroscience (Yuling)
Textpresso
- Working on Fly Textpresso site (and others)
- Rose Oughtred would like Wnt Textpresso tools
- PDF tools (URL extractor)
- SVM re-working
- Arun working on GSA editor
Grants
- Need to think about next grant in a couple of months (winter); due in October 2012? 30 pages, short?
- Focus on writing papers (anatomy function, WORM, virtual worm, etc.)
- Focus on human-relevance
- Need Gantt charts and quarterly statements of progress
- Stats
- Where are we with each data type
- Status of website
- Automation
- What is different in the field?
- How is data changing?
October 13, 2011
WORM Publication
- Looks good overall
- Change "library" of papers to "index"
- Putative deadline of November 15th
- Put in more details about the new website?
- Figures? Screenshots from new website? Gene page features?
- Save other (detailed) content for future papers
Paper pipeline
- Need to fix some PubMed IDs
- Over 900 articles in Postgres do not have a PubMed ID; Why? How to fix?
- Could some aspects of the pipeline be accomplished by a non-PhD?
- Supplemental hires? Students?
- What tasks would need to be done?
- Can these tasks be scripted/automated? Community input?
SVM
- Almost all scripts working
- Output of SVM slightly different from Ruihua's results
- Need to resolve the differences
- Data inconsistencies
- Need to spot check the papers to see what the major problems might be
- Yuling ~90% confident in his results
- Get feedback from relevant/interested curator(s)
- One-day CPU/computation time (~30 hours for 70 papers with 9 models)
NAR Publication Accepted
Genetic Interaction Curation
- Sharing curated genetic interactions with BioGRID
- What tools will each database use and how will we share data?
- Both BioGRID and WormBase will use the IMS tool at BioGRID for physical interactions
- WormBase will use the Ontology Annotator for genetic interactions
- BioGRID may still use IMS for genetic interactions, but the format will have to be parsed to populate Postgres
- BioGRID-curated genetic interactions can be flagged as such for later review
Physical Interaction model
- Kimberly and Paul Davis discussing
- XREF questions
- Paul would like to minimize the number of XREFs in the model
- Which XREFs can we remove?
Elsevier Linking
- Establishing pipeline for linking ScienceDirect papers to WormBase and vice versa
October 20, 2011
Data Submission Working Group
- One representative from each site
- Raymond from Caltech, with Juancarlos
- Have a discussion for every type
- How are we doing?
- Are we up to date?
- Do we want unpublished data?
- Quality control
- Data type priority?
- Groups/people that have a lot of data of a particular type; acquire their data
- How can we facilitate data submission from these groups/individuals?
- Ultimately would like to train these users/submitters to use curator tools
- Form-filling as part of publication process?
- What data types are we missing? Wiki Pages
- qPCR?
- Nanostring?
- Single molecule studies, absolute quantities?
- SAGE?
- 3C (Chromosome Conformation Capture), 4C, 5C
- Metabolome?
- Pathways/Processes?
- Drug/disease interactions?
- Infections?
- Examples of C. elegans as a model?
- Expression data
- How do we annotate a presence/absence call on expression of a given gene?
- RPKM cutoff?
- Number of molecules?
- Case-by-case thresholds
- Disease vs Process pages?
- 100 diseases modeled in C. elegans?
- Flat-file capture and display of disease-relevance information (as opposed to ontology-based)
- "Subject" vs. "Process" vs. "Disease"
- Examples of "Subject": Sewage treatment, Biofilm formation, etc.
- Examples of "Disease": Alzheimer's, Huntington's, Cancer, etc.
- Examples of "Process": Development, Innate immunity, Organ formation, etc.
- User experience?
- User-submitted data inconsistency/discrepancy flag?
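One way to picture the presence/absence question raised under "Expression data" above is a default RPKM cutoff with optional case-by-case thresholds. The following is only a minimal sketch: the function name, the gene labels, and the default cutoff of 1.0 are hypothetical placeholders, not values agreed in the meeting.

```python
def call_expression(rpkm_by_gene, cutoff=1.0, overrides=None):
    """Return a 'present'/'absent' call per gene.

    cutoff    -- default RPKM threshold (assumed value, for illustration only)
    overrides -- optional per-gene thresholds for case-by-case handling
    """
    overrides = overrides or {}
    calls = {}
    for gene, rpkm in rpkm_by_gene.items():
        # A per-gene threshold, when given, replaces the default cutoff.
        threshold = overrides.get(gene, cutoff)
        calls[gene] = "present" if rpkm >= threshold else "absent"
    return calls


# Hypothetical example values
example_calls = call_expression(
    {"geneA": 5.0, "geneB": 0.2, "geneC": 2.0},
    cutoff=1.0,
    overrides={"geneC": 3.0},
)
```

Keeping the overrides separate from the default makes each case-by-case decision explicit and easy to review later.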
WORM Paper
- Still requesting any major comments or content changes
- "Final" version available next week
C. elegans Research Resources Chapter (WormBook)
- Yahoo-like Yellow pages of useful URLs with brief descriptions of each site
- URLs being pulled from papers by Yuling
October 27, 2011
NAR Proofs are in
- Karen will send around
- Going back out tomorrow; everyone check
- Add Arun as author
Predicted Gene Interactions
- Wei wei contacted Paul
- Who is the contact person? Xiaodong can be point person
- Add URLs to Gene Orienteer and Interolog database on gene pages
- Issue is with syncing the databases and updating
SVM
- Yuling finished running SVM pipeline; can be set to run every two weeks
- Need to learn how to best train the models; do we need to retrain?
- Gary Williams sent e-mail; incorrect paper IDs on some SVM results
- Gene structure correction papers have been incorrect
- May need to retrain the gene structure correction SVM
- Need someone at Caltech to work with Yuling to retrain the model
- Kimberly will look through e-mail archive about the issue
- We could really improve the training set, as original set may have been inadequate
- What data types do we retrain: Expression pattern? Phenotype?
- Improve training by including verified positive papers from last two years (since last training)
- In past, needed 400 papers to train; would be best to lower requirement
- Antibody and transgene SVM works well; maybe incorporate for Expression pattern SVM?
- Use pattern matching to correct SVM results or vice versa?
- Good to make progress on Gene Structure correction papers
- Keeping track of sentences/paragraphs; performing SVM on paragraphs (vs papers)?
Collecting URLs from papers (Raymond)
- For interesting/relevant URLs for C. elegans
- Journal URLs are problematic
- Yuling runs scripts on Textpresso corpus to collect URLs
- PubMed XML of journals (Kimberly)
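The URL collection described above could be sketched roughly as follows. This is not Yuling's actual script: the regular expression, the function name, and the list of journal hosts to skip are assumptions made for illustration, with the skip list standing in for the "journal URLs are problematic" point.

```python
import re

# Assumed pattern: http(s)/ftp URLs or bare "www." hosts, ending at
# whitespace, brackets, or quotes.
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://|www\.)[^\s<>()\[\]'\"]+", re.IGNORECASE)

# Hypothetical list of journal/publisher hosts to leave out.
JOURNAL_HOSTS = ("sciencedirect.com", "doi.org")


def extract_urls(text, skip_hosts=JOURNAL_HOSTS):
    """Return unique URLs found in text, in order of first appearance."""
    urls = []
    seen = set()
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that trails the URL.
        url = match.group(0).rstrip(".,;:")
        if any(host in url.lower() for host in skip_hosts):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = (
    "See http://www.wormbase.org, and www.textpresso.org. "
    "Article at http://www.sciencedirect.com/science/article/x (accessed)."
)
found = extract_urls(sample)
```

A simple host skip list is only a starting point; curated review of the collected URLs would still be needed.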
WORM Publication
- Final version available
- Review for corrections
- To submit early next week (before Wednesday)