|
|
Line 20: |
Line 20: |
| [[WormBase-Caltech_Weekly_Calls_September_2011|September]] | | [[WormBase-Caltech_Weekly_Calls_September_2011|September]] |
| | | |
| + | [[WormBase-Caltech_Weekly_Calls_October_2011|October]] |
| | | |
− |
| |
− | == October 6, 2011 ==
| |
− |
| |
− | SVM
| |
− | *SVM Manuscript in process of resubmission
| |
− | *Messy procedures in place; needs to be cleaned up
| |
− | *Yuling able to get pipeline running from beginning to the end
| |
− | *May take longer than previously expected to get up and running
| |
− | *FlyBase has code running for SVM
| |
− | *Yuling should contact FlyBase to make sure they're OK, once he has WormBase SVM working
| |
− | *SGD would like to incorporate SVM into their database
| |
− | *Yuri would work on SGD SVM, ultimately
| |
− |
| |
− |
| |
− | Physical interaction model
| |
− | *Close to done
| |
− | *Once model approved, work on:
| |
− | **Converting BioGRID file to the ACE model
| |
− | **Convert existing YH data into new model
| |
− |
| |
− |
| |
− | Image curation
| |
− | *Need to establish legal process for acquiring images
| |
− | *Contact Caltech Office of the General Counsel for approval
| |
− |
| |
− |
| |
− | WORM Paper
| |
− | *Karen working on it
| |
− | *Need a clearer/new plan
| |
− |
| |
− |
| |
− | Expression pattern display
| |
− | * Showing composite images with the option to see greater detail
| |
− | * Separate out detail images according to Qualifier tags (Certain, Uncertain, Partial)
| |
− | * Render oblique angles of cells/tissues to show 3D features/morphology of anatomy objects and, possibly, animations
| |
− | * "Real" images should always take precedence over "virtual" images
| |
− | * May talk to John Murray or Bill Mohler for confocal images of embryos
| |
− |
| |
− |
| |
− | Worm Method (Worm Book)
| |
− | *Send any new web pages for C. elegans researchers to Raymond
| |
− | *URL-finder for neuroscience (Yuling)
| |
− |
| |
− |
| |
− | Textpresso
| |
− | *Working on Fly Textpresso site (and others)
| |
− | *Rose Oughtred would like Wnt Textpresso tools
| |
− | *PDF tools (URL extractor)
| |
− | *SVM re-working
| |
− | *Arun working on GSA editor
| |
− |
| |
− |
| |
− | Grants
| |
− | * Need to think about next grant in a couple of months (winter); due in October 2012? 30 pages, short?
| |
− | * Focus on writing papers (anatomy function, WORM, virtual worm, etc.)
| |
− | * Focus on human-relevance
| |
− | * Need Gantt charts and quarterly statements of progress
| |
− | * Stats
| |
− | ** Where are we with each data type
| |
− | ** Status of website
| |
− | ** Automation
| |
− | ** What is different in the field?
| |
− | ** How is data changing?
| |
− |
| |
− |
| |
− | == October 13, 2011 ==
| |
− |
| |
− | WORM Publication
| |
− | *Looks good overall
| |
− | *Change "library" of papers to "index"
| |
− | *Putative deadline of November 15th
| |
− | *Put in more details about the new website?
| |
− | *Figures? Screenshots from new website? Gene page features?
| |
− | *Save other (detailed) content for future papers
| |
− |
| |
− |
| |
− | Paper pipeline
| |
− | *Need to fix some PubMed IDs
| |
− | *Over 900 articles in Postgres do not have a PubMed ID; Why? How to fix?
| |
− | *Could some aspects of the pipeline be accomplished by a non-PhD?
| |
− | *Supplemental hires? Students?
| |
− | *What tasks would need to be done?
| |
− | *Can these tasks be scripted/automated? Community input?
| |
− |
| |
− |
| |
− | SVM
| |
− | *Almost all scripts working
| |
− | *Output of SVM slightly different from Ruihua's results
| |
− | *Need to resolve the differences
| |
− | *Data inconsistencies
| |
− | *Need to spot check the papers to see what the major problems might be
| |
− | *Yuling ~90% confident in his results
| |
− | *Get feedback from relevant/interested curator(s)
| |
− | *One-day CPU/computation time (~30 hours for 70 papers with 9 models)
| |
− |
| |
− |
| |
− | NAR Publication Accepted
| |
− |
| |
− |
| |
− | Genetic Interaction Curation
| |
− | *Sharing curated genetic interactions with BioGRID
| |
− | *What tools will each database use and how will we share data?
| |
− | **Both BioGRID and WormBase will use the IMS tool at BioGRID for physical interactions
| |
− | **WormBase will use the Ontology Annotator for genetic interactions
| |
− | **BioGRID may still use IMS for genetic interactions, but the format will have to be parsed to populate Postgres
| |
− | *BioGRID-curated genetic interactions can be flagged as such for later review
| |
− |
| |
− |
| |
− | Physical Interaction model
| |
− | *Kimberly and Paul Davis discussing
| |
− | *XREF questions
| |
− | **Paul would like to minimize the number of XREFs in the model
| |
− | **Which XREFs can we remove?
| |
− |
| |
− | Elsevier Linking
| |
− | *Establishing a pipeline for linking ScienceDirect papers to WormBase and vice versa
| |
− |
| |
− |
| |
− |
| |
− | == October 20, 2011 ==
| |
− |
| |
− | Data Submission Working Group
| |
− | *One representative from each site
| |
− | *Raymond from Caltech, with Juancarlos
| |
− | *Have a discussion for every data type
| |
− | **How are we doing?
| |
− | **Are we up to date?
| |
− | **Do we want unpublished data?
| |
− | **Quality control
| |
− | **Data type priority?
| |
− | *Groups/people that have a lot of data of a particular type; acquire their data
| |
− | **How can we facilitate data submission from these groups/individuals?
| |
− | **Ultimately, would like to train these users/submitters to use curator tools
| |
− | *Form-filling as part of publication process?
| |
− | *What data types are we missing? Wiki Pages
| |
− | **qPCR?
| |
− | **Nanostring?
| |
− | **Single molecule studies, absolute quantities?
| |
− | **SAGE?
| |
− | **3C (Chromosome Conformation Capture), 4C, 5C
| |
− | **Metabolome?
| |
− | **Pathways/Processes?
| |
− | **Drug/disease interactions?
| |
− | **Infections?
| |
− | **Examples of C. elegans as a model?
| |
− | *Expression data
| |
− | **How do we annotate a presence/absence call on expression of a given gene?
| |
− | **RPKM cutoff?
| |
− | **Number of molecules?
| |
− | **Case-by-case thresholds
| |
− | *Disease vs Process pages?
| |
− | **100 diseases modeled in ''C. elegans''?
| |
− | **Flat-file capture and display of disease-relevance information (as opposed to ontology-based)
| |
− | *"Subject" vs. "Process" vs. "Disease"
| |
− | **Examples of "Subject": Sewage treatment, Biofilm formation, etc.
| |
− | **Examples of "Disease": Alzheimer's, Huntington's, Cancer, etc.
| |
− | **Examples of "Process": Development, Innate immunity, Organ formation, etc.
| |
− | **User experience?
| |
− | *User-submitted data inconsistency/discrepancy flag?
| |
− |
| |
− |
| |
− | WORM Paper
| |
− | *Still requesting any major comments or content changes
| |
− | *"Final" version available next week
| |
− |
| |
− |
| |
− | ''C. elegans'' Research Resources Chapter (WormBook)
| |
− | *Yahoo-like Yellow pages of useful URLs with brief descriptions of each site
| |
− | *URLs being pulled from papers by Yuling
| |
− |
| |
− |
| |
− |
| |
− | == October 27, 2011 ==
| |
− |
| |
− |
| |
− | NAR Proofs are in
| |
− | *Karen will send around
| |
− | *Going back out tomorrow; everyone check
| |
− | *Add Arun as author
| |
− |
| |
− |
| |
− | Predicted Gene Interactions
| |
− | *Weiwei contacted Paul
| |
− | *Who is the contact person? Xiaodong can be point person
| |
− | *Add URLs to Gene Orienteer and Interolog database on gene pages
| |
− | *Issue is with syncing the databases and updating
| |
− |
| |
− |
| |
− | SVM
| |
− | *Yuling finished running the SVM pipeline; can run it every two weeks
| |
− | *Need to learn how to best train the models; do we need to retrain?
| |
− | *Gary Williams sent e-mail; incorrect paper IDs on some SVM results
| |
− | **Gene structure correction papers have been incorrect
| |
− | **May need to retrain the gene structure correction SVM
| |
− | **Need someone at Caltech to work with Yuling to retrain the model
| |
− | *Kimberly will look through e-mail archive about the issue
| |
− | *We could really improve the training set, as original set may have been inadequate
| |
− | *What data types do we retrain: Expression pattern? Phenotype?
| |
− | *Improve training by including verified positive papers from last two years (since last training)
| |
− | *In past, needed 400 papers to train; would be best to lower requirement
| |
− | *Antibody and transgene SVM works well; maybe incorporate for Expression pattern SVM?
| |
− | *Use pattern matching to correct SVM results or vice versa?
| |
− | *Good to make progress on Gene Structure correction papers
| |
− | *Keeping track of sentences/paragraphs; performing SVM on paragraphs (vs papers)?
| |
− |
| |
− |
| |
− | Collecting URLs from papers (Raymond)
| |
− | *For interesting/relevant URLs for C. elegans
| |
− | *Journal URLs are problematic
| |
− | *Yuling runs scripts on Textpresso corpus to collect URLs
| |
− | *PubMed XML of journals (Kimberly)
| |
− |
| |
− |
| |
− | WORM Publication
| |
− | *Final version available
| |
− | *Review for corrections
| |
− | *To submit early next week (before Wednesday)
| |
| | | |
| | | |