Difference between revisions of "WormBase-Caltech Weekly Calls"

Revision as of 19:13, 3 November 2011

2009 Meetings

2011 Meetings

February

November 3, 2011

Gene ID mapping problems (as discussed in site-wide call)

Does CIT need to worry about consistency of gene IDs over database builds?
Hyper-linking entities in papers could be an issue
Acquire an ID for every gene for every new genome?
When marking up a paper, separate genes into two types: stable vs. unstable
- Stable genes map to stable ID
- Unstable genes map with a version number
Should WormBase communicate/work with DCC personnel on data wrangling in the post-DCC era?
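
The stable vs. unstable split discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the `.v<N>` label format, and the example IDs are all hypothetical and are not taken from any WormBase or Textpresso schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneReference:
    """A gene mention in a paper (hypothetical structure, for illustration)."""
    gene_id: str
    version: Optional[int] = None  # None means the gene is treated as stable

    def label(self) -> str:
        # Stable genes map to the bare ID; unstable genes carry a version number.
        if self.version is None:
            return self.gene_id
        return f"{self.gene_id}.v{self.version}"  # assumed label format


def make_reference(gene_id: str, is_stable: bool, version: int) -> GeneReference:
    """Attach the version only when the gene is considered unstable."""
    return GeneReference(gene_id, None if is_stable else version)


# Placeholder IDs, not real gene identifiers.
stable_ref = make_reference("GENE_A", is_stable=True, version=3)
unstable_ref = make_reference("GENE_B", is_stable=False, version=3)
print(stable_ref.label(), unstable_ref.label())
```

One consequence of this layout is that a hyperlink built from a stable reference never changes between builds, while a link built from an unstable reference records which version of the gene the paper referred to.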

Updating Predicted Gene Interactions

Wei wei is asking how to keep predicted interactions updated with WormBase
WormBase predicted interaction objects can be updated simply via a link to Gene Orienteer
If we want a live interaction browser, we would need to pull the data into the build process
Do we want a single object per pair of genes? One object per instance of evidence (per paper)?
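
The two modeling options in the last question can be compared with a small sketch. The record layout, function names, and gene/paper labels below are assumptions made for illustration; they do not reflect the actual interaction model.

```python
from collections import defaultdict

# Hypothetical evidence records: (gene_a, gene_b, paper). All values are placeholders.
evidence = [
    ("gene-1", "gene-2", "paper-A"),
    ("gene-2", "gene-1", "paper-B"),
    ("gene-1", "gene-3", "paper-A"),
]


def per_evidence(records):
    """Option 1: one object per instance of evidence (per paper)."""
    return [{"genes": (a, b), "paper": paper} for a, b, paper in records]


def per_pair(records):
    """Option 2: one object per gene pair, collecting all supporting papers."""
    grouped = defaultdict(list)
    for a, b, paper in records:
        # Sort the pair so (gene-1, gene-2) and (gene-2, gene-1) share one key.
        grouped[tuple(sorted((a, b)))].append(paper)
    return dict(grouped)


print(len(per_evidence(evidence)))  # number of per-evidence objects
print(per_pair(evidence))           # per-pair objects with their papers
```

The per-pair form yields fewer objects and makes it easy to list all evidence for a pair, while the per-evidence form keeps each paper's claim separately addressable.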

@@ Line 20: / Line 20: @@
 [[WormBase-Caltech_Weekly_Calls_September_2011|September]]
+[[WormBase-Caltech_Weekly_Calls_October_2011|October]]
-== October 6, 2011 ==
-SVM
-*SVM Manuscript in process of resubmission
-*Messy procedures in place; needs to be cleaned up
-*Yuling able to get pipeline running from beginning to the end
-*May take longer than previously expected to get up and running
-*FlyBase has code running for SVM
-*Yuling should contact FlyBase to make sure they're OK, once he has WormBase SVM working
-*SGD would like to incorporate SVM into their database
-*Yuri would work on SGD SVM, ultimately
-Physical interaction model
-*Close to done
-*Once model approved, work on:
-**Converting BioGRID file to the ACE model
-**Convert existing YH data into new model
-Image curation
-*Need to establish legal process for acquiring images
-*Contact Caltech Office of the General Counsel for approval
-WORM Paper
-*Karen working on it
-*Need a clearer/new plan
-Expression pattern display
-* Showing composite images with the option to see greater detail
-* Separate out detail images according to Qualifier tags (Certain, Uncertain, Partial)
-* Render oblique angles of cells/tissues to show 3D features/morphology of anatomy objects and, possibly, animations
-* "Real" images should always take precedence over "virtual" images
-* May talk to John Murray or Bill Mohler for confocal images of embryos
-Worm Method (Worm Book)
-*Any new web pages for C. elegans researchers, send to Raymond
-*URL-finder for neuroscience (Yuling)
-Textpresso
-*Working on Fly Textpresso site (and others)
-*Rose Oughtred would like Wnt Textpresso tools
-*PDF tools (URL extractor)
-*SVM re-working
-*Arun working on GSA editor
-Grants
-* Need to think about next grant in a couple of months (winter); due in October 2012? 30 pages, short?
-* Focus on writing papers (anatomy function, WORM, virtual worm, etc.)
-* Focus on human-relevance
-* Need Gantt charts and quarterly statements of progress
-* Stats
-** Where are we with each data type
-** Status of website
-** Automation
-** What is different in the field?
-** How is data changing?
-== October 13, 2011 ==
-WORM Publication
-*Looks good overall
-*Change "library" of papers to "index"
-*Putative deadline of November 15th
-*Put in more details about the new website?
-*Figures? Screenshots from new website? Gene page features?
-*Save other (detailed) content for future papers
-Paper pipeline
-*Need to fix some PubMed IDs
-*Over 900 articles in Postgres do not have a PubMed ID; Why? How to fix?
-*Could some aspects of the pipeline be accomplished by a non-PhD?
-*Supplemental hires? Students?
-*What tasks would need to be done?
-*Can these tasks be scripted/automated? Community input?
-SVM
-*Almost all scripts working
-*Output of SVM slightly different from Ruihua's results
-*Need to resolve the differences
-*Data inconsistencies
-*Need to spot check the papers to see what the major problems might be
-*Yuling ~90% confident in his results
-*Get feedback from relevant/interested curator(s)
-*One-day CPU/computation time (~30 hours for 70 papers with 9 models)
-NAR Publication Accepted
-Genetic Interaction Curation
-*Sharing curated genetic interactions with BioGRID
-*What tools will each database use and how will we share data?
-**Both BioGRID and WormBase will use the IMS tool at BioGRID for physical interactions
-**WormBase will use the Ontology Annotator for genetic interactions
-**BioGRID may still use IMS for genetic interactions, but the format will have to be parsed to populate Postgres
-*BioGRID-curated genetic interactions can be flagged as such for later review
-Physical Interaction model
-*Kimberly and Paul Davis discussing
-*XREF questions
-**Paul would like to minimize the number of XREFs in the model
-**Which XREFs can we remove?
-Elsevier Linking
-*Establishing pipeline for linking ScienceDirect papers to WormBase and vice versa
-== October 20, 2011 ==
-Data Submission Working Group
-*One representative from each site
-*Raymond from Caltech, with Juancarlos
-*Have a discussion for every type
-**How are we doing?
-**Are we up to date?
-**Do we want unpublished data?
-**Quality control
-**Data type priority?
-*Groups/people that have a lot of data of a particular type; acquire their data
-**How can we facilitate data submission from these groups/individuals?
-**Ultimately, we would like to train these users/submitters to use curator tools
-*Form-filling as part of publication process?
-*What data types are we missing? Wiki Pages
-**qPCR?
-**Nanostring?
-**Single molecule studies, absolute quantities?
-**SAGE?
-**3C (Chromosome Conformation Capture), 4C, 5C
-**Metabolome?
-**Pathways/Processes?
-**Drug/disease interactions?
-**Infections?
-**Examples of C. elegans as a model?
-*Expression data
-**How do we annotate a presence/absence call on expression of a given gene?
-**RPKM cutoff?
-**Number of molecules?
-**Case-by-case thresholds
-*Disease vs Process pages?
-**100 diseases modeled in ''C. elegans''?
-**Flat-file capture and display of disease-relevance information (as opposed to ontology-based)
-*"Subject" vs. "Process" vs. "Disease"
-**Examples of "Subject": Sewage treatment, Biofilm formation, etc.
-**Examples of "Disease": Alzheimer's, Huntington's, Cancer, etc.
-**Examples of "Process": Development, Innate immunity, Organ formation, etc.
-**User experience?
-*User-submitted data inconsistency/discrepancy flag?
-WORM Paper
-*Still requesting any major comments or content changes
-*"Final" version available next week
-''C. elegans'' Research Resources Chapter (WormBook)
-*Yahoo-like Yellow pages of useful URLs with brief descriptions of each site
-*URLs being pulled from papers by Yuling
-== October 27, 2011 ==
-NAR Proofs are in
-*Karen will send around
-*Going back out tomorrow; everyone check
-*Add Arun as author
-Predicted Gene Interactions
-*Wei wei contacted Paul
-*Who is the contact person? Xiaodong can be point person
-*Add URLs to Gene Orienteer and Interolog database on gene pages
-*Issue is with syncing the databases and updating
-SVM
-*Yuling finished running SVM pipeline; can make it run every two weeks
-*Need to learn how to best train the models; do we need to retrain?
-*Gary Williams sent e-mail; incorrect paper IDs on some SVM results
-**Gene structure correction papers have been incorrect
-**May need to retrain the gene structure correction SVM
-**Need someone at Caltech to work with Yuling to retrain the model
-*Kimberly will look through e-mail archive about the issue
-*We could really improve the training set, as original set may have been inadequate
-*What data types do we retrain: Expression pattern? Phenotype?
-*Improve training by including verified positive papers from last two years (since last training)
-*In past, needed 400 papers to train; would be best to lower requirement
-*Antibody and transgene SVMs work well; maybe incorporate for Expression pattern SVM?
-*Use pattern matching to correct SVM results or vice versa?
-*Good to make progress on Gene Structure correction papers
-*Keeping track of sentences/paragraphs; performing SVM on paragraphs (vs papers)?
-Collecting URLs from papers (Raymond)
-*For interesting/relevant URLs for C. elegans
-*Journal URLs are problematic
-*Yuling runs scripts on Textpresso corpus to collect URLs
-*PubMed XML of journals (Kimberly)
-WORM Publication
-*Final version available
-*Review for corrections
-*To submit early next week (before Wednesday)