WBConfCall 2014.03.06-Agenda and Minutes
Todd will likely be unable to attend the first portion of the conference call. I will dial in as soon as I can.
Karen has to report for jury duty and will miss the conference call.
LaDeana Hillier (Waterston Lab) produces much of the C.elegans modENCODE RNASeq data and analyses. She now has a large amount of data that the lab wishes to make available to the public. Normally, this data would be sent to the modENCODE DCC at UCSC for incorporation into the modENCODE website.
As the modENCODE DCC is overwhelmed with other work, she has proposed that WormBase make the data available to the public. She will still be submitting the data to the DCC.
The raw reads are submitted to the SRA, so she is proposing sending us only the results of processing the reads. These include 203 samples, each of which comprises wig coverage, SL, intron and polyA and transcripts GFF files, together with expression coverage, SL, intron and polyA read files. The total amount of data is less than 1 Tb.
Providing this data to the public could be as simple as just providing a directory where we let all of this sit, or something more complicated.
Our position with providing primary data to users has always been that WormBase isn't a primary repository of data. We prefer to extract all our data from primary resources that can be referred to by other groups. We have been extracting RNAseq data from the SRA and then doing analyses of this data for all our species.
We would certainly like to take a look at this data, but we would not be making these files directly available to our users. We would instead be incorporating aspects of it like the SL sites into our genomic features and making an updated aggregate modENCODE CDS track.
Is this a reasonable position to take with this proposal?
[RL: I will be on a flight when the meeting takes place, so I'll just comment here] We have made primary data available before. I believe our policy so far has been rather open, as long as the data is limited to an FTP folder <ftp://ftp.wormbase.org/pub/wormbase/datasets-published/>.
Searching WB with Gene/Protein Names from Other Species
The issue of searching WB with a gene and/or protein name from another species has come up again.
We include some of these names in the Concise Descriptions when discussing sequence similarity, but this hasn't been done systematically, nor comprehensively (i.e., names from humans and all major MODs, all synonyms, etc.).
I know we've discussed this issue before, but from a user perspective, what do we think the intended search behavior(s) should be, and do we have all of the name/synonym information available to allow users to perform these types of searches effectively?
Do we need use cases to help study/address this issue?
No need for discussion unless the curators want to bring anything up; this is now being handled outside of the site-wide call.
Retirement of Designating Laboratory
As the worm community continues to mature we will increasingly need to transfer the ownership of gene classes from one lab to another.
In order to be able to track this transfer we (genenames) would like to propose the addition of a tag to the ?Gene_class model and a corresponding tag to the ?Laboratory model.
Still under discussion.
?Operon Public_name UNIQUE ?Text
Here are the ongoing model changes, with test data. http://wiki.wormbase.org/index.ph/WormBase_Model:Construct#Test_data
These model changes will require curation modifications for many data types. I anticipate needing more than a few weeks to get everything sorted and tested. Is there a way this model can be approved at the beginning of the WS244 curation so we can use the full two months to coordinate the changes?
- UCSC now DCC for both ENCODE and modENCODE. Slow turnaround for worm submissions
- PWS strongly in favour of WormBase hosting this data. If it becomes a drain on time/resources, we can ask NIH for more money (NIH funds both WormBase and modENCODE)
Gene names from other species
- Hinxton to check that we are pulling out all available symbols/synonyms for genes from other models (human, mouse etc).
- BioGrid, UniProt both good sources of synonyms
- To be discussed/decided off-line:
- Should a search with a human (say) gene symbol give a top hit to the C.elegans ortholog?
- If yes, should we collate/store other-species synonyms in the C.elegans gene objects themselves?
- If yes, how to populate that info? Curation or automatically (via orthology)? The latter may miss some high-profile problem cases
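One way to picture the search behaviour under discussion is the minimal sketch below. It is not WormBase's search implementation: the function names (`build_index`, `search`) and the tiny hand-entered table are hypothetical, and the table stands in for synonyms that would in practice come from sources such as BioGrid or UniProt. A query is matched case-insensitively, first against C.elegans names and then against other-species symbols that map to a C.elegans ortholog.

```python
from collections import defaultdict

# Hypothetical toy table, entered by hand for illustration only:
# (other-species symbol or synonym, species, C. elegans ortholog).
ORTHOLOG_SYNONYMS = [
    ("TP53", "Homo sapiens", "cep-1"),
    ("p53", "Homo sapiens", "cep-1"),
    ("Trp53", "Mus musculus", "cep-1"),
    ("INSR", "Homo sapiens", "daf-2"),
]

# Hypothetical set of C. elegans public names known to the search.
WORM_NAMES = {"cep-1", "daf-2"}


def build_index(rows):
    """Map each case-folded foreign symbol to its (worm gene, species) pairs."""
    index = defaultdict(set)
    for symbol, species, worm_gene in rows:
        index[symbol.casefold()].add((worm_gene, species))
    return index


def search(query, index, worm_names=WORM_NAMES):
    """Return (worm gene, reason) hits: direct name matches first, then orthologs."""
    q = query.strip().casefold()
    hits = []
    if q in worm_names:
        hits.append((q, "C. elegans name"))
    # Sorted so that results are deterministic when several orthologs match.
    for worm_gene, species in sorted(index.get(q, ())):
        hits.append((worm_gene, "ortholog match via " + species))
    return hits


INDEX = build_index(ORTHOLOG_SYNONYMS)
```

With this layout, `search("tp53", INDEX)` would return the C.elegans ortholog as the top hit. An automatically populated table of this kind only covers genes with a called ortholog, which is where the high-profile problem cases mentioned above would need curated entries.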