WBConfCall 2014.03.06-Agenda and Minutes
LaDeana Hillier (Waterston Lab) produces much of the C. elegans modENCODE RNASeq data and analyses. She now has a large amount of data that the lab wishes to make available to the public. Normally, this data would be sent to the modENCODE DCC at UCSC for incorporation into the modENCODE website.
As the modENCODE DCC is overwhelmed with other work, she has proposed that WormBase make the data available to the public. She will still be submitting the data to the DCC.
The raw reads are submitted to the SRA, so she is proposing sending us only the results of processing the reads. These include 203 samples, each of which comprises wig coverage files and SL, intron, polyA and transcript GFF files, together with expression coverage, SL, intron and polyA read files. The total amount of data is less than 1 Tb.
Providing this data to the public could be as simple as just providing a directory where we let all of this sit, or something more complicated.
Our position with providing primary data to users has always been that WormBase isn't a primary repository of data. We prefer to extract all our data from primary resources that can be referred to by other groups. We have been extracting RNAseq data from the SRA and then doing analyses of this data for all our species.
We would certainly like to take a look at this data, but we would not be making these files directly available to our users. We would instead be incorporating aspects of it like the SL sites into our genomic features and making an updated aggregate modENCODE CDS track.
Is this a reasonable position to take with this proposal?
[RL: I will be on a flight when the meeting takes place, so I'll just comment here] We have made primary data available before. I believe our policy has been rather open so far, as long as it is limited to an FTP folder <ftp://ftp.wormbase.org/pub/wormbase/datasets-published/>.
Searching WB with Gene/Protein Names from Other Species
The issue of searching WB with a gene and/or protein name from another species has come up again.
We include some of these names in the Concise Descriptions when discussing sequence similarity, but this hasn't been done systematically, nor comprehensively (i.e., names from humans and all major MODs, all synonyms, etc.).
I know we've discussed this issue before, but from a user perspective, what do we think the intended search behavior(s) should be, and do we have all of the name/synonym information available to allow users to perform these types of searches effectively?
Do we need use cases to help study/address this issue?
- human WRAP53/TCAB1
- S. cerevisiae Mad1 or Mad1p
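One way to make the two use cases above concrete is a small synonym index that maps names from other species onto worm genes. The following is only a sketch for discussion, not a description of how WormBase search works: the record layout, the function names, and the target identifiers `worm-gene-A` and `worm-gene-B` are all hypothetical placeholders, and no real orthology assignment is implied. It assumes that queries should be matched case-insensitively and that a trailing "p" after a digit (the yeast protein convention, as in Mad1p) can be dropped.

```python
from collections import defaultdict


def normalize(name):
    """Case-fold and strip whitespace so 'MAD1' and ' mad1 ' compare equal."""
    return name.strip().casefold()


def candidate_keys(name):
    """Return lookup keys for a query.

    Besides the normalized name, also try the name without a trailing 'p'
    when it follows a digit (assumed yeast protein style: Mad1p -> Mad1).
    """
    key = normalize(name)
    keys = [key]
    if len(key) > 2 and key.endswith("p") and key[-2].isdigit():
        keys.append(key[:-1])
    return keys


def build_index(records):
    """Map every normalized name/synonym to (species, worm gene) pairs."""
    index = defaultdict(set)
    for species, names, worm_gene in records:
        for name in names:
            index[normalize(name)].add((species, worm_gene))
    return index


def search(index, query):
    """Return all (species, worm gene) pairs matching the query, sorted."""
    hits = set()
    for key in candidate_keys(query):
        hits |= index.get(key, set())
    return sorted(hits)


# Hypothetical records: the worm gene identifiers are placeholders only.
RECORDS = [
    ("Homo sapiens", ["WRAP53", "TCAB1"], "worm-gene-A"),
    ("Saccharomyces cerevisiae", ["Mad1"], "worm-gene-B"),
]
INDEX = build_index(RECORDS)
```

With this layout, `search(INDEX, "TCAB1")` and `search(INDEX, "Mad1p")` each return the placeholder worm gene together with the species the name came from, which is roughly the information a user would need to judge whether a hit is the one they meant. Whether all of the required names and synonyms are actually available remains the open question raised above.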
Balancers in GBrowse
These model changes will require curation modifications for many data types. I anticipate needing more than a few weeks to get everything sorted and tested. Is there a way this model can be approved at the beginning of the WS244 curation so we can use the full two months to coordinate the changes? --kjy (talk) 01:59, 6 March 2014 (UTC)
Retirement of Designating Laboratory
As the worm community continues to mature, we will increasingly need to transfer the ownership of gene classes from one lab to another.
To track these transfers, we (genenames) would like to propose the addition of a tag to the ?Gene_class model and a corresponding tag to the ?Laboratory model.
Still under discussion.
?Operon Public_name UNIQUE ?Text Species ?Species