WBConfCall 2014.03.06-Agenda and Minutes

From WormBaseWiki

Agenda

modENCODE Data

LaDeana Hillier (Waterston Lab) produces much of the C. elegans modENCODE RNAseq data and analyses. She now has a large amount of data that the lab wishes to make available to the public. Normally, this data would be sent to the modENCODE DCC at UCSC for incorporation into the modENCODE website.

As the modENCODE DCC is overwhelmed with other work, she has proposed that WormBase make the data available to the public. She will still be submitting the data to the DCC.

The raw reads are submitted to the SRA, so she is proposing sending us only the results of processing the reads. These cover 203 samples, each of which comprises wig coverage files and SL, intron, polyA and transcript GFF files, together with expression coverage, SL, intron and polyA read files.

Providing this data to the public could be as simple as hosting all of the files in a directory, or it could be something more complicated.

Our position with providing primary data to users has always been that WormBase isn't a primary repository of data. We prefer to extract all our data from primary resources that can be referred to by other groups. We have been extracting RNAseq data from the SRA and then doing analyses of this data for all our species.

We would certainly like to take a look at this data, but we would not be making these files directly available to our users. We would instead be incorporating aspects of it, such as the SL sites, into our genomic features and making an updated aggregate modENCODE CDS track.

Is this a reasonable position to take with this proposal?

Minutes