ModENCODE Analysis & metadata discussion

From WormBaseWiki


Please edit/add to this page regarding the storage of metadata and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database.

?Analysis Naming

-- this is my suggestion pad --

option 1) modENCODE_<ID>_<PI>_<type/Desc>


ID = modENCODE experiment ID (first column in the download table)

PI = surname of the PI responsible for the project

Type/Desc = a brief label defining the data or tissue type, e.g. RACE, 454_seq, Chip_Chip, L2_RNAseq, etc.

or option 2) just use modENCODE_<column 1>_<column 2> to simplify design?

515  	 CEUP1   	 vetted and released  	Caenorhabditis elegans Piano

would give an analysis object named:

option 1) modENCODE_515_Piano_RACE
option 2) modENCODE_515_CEUP1
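
As a rough illustration of the two options, the names could be built from a download-table row as sketched below. The function names and the row dictionary keys are hypothetical, chosen only for this example; the "RACE" type label is supplied by hand because it is not a column in the row shown above.

```python
def name_option1(exp_id, pi, desc):
    """Option 1: modENCODE_<ID>_<PI>_<type/Desc>."""
    return f"modENCODE_{exp_id}_{pi}_{desc}"


def name_option2(exp_id, column2):
    """Option 2: modENCODE_<column 1>_<column 2>."""
    return f"modENCODE_{exp_id}_{column2}"


# Row 515 from the download table; the key names are assumptions.
row = {
    "id": 515,
    "name": "CEUP1",
    "status": "vetted and released",
    "organism": "Caenorhabditis elegans",
    "pi": "Piano",
}

# The type/Desc label ("RACE") has to come from a curator under option 1.
print(name_option1(row["id"], row["pi"], "RACE"))
print(name_option2(row["id"], row["name"]))
```

Option 2 needs no hand-supplied label, which is the simplification it offers; option 1 makes the PI and data type visible in the name itself.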

It would be good to decide on a nomenclature as there are lots of modENCODE projects that we are going to extract data from, and the ?Analysis class might get a bit confusing.

We could then group all the experiments together under some parent ?Analysis as there are some more complicated examples out there.


Waterston data that Gary has been looking at:
438 	mid-L4_20dC_36hrs_post-L1 RNAseq.2 	unvetted 	Caenorhabditis elegans Waterston 
433 	Young_Adult_25dC_46hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans Waterston
378 	mid-L3_20dC_25hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans Waterston
333 	mid-L2_20dC_14hrs_post-L1 RNASeq 	unvetted 	Caenorhabditis elegans Waterston


Grouped under modENCODE_Waterston_RNAseq

The ?Analysis model has Project/Subproject XREFs, which allow Parent/Child connections between ?Analysis objects.
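
A minimal sketch of how the Waterston experiments might be grouped under one parent and written out as .ace-style text. This assumes the child objects use option 1 names with "RNAseq" as the type label, and that the parent lists its children under a Subproject tag; the exact tag layout in the model should be checked before using anything like this. The function names are hypothetical.

```python
from collections import defaultdict

# (DCC ID, PI surname) pairs taken from the Waterston rows above.
rows = [(438, "Waterston"), (433, "Waterston"), (378, "Waterston"), (333, "Waterston")]


def group_by_parent(rows, desc="RNAseq"):
    """Map each parent name to its child names (option 1 style, assumed)."""
    groups = defaultdict(list)
    for exp_id, pi in rows:
        parent = f"modENCODE_{pi}_{desc}"
        child = f"modENCODE_{exp_id}_{pi}_{desc}"
        groups[parent].append(child)
    return dict(groups)


def to_ace(groups):
    """Render parent objects as .ace-style text; the Subproject tag is an assumption."""
    lines = []
    for parent, children in groups.items():
        lines.append(f'Analysis : "{parent}"')
        lines.extend(f'Subproject "{child}"' for child in children)
        lines.append("")  # a blank line separates objects in .ace files
    return "\n".join(lines)


groups = group_by_parent(rows)
print(to_ace(groups))
```

If the XREF works as described, setting Subproject on the parent should be enough for the Project link to appear on each child.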

Some modENCODE data described in the 'Integrated Analysis' paper of Sept/Oct 2010 do not have a DCC ID number.
An example is the HOT regions: 304 small regions where many transcription factors bind.
This Analysis object was given the name 'modENCODE_HOT'.

It would be good to add Database connections to ?Condition or ?Analysis so that accessions can be added to the objects.