ModENCODE Analysis & metadata discussion

Revision as of 11:42, 8 September 2009 by Pdavis
Please edit/add to this page regarding the storage of metadata and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database.

?Analysis Naming

-- this is my suggestion pad --

option 1) modENCODE_<ID>_<PI>_<Type/Desc>


ID = modENCODE experiment ID (first column in the download table)

PI = surname of the PI responsible for the project

Type/Desc = a brief label for the data type, tissue, or similar, e.g. RACE, 454_seq, Chip_Chip, L2_RNAseq, etc.

or option 2) just use modENCODE_<column 1>_<column 2> of the download table, to simplify the design?

For example, the download table row (columns: ID, name, status, organism, PI)

515 	CEUP1 	vetted and released 	Caenorhabditis elegans 	Piano

would give an analysis object named:

option 1) modENCODE_515_Piano_RACE
option 2) modENCODE_515_CEUP1

It would be good to decide on a nomenclature now, as there are lots of modENCODE projects that we are going to extract data from and the ?Analysis class could otherwise get confusing.

We could then group related experiments together under a parent ?Analysis, as there are some more complicated examples out there.


Waterston data that Gary has been looking at:
438 	mid-L4_20dC_36hrs_post-L1 RNAseq.2 	unvetted 	Caenorhabditis elegans 	Waterston
433 	Young_Adult_25dC_46hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans 	Waterston
378 	mid-L3_20dC_25hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans 	Waterston
333 	mid-L2_20dC_14hrs_post-L1 RNASeq 	unvetted 	Caenorhabditis elegans 	Waterston


These would be grouped under a parent ?Analysis named modENCODE_Waterston_RNAseq

This would require a model change to allow Parent/Child_analysis connections.
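
As a rough sketch of the grouping, the parent/child relationship can be pictured as a mapping from a parent name to its child names. Everything here is an assumption for illustration: the group_by_parent helper is hypothetical, the child names follow the option 1 pattern, and the "RNAseq" label is assigned by hand. It says nothing about how the tags would actually look in the AceDB model.

```python
from collections import defaultdict

# (experiment ID, PI surname, data type) for the Waterston rows listed above.
# The data type "RNAseq" is assigned by hand, not parsed from the table.
rows = [
    (438, "Waterston", "RNAseq"),
    (433, "Waterston", "RNAseq"),
    (378, "Waterston", "RNAseq"),
    (333, "Waterston", "RNAseq"),
]


def group_by_parent(rows):
    """Map each parent ?Analysis name to the child ?Analysis names under it."""
    groups = defaultdict(list)
    for exp_id, pi, data_type in rows:
        parent = f"modENCODE_{pi}_{data_type}"           # e.g. modENCODE_Waterston_RNAseq
        child = f"modENCODE_{exp_id}_{pi}_{data_type}"   # option 1 style child name
        groups[parent].append(child)
    return dict(groups)


groups = group_by_parent(rows)
for parent, children in groups.items():
    print(parent)
    for child in children:
        print("   ", child)
```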

It would also be good to add Database connections to ?Condition or ?Analysis so that accessions can be added to the objects.