Difference between revisions of "ModENCODE Analysis & metadata discussion"

From WormBaseWiki
 

Latest revision as of 08:50, 24 September 2010

modENCODE


Please edit/add to this page regarding the storage of metadata and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database.


?Analysis Naming

-- this is my suggestion pad --

option 1) modENCODE_<ID>_<PI>_<type/Desc>

Where

ID = modENCODE experiment ID (column in the download table)

PI = surname of the PI responsible for the project

Type/Desc = a brief label for the data type, tissue, or stage, e.g. RACE, 454_seq, Chip_Chip, L2_RNAseq, etc.

or option 2) just use modENCODE_<column 1>_<column 2> to simplify design?

Example
-------
515  	 CEUP1   	 vetted and released  	Caenorhabditis elegans Piano

would give an analysis object named:

option 1) modENCODE_515_Piano_RACE
option 2) modENCODE_515_CEUP1
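
As a rough sketch of the two naming options above, the names could be built from a download-table row like this. The function names and the idea of replacing spaces with underscores are assumptions made for illustration; nothing here is an existing WormBase script.

```python
def analysis_name_option1(exp_id, pi, desc):
    """Option 1: modENCODE_<ID>_<PI>_<type/Desc> (hypothetical helper)."""
    # Assumption: spaces in the description are replaced with underscores
    # so the object name stays a single token.
    desc = "_".join(str(desc).split())
    return f"modENCODE_{exp_id}_{pi}_{desc}"


def analysis_name_option2(exp_id, column2):
    """Option 2: modENCODE_<column 1>_<column 2> (hypothetical helper)."""
    column2 = "_".join(str(column2).split())
    return f"modENCODE_{exp_id}_{column2}"


# The example row from above: 515 / CEUP1 / Piano
print(analysis_name_option1(515, "Piano", "RACE"))
print(analysis_name_option2(515, "CEUP1"))
```

Option 1 needs someone to choose the Type/Desc label by hand, whereas option 2 can be generated from the table columns alone.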

It would be good to decide on a nomenclature as there are lots of modENCODE projects that we are going to extract data from, and the ?Analysis class might get a bit confusing.

We could then group all the experiments together under some parent ?Analysis as there are some more complicated examples out there.

Example
-------

Waterston data Gary has been looking at.
----------------------------------------
438 	mid-L4_20dC_36hrs_post-L1 RNAseq.2 	unvetted 	Caenorhabditis elegans Waterston 
433 	Young_Adult_25dC_46hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans Waterston
378 	mid-L3_20dC_25hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans Waterston
333 	mid-L2_20dC_14hrs_post-L1 RNASeq 	unvetted 	Caenorhabditis elegans Waterston

modENCODE_333_Waterston_L2_RNAseq
modENCODE_378_Waterston_L3_RNAseq
modENCODE_438_Waterston_L4_RNAseq
modENCODE_433_Waterston_Young_Adult_RNAseq

Grouped under modENCODE_Waterston_RNAseq

The ?Analysis model has Project/Subproject XREFs, which allow these Parent/Child connections.
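
To illustrate the grouping, here is a minimal sketch that collects the four Waterston experiments under one parent and prints them in .ace style. The Subproject tag name is taken from the Project/Subproject XREF mentioned here, but its exact position in the ?Analysis model has not been checked, and the helper names and the (ID, PI, stage, type) tuples are hypothetical.

```python
from collections import defaultdict

# (ID, PI, stage, data type) for the Waterston experiments listed above
experiments = [
    (333, "Waterston", "L2", "RNAseq"),
    (378, "Waterston", "L3", "RNAseq"),
    (438, "Waterston", "L4", "RNAseq"),
    (433, "Waterston", "Young_Adult", "RNAseq"),
]


def group_under_parent(experiments):
    """Map each parent name (modENCODE_<PI>_<type>) to its child names."""
    groups = defaultdict(list)
    for exp_id, pi, stage, data_type in experiments:
        parent = f"modENCODE_{pi}_{data_type}"
        child = f"modENCODE_{exp_id}_{pi}_{stage}_{data_type}"
        groups[parent].append(child)
    return dict(groups)


def to_ace(groups):
    """Render the parent objects as .ace-style text.

    Assumption: the parent object carries a 'Subproject' tag per child;
    the XREF would then fill in the child's side of the connection.
    """
    lines = []
    for parent, children in groups.items():
        lines.append(f'Analysis : "{parent}"')
        for child in children:
            lines.append(f'Subproject "{child}"')
        lines.append("")  # a blank line ends each object in .ace format
    return "\n".join(lines)


print(to_ace(group_under_parent(experiments)))
```

Because the connection is an XREF, only one side should need to be written; the sketch writes it on the parent.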

Exceptions
----------
Some modENCODE data described in the 'Integrated Analysis' paper of Sept/Oct 2010 do not have a DCC ID number.
An example is the HOT regions: 304 small regions where many transcription factors bind.
This Analysis object was given the name 'modENCODE_HOT'


It would be good to add Database connections to ?Condition or ?Analysis so that accessions can be added to the objects.