ModENCODE Analysis & metadata discussion

From WormBaseWiki
Latest revision as of 08:50, 24 September 2010

modENCODE


Please edit/add to this page regarding the storage of metadata and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database.


?Analysis Naming

-- this is my suggestion pad --

option 1) modENCODE_<ID>_<PI>_<type/Desc>

Where

ID = modENCODE experiment ID (first column in the download table)

PI = surname of the PI responsible for the project

Type/Desc = something brief that defines the data (data type, tissue, stage, etc.), e.g. RACE, 454_seq, Chip_Chip, L2_RNAseq, etc.

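or option 2) just use modENCODE_<column 1>_<column 2>, taking both columns from the public submission list at http://submit.modencode.org/submit/public/list, to simplify design?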

Example
-------
515  	 CEUP1   	 vetted and released  	Caenorhabditis elegans Piano

would give an analysis object named:

option 1) modENCODE_515_Piano_RACE
option 2) modENCODE_515_CEUP1
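
To make the two options concrete, here is a minimal Python sketch that builds both names from one row of the download table. The function names are hypothetical and not part of any WormBase or modENCODE tool. Replacing whitespace with underscores is an assumption, made so that each name stays a single token.

```python
def analysis_name_option1(exp_id, pi, desc):
    """Option 1: modENCODE_<ID>_<PI>_<type/Desc>."""
    # Assumption: collapse any whitespace in the description to underscores.
    desc = "_".join(str(desc).split())
    return f"modENCODE_{exp_id}_{pi}_{desc}"


def analysis_name_option2(column1, column2):
    """Option 2: modENCODE_<column 1>_<column 2>."""
    # Assumption: same whitespace handling as option 1.
    column2 = "_".join(str(column2).split())
    return f"modENCODE_{column1}_{column2}"


# Row 515 from the example above.
print(analysis_name_option1(515, "Piano", "RACE"))
print(analysis_name_option2(515, "CEUP1"))
```

Option 2 needs no curator-chosen description, which is what makes it simpler, but the resulting name says less about the data type.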

It would be good to decide on a nomenclature as there are lots of modENCODE projects that we are going to extract data from, and the ?Analysis class might get a bit confusing.

We could then group all the experiments together under some parent ?Analysis as there are some more complicated examples out there.

Example
-------

Waterston data Gary has been looking at.
----------------------------------------
438 	mid-L4_20dC_36hrs_post-L1 RNAseq.2 	unvetted 	Caenorhabditis elegans Waterston 
433 	Young_Adult_25dC_46hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans Waterston
378 	mid-L3_20dC_25hrs_post-L1 RNAseq 	unvetted 	Caenorhabditis elegans Waterston
333 	mid-L2_20dC_14hrs_post-L1 RNASeq 	unvetted 	Caenorhabditis elegans Waterston

modENCODE_333_Waterston_L2_RNAseq
modENCODE_378_Waterston_L3_RNAseq
modENCODE_438_Waterston_L4_RNAseq
modENCODE_433_Waterston_Young_Adult_RNAseq

Grouped under modENCODE_Waterston_RNAseq
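
The grouping above can be sketched in Python as follows. This is only an illustration under two assumptions that hold for the Waterston examples but are not a stated rule: the PI surname is the third underscore-separated field, and the data type is the last one. The function names are hypothetical.

```python
from collections import defaultdict


def parent_name(child_name):
    """Derive the parent name modENCODE_<PI>_<type> from an option 1 child name."""
    parts = child_name.split("_")
    # Expected layout: ["modENCODE", ID, PI, ..., data_type]
    if len(parts) < 4 or parts[0] != "modENCODE" or not parts[1].isdigit():
        raise ValueError(f"not an option 1 name: {child_name!r}")
    return f"modENCODE_{parts[2]}_{parts[-1]}"


def group_by_parent(child_names):
    """Map each derived parent name to the list of its child names."""
    groups = defaultdict(list)
    for name in child_names:
        groups[parent_name(name)].append(name)
    return dict(groups)


children = [
    "modENCODE_333_Waterston_L2_RNAseq",
    "modENCODE_378_Waterston_L3_RNAseq",
    "modENCODE_438_Waterston_L4_RNAseq",
    "modENCODE_433_Waterston_Young_Adult_RNAseq",
]
groups = group_by_parent(children)
print(groups)
```

A surname or data type that itself contains an underscore would break this split, so a real implementation would more likely carry the PI and type as separate fields.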

The ?Analysis model provides Project/Subproject XREFs, which allow Parent/Child connections.
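
As a rough sketch of how such connections might be loaded, the Python below writes .ace-style text that points each child at its parent. The tag name `Project` is taken from the wording above and is an assumption; the exact tags should be checked against the current ?Analysis model. The function name is hypothetical.

```python
def ace_parent_links(parent, children):
    """Return .ace-style text linking each child Analysis to its parent."""
    lines = []
    for child in children:
        lines.append(f'Analysis : "{child}"')
        # Assumed tag name; with an XREF the reverse tag on the parent
        # would be filled in by the database on load.
        lines.append(f'Project "{parent}"')
        lines.append("")  # a blank line separates .ace objects
    return "\n".join(lines)


print(ace_parent_links(
    "modENCODE_Waterston_RNAseq",
    ["modENCODE_333_Waterston_L2_RNAseq", "modENCODE_378_Waterston_L3_RNAseq"],
))
```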

Exceptions
----------
Some modENCODE data described in the 'Integrated Analysis' paper of Sept/Oct 2010 do not have a DCC ID number.
One example is the HOT regions: 304 small regions to which many transcription factors bind.
This ?Analysis object was given the name 'modENCODE_HOT'.
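
Any script that reads these names back would have to cope with a missing DCC ID. A minimal sketch, assuming the ID, when present, is an all-digit field directly after the `modENCODE_` prefix (the regular expression and function name are hypothetical):

```python
import re

# Optional numeric DCC ID after the prefix, then the rest of the name.
NAME_RE = re.compile(r"^modENCODE_(?:(?P<id>\d+)_)?(?P<rest>\S+)$")


def split_name(name):
    """Return (dcc_id or None, remainder) for a modENCODE ?Analysis name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a modENCODE analysis name: {name!r}")
    dcc_id = int(match.group("id")) if match.group("id") else None
    return dcc_id, match.group("rest")


print(split_name("modENCODE_515_Piano_RACE"))
print(split_name("modENCODE_HOT"))
```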


It would be good to add Database connections to ?Condition or ?Analysis so that accessions can be added to the objects.