Difference between revisions of "ModENCODE Analysis & metadata discussion"
(7 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
+ | [[modENCODE]] | ||
+ | |||
+ | |||
Please edit/add to this page regarding the storage of meta data and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database. | Please edit/add to this page regarding the storage of meta data and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database. | ||
Line 8: | Line 11: | ||
-- this is my suggestion pad -- | -- this is my suggestion pad -- | ||
− | modEncode_<ID>_<PI>_<type/Desc> | + | option 1) modEncode_<ID>_<PI>_<type/Desc> |
Where | Where | ||
Line 17: | Line 20: | ||
Type/Desc = The data/tissue/(something brief to define the data) type e.g. RACE, 454_seq, Chip_Chip L2_RNAseq etc. etc. | Type/Desc = The data/tissue/(something brief to define the data) type e.g. RACE, 454_seq, Chip_Chip L2_RNAseq etc. etc. | ||
+ | |||
+ | or option 2) just use modENCODE_[http://submit.modencode.org/submit/public/list?|<column 1>]_[http://submit.modencode.org/submit/public/list?|<column 2>] to simplify design? | ||
Example | Example | ||
Line 24: | Line 29: | ||
would give an analysis object named: | would give an analysis object named: | ||
− | modENCODE_515_Piano_RACE | + | option 1) modENCODE_515_Piano_RACE |
− | + | option 2) modENCODE_515_CEUP1 | |
It would be good to decide on a nomenclature as there are lots of modENCODE projects that we are going to extract data from, and the ?Analysis class might get a bit confusing. | It would be good to decide on a nomenclature as there are lots of modENCODE projects that we are going to extract data from, and the ?Analysis class might get a bit confusing. | ||
Line 46: | Line 51: | ||
modENCODE_433_Waterston_Young_Adult_RNAseq | modENCODE_433_Waterston_Young_Adult_RNAseq | ||
− | Grouped under | + | Grouped under modENCODE_Waterston_RNAseq |
− | + | The Analysis model allows Project/Subproject XREFs to allow Parent/Child connections | |
+ | |||
+ | Exceptions | ||
+ | ---------- | ||
+ | There are some modENCODE data described in the 'Integrated Analysis' paper of Sept/Oct 2010 that do not have a DCC ID number. | ||
+ | Examples are the HOT regions where many transcription factors bind to 304 small regions. | ||
+ | This Analysis object was given the name 'modENCODE_HOT' | ||
+ | |||
+ | |||
+ | Would be good to add Database connections to ?Condition or ?Analysis so that accessions can be added to the objects. | ||
+ | |||
+ | |||
+ | [[Category:Curation]] | ||
+ | [[Category:Developer documentation]] |
Latest revision as of 08:50, 24 September 2010
Please edit/add to this page regarding the storage of metadata and the nomenclature we should adopt for ?Analysis/?Condition objects in the AceDB database.
?Analysis Naming
-- this is my suggestion pad --
option 1) modENCODE_<ID>_<PI>_<type/Desc>
Where
ID = modENCODE experiment ID (column in the download table)
PI = surname of the PI responsible for the project
Type/Desc = something brief to define the data type or tissue, e.g. RACE, 454_seq, Chip_Chip, L2_RNAseq, etc.
or option 2) just use modENCODE_<column 1>_<column 2> to simplify design?
Example
-------
515  CEUP1  vetted and released  Caenorhabditis elegans  Piano

would give an analysis object named:
option 1) modENCODE_515_Piano_RACE
option 2) modENCODE_515_CEUP1
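The two naming options above can be sketched in a few lines of Python. This is only an illustration of the proposal, not an agreed implementation: the function names are made up for this example, and replacing whitespace with underscores is an assumption, since the page does not say how spaces in a download-table column should be handled.

```python
import re


def _clean(field):
    # Assumption: collapse any whitespace in a field to a single underscore
    # so the resulting object name contains no spaces.
    return re.sub(r"\s+", "_", str(field).strip())


def analysis_name_option1(exp_id, pi_surname, type_desc):
    """Option 1: modENCODE_<ID>_<PI>_<type/Desc>."""
    return "modENCODE_{}_{}_{}".format(
        _clean(exp_id), _clean(pi_surname), _clean(type_desc)
    )


def analysis_name_option2(column_1, column_2):
    """Option 2: modENCODE_<column 1>_<column 2> from the download table."""
    return "modENCODE_{}_{}".format(_clean(column_1), _clean(column_2))


if __name__ == "__main__":
    # Row from the example: 515 / CEUP1 / Piano, with RACE as the data type.
    print(analysis_name_option1(515, "Piano", "RACE"))
    print(analysis_name_option2(515, "CEUP1"))
```

Option 2 needs no manual curation of the type/description, but the name is only as readable as the second column of the table.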
It would be good to decide on a nomenclature as there are lots of modENCODE projects that we are going to extract data from, and the ?Analysis class might get a bit confusing.
We could then group all the experiments together under some parent ?Analysis as there are some more complicated examples out there.
Example
-------
Waterston data Gary has been looking at.
----------------------------------------
438  mid-L4_20dC_36hrs_post-L1 RNAseq.2     unvetted  Caenorhabditis elegans  Waterston
433  Young_Adult_25dC_46hrs_post-L1 RNAseq  unvetted  Caenorhabditis elegans  Waterston
378  mid-L3_20dC_25hrs_post-L1 RNAseq       unvetted  Caenorhabditis elegans  Waterston
333  mid-L2_20dC_14hrs_post-L1 RNASeq       unvetted  Caenorhabditis elegans  Waterston

modENCODE_333_Waterston_L2_RNAseq
modENCODE_378_Waterston_L3_RNAseq
modENCODE_438_Waterston_L4_RNAseq
modENCODE_433_Waterston_Young_Adult_RNAseq

Grouped under modENCODE_Waterston_RNAseq
The ?Analysis model has Project/Subproject XREFs, which allow parent/child connections between ?Analysis objects.
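As a rough sketch of how the grouping could be loaded, the snippet below writes an .ace-style stanza that attaches the four Waterston objects to their parent. It assumes the tags are literally named Project and Subproject, that the parent carries the Subproject tags (with the XREF filling in Project on each child), and that the stanza layout shown is acceptable to the loader; none of this is confirmed on this page, and the function name is hypothetical.

```python
def subproject_stanza(parent, children):
    """Build an .ace-style stanza linking child ?Analysis objects to a parent.

    Assumption: the parent object lists its children with a 'Subproject' tag
    and the XREF creates the reverse 'Project' link on each child.
    """
    lines = ['Analysis : "{}"'.format(parent)]
    for child in children:
        lines.append('Subproject "{}"'.format(child))
    # .ace stanzas are separated by a blank line.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    waterston = [
        "modENCODE_333_Waterston_L2_RNAseq",
        "modENCODE_378_Waterston_L3_RNAseq",
        "modENCODE_438_Waterston_L4_RNAseq",
        "modENCODE_433_Waterston_Young_Adult_RNAseq",
    ]
    print(subproject_stanza("modENCODE_Waterston_RNAseq", waterston), end="")
```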
Exceptions
----------
There are some modENCODE data described in the 'Integrated Analysis' paper of Sept/Oct 2010 that do not have a DCC ID number.
Examples are the HOT regions, where many transcription factors bind to 304 small regions.
This Analysis object was given the name 'modENCODE_HOT'.
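Any script that reads these names back needs to cope with the objects that have no DCC ID. A minimal sketch, assuming names always start with "modENCODE_" and that a DCC ID, when present, is the run of digits immediately after that prefix (the function name is hypothetical):

```python
import re

# Optional numeric DCC ID after the prefix, then the rest of the name.
_NAME_RE = re.compile(r"^modENCODE_(?:(\d+)_)?(.+)$")


def parse_analysis_name(name):
    """Return (dcc_id or None, remainder) for a modENCODE ?Analysis name."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("not a modENCODE analysis name: {!r}".format(name))
    dcc_id, remainder = match.groups()
    return (int(dcc_id) if dcc_id is not None else None, remainder)


if __name__ == "__main__":
    for name in ("modENCODE_515_Piano_RACE", "modENCODE_HOT",
                 "modENCODE_Waterston_RNAseq"):
        print(name, parse_analysis_name(name))
```

Under this scheme both the exception 'modENCODE_HOT' and a grouping parent such as modENCODE_Waterston_RNAseq come back with no ID, so the two cases cannot be told apart from the name alone.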
It would be good to add Database connections to ?Condition or ?Analysis so that accessions can be added to the objects.