WormMine
Revision as of 21:21, 6 May 2013
Current status
TEMPORARILY OFFLINE
Search is down.
Follow progress here:
Data contained in WormMine
The following table lists all data sources contained in WormMine.
Data set | Description | Source | Species | Data mapping |
---|---|---|---|---|
GO | Ontology terms and relationships comprising GO | GO project website | N/A | |
GO Annotations | Relationships between genes and GO terms | GO project website | Caenorhabditis elegans | |
Genomic sequences | FASTA DNA sequences | WormBase FTP | Caenorhabditis elegans | |
Protein sequences | FASTA peptide sequences | WormBase FTP | Caenorhabditis elegans | |
Gene locations | Gene chromosomal coordinates | WormBase FTP GFF3 | Caenorhabditis elegans | |
Transcript locations | Transcript chromosomal coordinates | WormBase FTP GFF3 | Caenorhabditis elegans | |
CDS locations | CDS chromosomal coordinates | WormBase FTP GFF3 | Caenorhabditis elegans | |
Gene metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Transcript metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
CDS metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Variation metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Protein metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Phenotype metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Expression Pattern metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Anatomy Term metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Expression Cluster metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
Life Stage metadata | Selected fields extracted from AceDB | AceDB XML dump | All species in AceDB | LINK |
What is this data mapping?
A loader plugin for InterMine extracts data embedded in XML files and loads it directly into an InterMine instance. Mapping files configure this loader and describe how the AceDB XML dumps translate into the InterMine model. The mappings use XPath to query the XML; XPath can be reviewed here.
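The mapping idea can be sketched in a few lines of Python. This is a minimal illustration, not the WormMine loader: the XML layout, the element names (`Name`, `Molecular_weight`, `Gene`), the values, the field-to-XPath dictionary, and the `apply_mapping` helper are all hypothetical, and Python's `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical AceDB-style record with illustrative values; the real XML
# dump's element names and layout may differ.
ACE_XML = """<Protein>
  <Name>WP:CE00001</Name>
  <Molecular_weight>35000.5</Molecular_weight>
  <Gene>WBGene00000001</Gene>
  <Gene>WBGene00000002</Gene>
</Protein>"""

# Hypothetical mapping: InterMine field name -> XPath expression,
# evaluated relative to the record's root element.
MAPPING = {
    "primaryIdentifier": "./Name",
    "molecularWeight": "./Molecular_weight",
    "genes": "./Gene",
}


def apply_mapping(xml_text, mapping):
    """Return {field: [matched text values]} for one XML record."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, xpath in mapping.items():
        # findall evaluates the limited XPath subset that ElementTree supports.
        record[field] = [el.text.strip() for el in root.findall(xpath) if el.text]
    return record


print(apply_mapping(ACE_XML, MAPPING))
```

Each field maps to a list because one XPath expression can match several elements; a real loader would then consult the model to decide whether the field is a single-valued attribute or a collection.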
Understanding our model
The data in WormMine follows a central model schema. You need a working understanding of this model to query the data and create templates.
This schema file defines every data type in WormMine, the relationships between those types, and the data fields of each one.
How to read it
Take the Protein class as an example:
<class name="Protein" extends="BioEntity" is-interface="true">
  <attribute name="molecularWeight" type="java.lang.Float"/>
  <attribute name="md5checksum" type="java.lang.String"/>
  <attribute name="length" type="java.lang.Integer"/>
  <attribute name="geneName" type="java.lang.String"/>
  <attribute name="primaryAccession" type="java.lang.String"/>
  <reference name="sequence" referenced-type="Sequence"/>
  <collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>
  <collection name="genes" referenced-type="Gene" reverse-reference="proteins"/>
  <collection name="transcripts" referenced-type="Transcript" reverse-reference="protein"/>
</class>
Line by line:
<class name="Protein" extends="BioEntity" is-interface="true">
extends="BioEntity": Protein is a child of BioEntity, so it inherits BioEntity's data fields.
Protein's parent, BioEntity:
<class name="BioEntity" is-interface="true">
  <attribute name="secondaryIdentifier" type="java.lang.String"/>
  <attribute name="symbol" type="java.lang.String"/>
  <attribute name="primaryIdentifier" type="java.lang.String"/>
  <attribute name="lastUpdated" type="java.util.Date"/>
  <attribute name="name" type="java.lang.String"/>
  <reference name="organism" referenced-type="Organism"/>
  <collection name="synonyms" referenced-type="Synonym" reverse-reference="subject"/>
  <collection name="publications" referenced-type="Publication" reverse-reference="bioEntities"/>
  <collection name="ontologyAnnotations" referenced-type="OntologyAnnotation" reverse-reference="subject"/>
  <collection name="phenotypesObserved" referenced-type="Phenotype" reverse-reference="observedIn"/>
  <collection name="phenotypesNotObserved" referenced-type="Phenotype" reverse-reference="notObservedIn"/>
  <collection name="crossReferences" referenced-type="CrossReference" reverse-reference="subject"/>
  <collection name="dataSets" referenced-type="DataSet" reverse-reference="bioEntities"/>
  <collection name="locatedFeatures" referenced-type="Location" reverse-reference="locatedOn"/>
  <collection name="locations" referenced-type="Location" reverse-reference="feature"/>
</class>
Protein therefore has its own copy of every attribute, reference, and collection defined in BioEntity. If BioEntity inherits fields from a parent of its own, Protein inherits those as well.
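Inheritance can be resolved mechanically by walking the `extends` chain. The Python sketch below parses a trimmed fragment of the model (only a few of the fields listed above) and collects a class's own and inherited fields. The `load_classes` and `all_fields` helpers are hypothetical, and treating `extends` as a whitespace-separated list of parent names is an assumption; for a single parent, as here, it makes no difference.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative fragment of the model; the real schema has many more fields.
MODEL_XML = """<model>
  <class name="BioEntity" is-interface="true">
    <attribute name="primaryIdentifier" type="java.lang.String"/>
    <reference name="organism" referenced-type="Organism"/>
    <collection name="synonyms" referenced-type="Synonym" reverse-reference="subject"/>
  </class>
  <class name="Protein" extends="BioEntity" is-interface="true">
    <attribute name="primaryAccession" type="java.lang.String"/>
    <reference name="sequence" referenced-type="Sequence"/>
    <collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>
  </class>
</model>"""

FIELD_TAGS = ("attribute", "reference", "collection")


def load_classes(model_xml):
    """Index every <class> element by its name."""
    root = ET.fromstring(model_xml)
    return {cls.get("name"): cls for cls in root.iter("class")}


def all_fields(class_name, classes):
    """Return {field_name: kind} for a class, inherited fields included."""
    cls = classes[class_name]
    fields = {}
    # Assumption: "extends" may name several parents separated by spaces.
    for parent in (cls.get("extends") or "").split():
        if parent in classes:  # parents missing from this fragment are skipped
            fields.update(all_fields(parent, classes))
    # The class's own fields are added after the inherited ones.
    for child in cls:
        if child.tag in FIELD_TAGS:
            fields[child.get("name")] = child.tag
    return fields


for name, kind in all_fields("Protein", load_classes(MODEL_XML)).items():
    print(f"{kind:10} {name}")
```

Recursing into each parent before adding the class's own fields means a chain of any depth (BioEntity's own parents, if any) is picked up automatically.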
<attribute name="primaryAccession" type="java.lang.String"/>: Defines an attribute of Protein named primaryAccession (read "primary accession") whose type is a string (text). Every Protein object, and every object of a class that inherits from Protein, can hold a primaryAccession value.
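The `type` value fixes what kind of value an attribute holds. As a rough illustration, the Python sketch below turns raw text (for example, a value pulled from an XML dump) into a typed value according to the attribute's declared Java type. The `CONVERTERS` table and `convert` helper are hypothetical, only the types that appear on this page are covered, and the ISO `YYYY-MM-DD` input format for `java.util.Date` is an assumption.

```python
from datetime import datetime

# Hypothetical table: declared Java type -> function that parses raw text.
CONVERTERS = {
    "java.lang.String": str,
    "java.lang.Integer": int,
    "java.lang.Float": float,
    # Assumption: dates arrive as ISO "YYYY-MM-DD" strings.
    "java.util.Date": lambda raw: datetime.strptime(raw, "%Y-%m-%d"),
}


def convert(raw, java_type):
    """Parse raw text into a Python value matching the attribute's type.

    Surrounding whitespace is stripped before parsing.
    """
    converter = CONVERTERS.get(java_type)
    if converter is None:
        raise ValueError(f"unsupported attribute type: {java_type}")
    return converter(raw.strip())


print(convert("WP:CE00001", "java.lang.String"))
print(convert(" 312 ", "java.lang.Integer"))
print(convert("35000.5", "java.lang.Float"))
```

Rejecting unknown types with an error, rather than silently keeping the raw string, makes a model/data mismatch visible immediately.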
<reference name="sequence" referenced-type="Sequence"/>: Defines a reference named "sequence" that points to an object of another data type, in this case Sequence. A reference holds at most one object at a time. A reference can also carry a reverse-reference attribute, which names the reciprocal field in the referenced type, if one exists.
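Whether a field holds one object or many, and where its reverse reference points, can be read straight from the model XML. The Python sketch below runs a hypothetical helper, `describe_field`, against a trimmed fragment; the CDS class shown, with a `protein` reference that points back through `reverse-reference="CDSs"`, is an assumption about how the reciprocal side is declared.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment; the CDS class here is an assumed reciprocal declaration.
MODEL_XML = """<model>
  <class name="Protein" extends="BioEntity" is-interface="true">
    <reference name="sequence" referenced-type="Sequence"/>
    <collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>
  </class>
  <class name="CDS" is-interface="true">
    <reference name="protein" referenced-type="Protein" reverse-reference="CDSs"/>
  </class>
</model>"""


def describe_field(model_xml, class_name, field_name):
    """Summarize one reference or collection declared on a class."""
    root = ET.fromstring(model_xml)
    classes = {c.get("name"): c for c in root.iter("class")}
    field = next(f for f in classes[class_name] if f.get("name") == field_name)
    target = field.get("referenced-type")
    reverse = field.get("reverse-reference")  # None when the attribute is absent
    return {
        "kind": field.tag,
        "many": field.tag == "collection",  # a reference holds at most one object
        "target": target,
        "reverse": f"{target}.{reverse}" if reverse else None,
    }


print(describe_field(MODEL_XML, "Protein", "sequence"))
print(describe_field(MODEL_XML, "Protein", "CDSs"))
```

Protein.sequence has no reverse reference, so its `reverse` entry is None; Protein.CDSs resolves to CDS.protein.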
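<collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>: Defines a collection named CDSs. Unlike a reference, a collection can hold many objects, here CDSs. The reverse-reference="protein" attribute means each CDS in the collection points back to this protein through its own protein field (CDS.protein).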
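To make the two-way relationship concrete, here is a hypothetical in-memory Python model of Protein and CDS in which adding a CDS to a protein's collection also sets that CDS's `protein` reference. It only illustrates what a reverse reference means; it is not how InterMine stores or maintains these links, and the class, field, and method names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: compare objects by identity, not by field values
class CDS:
    primary_identifier: str
    # Reference: at most one Protein. repr=False avoids printing the cycle.
    protein: Optional["Protein"] = field(default=None, repr=False)


@dataclass(eq=False)
class Protein:
    primary_accession: str
    # Collection: any number of CDS objects.
    cdss: List[CDS] = field(default_factory=list)

    def add_cds(self, cds: CDS) -> None:
        """Add a CDS to this collection and set its reverse reference."""
        previous = cds.protein
        if previous is not None and previous is not self:
            previous.cdss.remove(cds)  # the reference side allows only one protein
        if cds not in self.cdss:
            self.cdss.append(cds)
        cds.protein = self  # keep CDS.protein in step with Protein.cdss


protein = Protein("WP:CE00001")
cds = CDS("cds-1")
protein.add_cds(cds)
print(cds.protein is protein)  # True
```

Because CDS.protein is a single-valued reference, moving a CDS to another protein must also remove it from the old protein's collection; otherwise the two sides would disagree.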