Difference between revisions of "WormMine"

From WormBaseWiki
(40 intermediate revisions by 9 users not shown)
Line 1: Line 1:
 +
 
= Current status =
 
= Current status =
  
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
 
|-
 
|-
|'''DOWN FOR MAINTENANCE'''
+
|'''
 +
ONLINE'''
 
|}
 
|}
  
 
= Links =
 
= Links =
[http://206.108.125.166:8080/wormmine WormMine]
+
[http://www.wormbase.org/tools/wormmine WormMine]
  
 
[https://github.com/WormBase/intermine/commits/jw GitHub commit history]
 
[https://github.com/WormBase/intermine/commits/jw GitHub commit history]
  
[https://github.com/WormBase/intermine/blob/jw/wormmine/dbmodel/build/model/genomic_model.xml WormMine model file]
+
[http://www.wormbase.org/tools/wormmine/service/model WormMine model file]
 +
 
 +
[https://github.com/WormBase/intermine/blob/jw/bio/sources/wormbase-acedb/wormbase-acedb_additions.xml wormbase-acedb model additions]
 +
 
 +
[[UserGuide:WormMine|User Guide]]
 +
 
 +
= Logging in as admin =
 +
These specific URLs must be used since the admin account is a WormMine, not Google, account.
 +
 
 +
Login URL: http://www.wormbase.org/tools/wormmine/login.do
 +
 
 +
Logout URL: http://www.wormbase.org/tools/wormmine/logout.do
 +
 
 +
Find the admin account username and password by executing this search in your wormbase.org account:
  
= Reaching our beta release =
+
<code>subject:[ WormMine ] Important update on status and curator action</code>
=== Issues for curators ===
+
 
Self-assignment
+
This account is needed to publish templates to the front page.
* Publish lists & templates
+
 
** [https://github.com/WormBase/website/issues/1205 Gene]
+
=New data wishlist=
** [https://github.com/WormBase/website/issues/1206 Transcript]
+
{| border="1"
** [https://github.com/WormBase/website/issues/1207 CDS]
+
! Name !! Where is the data?
** [https://github.com/WormBase/website/issues/1208 Variation]
+
|-  
** [https://github.com/WormBase/website/issues/1209 Protein]
+
| Orthology || AceDB Gene class (Ortholog and Ortholog_other tags); there is also an OICR-created flat file
** [https://github.com/WormBase/website/issues/1210 Phenotype]
+
|-
** [https://github.com/WormBase/website/issues/1211 Expression Pattern]
+
| Interaction || AceDB interaction class
** [https://github.com/WormBase/website/issues/1212 Anatomy Term]
+
|-
** [https://github.com/WormBase/website/issues/1213 Expression Cluster]
+
| Transgene ||Transgene AceDB class
** [https://github.com/WormBase/website/issues/1214 Life Stage]
+
|-
** [https://github.com/WormBase/website/issues/1215 GO]
+
| variation coordinates || gff3, type SNP or SNP
 +
|-
 +
|Disease data || In AceDB class ?Gene, Import all data under the Disease_info tag
 +
|-
 +
|Motif Titles || In the "Title" tag of ?Motif
 +
|-
 +
|}
  
 
=Data contained in WormMine=
 
=Data contained in WormMine=
 
This lists all data sources contained in WormMine
 
This lists all data sources contained in WormMine
  
 +
Refer to species list:
 
{| border="1"
 
{| border="1"
! Source !! Description !! Source !! Species !! Data mapping
+
! taxonId !! species
 +
|-
 +
| 6253 || Ascaris suum
 +
|-
 +
| 6279 || Brugia malayi
 +
|-
 +
| 6326 || Bursaphelenchus xylophilus
 +
|-
 +
| 96668 (860376 according to NCBI) || Caenorhabditis angaria
 +
|-
 +
| 135651 || Caenorhabditis brenneri
 +
|-
 +
| 6238 || Caenorhabditis briggsae
 +
|-
 +
| 6239 || Caenorhabditis elegans
 +
|-
 +
| 281687 || Caenorhabditis japonica
 +
|-
 +
| 31234 || Caenorhabditis remanei
 +
|-
 +
| 6289 || Haemonchus contortus
 +
|-
 +
| 37862 || Heterorhabditis bacteriophora
 +
|-
 +
| 6305 || Meloidogyne hapla
 +
|-
 +
| 6306 || Meloidogyne incognita
 +
|-
 +
| 54126 || Pristionchus pacificus
 +
|-
 +
| 34506 || Strongyloides ratti
 +
|-
 +
| 6334 || Trichinella spiralis
 +
|}
 +
 
 +
 
 +
 
 +
{| border="1"
 +
! Source !! Description !! Source !! Species !! Filters !! Data mapping
  
 
|-
 
|-
 
|  GO  
 
|  GO  
 
|| Ontology terms and relationships comprising GO
 
|| Ontology terms and relationships comprising GO
|| GO project website.
+
|| GO project website (We should record the CVS revision number of the ontology file we use so we can perform data checks).
 
|| species neutral
 
|| species neutral
  
Line 46: Line 106:
 
|| GO project website.
 
|| GO project website.
 
|| Caenorhabditis elegans
 
|| Caenorhabditis elegans
 +
|| Removes all with first column = "UniProtKB"
  
 
|-
 
|-
Line 51: Line 112:
 
|| Fasta DNA sequences  
 
|| Fasta DNA sequences  
 
|| WormBase FTP
 
|| WormBase FTP
|| Caenorhabditis elegans
+
|| All in species list
  
 
|-
 
|-
Line 57: Line 118:
 
|| Fasta peptide sequences
 
|| Fasta peptide sequences
 
|| WormBase FTP
 
|| WormBase FTP
|| Caenorhabditis elegans
+
|| All in species list
  
 
|-
 
|-
Line 63: Line 124:
 
|| Gene chromosomal coordinates
 
|| Gene chromosomal coordinates
 
|| WormBase FTP GFF3
 
|| WormBase FTP GFF3
|| Caenorhabditis elegans
+
|| All in species list
  
 
|-
 
|-
Line 69: Line 130:
 
|| Transcript chromosomal coordinates
 
|| Transcript chromosomal coordinates
 
|| WormBase FTP GFF3
 
|| WormBase FTP GFF3
|| Caenorhabditis elegans
+
|| All in species list
  
 
|-
 
|-
Line 75: Line 136:
 
|| CDS chromosomal coordinates
 
|| CDS chromosomal coordinates
 
|| WormBase FTP GFF3
 
|| WormBase FTP GFF3
|| Caenorhabditis elegans
+
|| NONE (locations load improperly)
  
 
|-
 
|-
Line 82: Line 143:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/gene/mapping/wormbase-acedb-gene.properties LINK]
+
|| <code>query find Gene Live</code>
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Gene_mapping.properties LINK]
  
 
|-
 
|-
Line 89: Line 151:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/transcript/mapping/transcript_mapping.properties LINK]
+
|| <code>query find Transcript (Gene)</code>
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Transcript_mapping.properties LINK]
  
 
|-
 
|-
Line 96: Line 159:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/cds/mapping/cds_mapping.properties LINK]
+
|| <code>query find CDS Method="curated"</code>
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/CDS_mapping.properties LINK]
  
 
|-
 
|-
Line 103: Line 167:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/variation/mapping/variation_mapping.properties LINK]
+
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Variation_mapping.properties LINK]
  
 
|-
 
|-
Line 110: Line 175:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/protein/mapping/protein_mapping.properties LINK]
+
|| Limited to [https://github.com/WormBase/intermine/blob/jw/testlab/perl/preprocess/wb-acedb/protein/whitelist/species_whitelist.txt these] species,
 +
<code>query find Protein Corresponding_CDS</code>
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Protein_mapping.properties LINK]
  
 
|-
 
|-
Line 117: Line 184:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/phenotype/mapping/phenotype_mapping.properties LINK]
+
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Phenotype_mapping.properties LINK]
  
 
|-
 
|-
Line 124: Line 192:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/expr_pattern/mapping/expr_pattern_mapping.properties LINK]
+
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Expr_pattern_mapping.properties LINK]
  
 
|-
 
|-
Line 131: Line 200:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/anatomy_term/mapping/anatomy_term_mapping.properties LINK]
+
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Anatomy_term_mapping.properties LINK]
  
 
|-
 
|-
Line 138: Line 208:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/expr_cluster/mapping/expr_cluster_mapping.properties LINK]
+
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Expression_cluster_mapping.properties LINK]
  
 
|-
 
|-
Line 145: Line 216:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/life_stage/mapping/life_stage_mapping.properties LINK]
+
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Life_stage_mapping.properties LINK]
  
 
|-
 
|-
Line 152: Line 224:
 
|| AceDB XML dump
 
|| AceDB XML dump
 
|| All in AceDB
 
|| All in AceDB
|| [https://github.com/WormBase/intermine/blob/jw/datadir/wormbase-acedb/species/mapping/species_mapping.properties LINK]
+
|| [https://github.com/WormBase/website-intermine/blob/master/acedb-dev/acedb/species_WS238.ace Species list]
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Species_mapping.properties LINK]
 +
 
 +
|-
 +
|  RNAi metadata
 +
|| Select fields extracted from Ace
 +
|| AceDB XML dump
 +
|| All in AceDB
 +
||
 +
|| [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/RNAi_mapping.properties LINK]
  
 
|}
 
|}
Line 163: Line 244:
 
The data contained in WormMine follows a central model schema.  This model should be understood sufficiently to be able to query the data and create templates.
 
The data contained in WormMine follows a central model schema.  This model should be understood sufficiently to be able to query the data and create templates.
  
[https://github.com/WormBase/intermine/blob/jw/wormmine/dbmodel/build/model/genomic_model.xml WormMine model file]
+
[http://www.wormbase.org/tools/wormmine/service/model WormMine model file]
  
 
This schema file contains all of the data types contained in WormMine, relationships between them, and each one's data fields.   
 
This schema file contains all of the data types contained in WormMine, relationships between them, and each one's data fields.   
Line 215: Line 296:
  
 
=Using QueryBuilder=
 
=Using QueryBuilder=
 +
 
[[File:Querybuilder link.png]]
 
[[File:Querybuilder link.png]]
  
Line 226: Line 308:
 
==== Desired results? ====
 
==== Desired results? ====
 
We want to display the symbols of each involved object, plus the gene ID.  This calls for:
 
We want to display the symbols of each involved object, plus the gene ID.  This calls for:
# Gene.primaryIdentifier
+
# Gene.WormBase Gene ID
# Gene.symbol
+
# Gene.Gene Name
# Gene (some relationship(s))-> Transcript.symbol
+
# Gene (some relationship(s))-> Transcript.Sequence Name
# Gene (some relationship(s))-> CDS.symbol
+
# Gene (some relationship(s))-> CDS.Sequence Name
# Gene (some relationship(s))-> Protein.symbol
+
# Gene (some relationship(s))-> Protein.Name
  
 
==== Building the query ====
 
==== Building the query ====
[[File:Qb circled.png|frame|click these areas]]
+
[[File:QB_circled_new.png|frame|click these areas]]
The DB Identifier represents the primaryIdentifier, this is the only field name replaced in this way. Click *SHOW* next to any of these field names to add that attribute to the result table as a column.
+
Click *SHOW* next to any of these field names to add that attribute to the resulting table as a column.
* Show the primaryIdentifier (DB Identifier), and symbol
+
* Show the "WormBase Gene ID" and "Gene Name"
 
* Gene contains a collection of transcripts, this relationship represents transcription.  Expand the relationship by clicking on the <code>[+]</code>
 
* Gene contains a collection of transcripts, this relationship represents transcription.  Expand the relationship by clicking on the <code>[+]</code>
 
<br clear=all>
 
<br clear=all>
* Follow these steps for each of the data types in the chain as illustrated.
 
[[File:Tscript expanded2.png|thumb]]
 
Running the query at this point will give you results for ''all'' genes.  If you have a smaller set in mind, like restricting the genes to egl-19 only, a constraint must be set.
 
* Click "constrain" next to the Gene.symbol field (circle #2), type "egl-19" in the text box, make sure the operator is "=", then "add to query".
 
  
Your query overview should resemble this:
+
 
[[File:Qb egl-19.png|frame|left]]
+
* Follow these steps for each of the data types in the chain as illustrated:
[[File:Cds expanded.png|thumb]]
+
 
[[File:Protein expanded.png|thumb]]
+
[[File:Tscript_expanded2_new2.png|frame|left]]
 +
[[File:CDS_expanded_new2.png|frame]]
 +
[[File:Protein_expanded_new2.png|frame|left]]
 +
* Running the query at this point will give you results for ''all'' genes.  If you have a smaller set in mind, like restricting the genes to egl-19 only, a constraint must be set.
 +
* Click "constrain" next to the "Gene.Gene Name" field (circle #2), type "egl-19" in the text box, make sure the operator is "=", then "add to query".
 +
 
 +
 
 +
[[File:Constraint_Dialog_Box_for_egl-19.png|frame]]
 +
* Your query overview should resemble this:
 +
 
 +
 
 +
[[File:QB_egl-19_new.png|frame|left]]
 +
 
 
<br clear=all>
 
<br clear=all>
  
Line 266: Line 416:
 
  * Admin users can make a personal template '''public''' by
 
  * Admin users can make a personal template '''public''' by
 
   My Mine -> Templates -> Add Tags -> New tag, type in the text box '''"im:public"'''.
 
   My Mine -> Templates -> Add Tags -> New tag, type in the text box '''"im:public"'''.
 +
  Add one tag at a time.
 +
  Tags help to group lists.
 +
  For the list to be viewable by the '''Public''', add a tag '''"im:public"'''.    For the list to be grouped under the "aspect" categories on the Home page, add the two tags: '''"im:frontpage"''' and '''"im:aspect:ASPECTNAME"'''
 +
  The capitalisation of the ASPECTNAME that you use in the above tag is important. The word ASPECTNAME in the above tag should be replaced by one of '''Genomics''', '''Proteins''', '''Expression''', '''Genetic Variations''', '''Phenotypes''', '''Gene Ontology''', '''Strains'''
 +
  Hit "Create". '''(This tag should be available for the next template you create)'''.
 +
  The Name field text should start with your initials to identify you as the author e.g. "XYZ Gene Homology"
 +
  The Title field text is what the users see and should include '-->', which is converted into an arrow icon in the final template list on the Home page.
  
 
=Creating lists=
 
=Creating lists=
Line 275: Line 432:
 
  3. On the Result page, top right corner, select "Create / Add to List" -> "Create New List" -> "All of Columns ...", or
 
  3. On the Result page, top right corner, select "Create / Add to List" -> "Create New List" -> "All of Columns ...", or
 
     "Choose individual items from the table". "Choose individual items from the table" allows further refinement.
 
     "Choose individual items from the table". "Choose individual items from the table" allows further refinement.
  4. Provide for the list: a Name, an informative Description, and Tags. Add one tag at a time. Tags help to group lists.
+
  4. Provide for the list: a Name, an informative Description, and Tags.  
     For the list to be viewable by the '''Public''', add a tag '''"im:public"'''.
+
     The Name should start with your initials to identify you as the author e.g. "XYZ Gene Homology"
    Hit "Create". '''(This tag should be available for the next template you create)'''.
 
 
  5. Descriptions and tags can also be edited after a list is saved.
 
  5. Descriptions and tags can also be edited after a list is saved.
  
= Do not post here, section marked for deletion =
+
=Internal Report Page=
 +
find intermine internal id via a query (must be logged in as a super user)
 +
 
 +
http://206.108.125.166:8080/wormmine/report.do?id=
  
This section is intended for WormMine testers to leave comments about what works and what doesn't
+
=Other Sites=
  
* On the list view page: can only download results in the XML or JSON formats.
+
Tutorial on making templates http://www.yeastgenome.org/help/video-tutorials/yeastmine
* In the tab 'Home' the "Take a tour" tutorial describes FlyMine, not WormMine
 
* The 'CDS' class does not have a collection of 'exon' class data or 'UTR' class data
 
* What is the difference between 'Primary Accession' and 'Primary Identifier' in the Protein class?
 
* When a search is done to find all CDS objects connected to the Protein with a primary identifier 'WP:CE00285', it only finds the CDS 'R05D3.6' and misses the other CDS that makes this protein: 'ZC262.5'. The same is true of other proteins like 'WP:CE13124'.
 
* The Gene class is missing the Sequence name identifier, e.g. 'AC3.3'
 
* In the tab 'Lists' there is a link to '[Click to Show example]' which pastes in a set of example gene names. These are all FlyBase names.
 
* In the tab 'Data Sources' both the Genomics and Protein links have data-sets and bulk download links for MalariaMine, not WormMine. This page is also reached from the tab 'Home' then the link 'Super helpful gene description!! Read more'
 
* In the 'Home' tab, a search for the sequence name 'AC3.3' finds the Protein 'WP:CE05133' which is not identified by 'AC3.3' and so this is wrong. This search misses finding the Gene 'AC3.3'. I would have expected a search for this sequence name to find the Gene, the Transcript and the CDS.
 
* Is there a simple way to filter out the History CDS objects (*:wp*)? Why were they included in the CDS class? Could we have a History CDS class?
 
* Ditto for the Twinscan predictions in the CDS class (*.tw). It doesn't seem very useful to have these mixed in with the rest of the CDS data.
 
* When I display all CDS class with a connection to the Organism class, showing the organism name and a constraint that the organism is 'Caenorhabditis elegans', the CDS identifiers are displayed but the Organism name is left blank.
 

Revision as of 15:41, 22 October 2013

Current status

ONLINE

Links

WormMine

GitHub commit history

WormMine model file

wormbase-acedb model additions

User Guide

Logging in as admin

Use these specific URLs, because the admin account is a WormMine account, not a Google account.

Login URL: http://www.wormbase.org/tools/wormmine/login.do

Logout URL: http://www.wormbase.org/tools/wormmine/logout.do

Find the admin account username and password by executing this search in your wormbase.org account:

subject:[ WormMine ] Important update on status and curator action

This account is needed to publish templates to the front page.

New data wishlist

{| border="1"
! Name !! Where is the data?
|-
| Orthology || AceDB Gene class (Ortholog and Ortholog_other tags); there is also an OICR-created flat file
|-
| Interaction || AceDB Interaction class
|-
| Transgene || AceDB Transgene class
|-
| Variation coordinates || GFF3, type SNP or SNP
|-
| Disease data || AceDB class ?Gene; import all data under the Disease_info tag
|-
| Motif titles || The "Title" tag of ?Motif
|}

Data contained in WormMine

The tables below list all data sources contained in WormMine.

Refer to species list:

{| border="1"
! taxonId !! species
|-
| 6253 || Ascaris suum
|-
| 6279 || Brugia malayi
|-
| 6326 || Bursaphelenchus xylophilus
|-
| 96668 (860376 according to NCBI) || Caenorhabditis angaria
|-
| 135651 || Caenorhabditis brenneri
|-
| 6238 || Caenorhabditis briggsae
|-
| 6239 || Caenorhabditis elegans
|-
| 281687 || Caenorhabditis japonica
|-
| 31234 || Caenorhabditis remanei
|-
| 6289 || Haemonchus contortus
|-
| 37862 || Heterorhabditis bacteriophora
|-
| 6305 || Meloidogyne hapla
|-
| 6306 || Meloidogyne incognita
|-
| 54126 || Pristionchus pacificus
|-
| 34506 || Strongyloides ratti
|-
| 6334 || Trichinella spiralis
|}


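{| border="1"
! Data set !! Description !! Source !! Species !! Filters !! Data mapping
|-
| GO || Ontology terms and relationships comprising GO || GO project website (we should record the CVS revision number of the ontology file we use so that we can perform data checks) || Species neutral || ||
|-
| GO Annotations || Relationships between Genes and GO || GO project website || Caenorhabditis elegans || Removes all entries whose first column is "UniProtKB" ||
|-
| Genomic sequences || Fasta DNA sequences || WormBase FTP || All in species list || ||
|-
| Protein sequences || Fasta peptide sequences || WormBase FTP || All in species list || ||
|-
| Gene locations || Gene chromosomal coordinates || WormBase FTP GFF3 || All in species list || ||
|-
| Transcript locations || Transcript chromosomal coordinates || WormBase FTP GFF3 || All in species list || ||
|-
| CDS locations || CDS chromosomal coordinates || WormBase FTP GFF3 || NONE (locations load improperly) || ||
|-
| Gene metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || <code>query find Gene Live</code> || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Gene_mapping.properties LINK]
|-
| Transcript metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || <code>query find Transcript (Gene)</code> || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Transcript_mapping.properties LINK]
|-
| CDS metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || <code>query find CDS Method="curated"</code> || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/CDS_mapping.properties LINK]
|-
| Variation metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Variation_mapping.properties LINK]
|-
| Protein metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || Limited to [https://github.com/WormBase/intermine/blob/jw/testlab/perl/preprocess/wb-acedb/protein/whitelist/species_whitelist.txt these] species, <code>query find Protein Corresponding_CDS</code> || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Protein_mapping.properties LINK]
|-
| Phenotype metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Phenotype_mapping.properties LINK]
|-
| Expression Pattern metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Expr_pattern_mapping.properties LINK]
|-
| Anatomy Term metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Anatomy_term_mapping.properties LINK]
|-
| Expression Cluster metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Expression_cluster_mapping.properties LINK]
|-
| Life Stage metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Life_stage_mapping.properties LINK]
|-
| Species data || Name and Taxon ID || AceDB XML dump || All in AceDB || [https://github.com/WormBase/website-intermine/blob/master/acedb-dev/acedb/species_WS238.ace Species list] || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/Species_mapping.properties LINK]
|-
| RNAi metadata || Select fields extracted from Ace || AceDB XML dump || All in AceDB || || [https://github.com/WormBase/intermine/blob/unmerged/build_config/wormbase-acedb/RNAi_mapping.properties LINK]
|}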

What is this data mapping?

A loader plugin has been created for InterMine that extracts data embedded in XML files directly into an InterMine instance. Mapping files configure this program and describe how the AceDB XML dumps are translated into InterMine objects. XPath is used to query the XML, and can be reviewed here.
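
As a rough illustration of the idea, the sketch below applies a field-to-XPath mapping to one XML record using Python's standard library. The XML element names, the sequence-name value, and the mapping entries are all hypothetical; the real mapping files are <code>.properties</code> files whose exact syntax is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical AceDB XML dump fragment; element names and values are illustrative only.
ACE_XML = """
<Gene>
  <Identity>
    <Name>
      <Public_name>egl-19</Public_name>
      <Sequence_name>PLACEHOLDER.1</Sequence_name>
    </Name>
  </Identity>
</Gene>
"""

# Hypothetical mapping: InterMine field name -> XPath relative to the record root.
MAPPING = {
    "symbol": "Identity/Name/Public_name",
    "secondaryIdentifier": "Identity/Name/Sequence_name",
    "briefDescription": "Concise_description/Text",  # absent from the record above
}


def apply_mapping(xml_text, mapping):
    """Return {field: text} for every mapped XPath that matches the record."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, xpath in mapping.items():
        node = root.find(xpath)  # ElementTree supports a limited XPath subset
        if node is not None and node.text:
            record[field] = node.text.strip()
    return record


print(apply_mapping(ACE_XML, MAPPING))
```

Fields whose XPath matches nothing are simply skipped, which mirrors the "select fields" wording in the table of data sources; whether the actual loader behaves this way is an assumption.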

Understanding our model

The data contained in WormMine follows a central model schema. This model should be understood sufficiently to be able to query the data and create templates.

WormMine model file

This schema file contains all of the data types contained in WormMine, relationships between them, and each one's data fields.

How to read it

Looking at the protein class:

   <class name="Protein" extends="BioEntity" is-interface="true">
       <attribute  name="molecularWeight" type="java.lang.Float"/>
       <attribute  name="md5checksum" type="java.lang.String"/>
       <attribute  name="length" type="java.lang.Integer"/>
       <attribute  name="geneName" type="java.lang.String"/>
       <attribute  name="primaryAccession" type="java.lang.String"/>
       <reference  name="sequence" referenced-type="Sequence"/>
       <collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>
       <collection name="genes" referenced-type="Gene" reverse-reference="proteins"/>
       <collection name="transcripts" referenced-type="Transcript" reverse-reference="protein"/>
   </class>

Line by line:

<class name="Protein" extends="BioEntity" is-interface="true">

extends="BioEntity": Protein is a child of BioEntity therefore it inherits BioEntity's data fields.

Protein's parent, BioEntity:

   <class name="BioEntity" is-interface="true">
       <attribute name="secondaryIdentifier" type="java.lang.String"/>
       <attribute name="symbol" type="java.lang.String"/>
       <attribute name="primaryIdentifier" type="java.lang.String"/>
       <attribute name="lastUpdated" type="java.util.Date"/>
       <attribute name="name" type="java.lang.String"/>
       <reference name="organism" referenced-type="Organism"/>
       <collection name="synonyms" referenced-type="Synonym" reverse-reference="subject"/>
       <collection name="publications" referenced-type="Publication" reverse-reference="bioEntities"/>
       <collection name="ontologyAnnotations" referenced-type="OntologyAnnotation" reverse-reference="subject"/>
       <collection name="phenotypesObserved" referenced-type="Phenotype" reverse-reference="observedIn"/>
       <collection name="phenotypesNotObserved" referenced-type="Phenotype" reverse-reference="notObservedIn"/>
       <collection name="crossReferences" referenced-type="CrossReference" reverse-reference="subject"/>
       <collection name="dataSets" referenced-type="DataSet" reverse-reference="bioEntities"/>
       <collection name="locatedFeatures" referenced-type="Location" reverse-reference="locatedOn"/>
       <collection name="locations" referenced-type="Location" reverse-reference="feature"/>
   </class> 

Protein contains copies of all these attributes, references, and collections for itself. If BioEntity inherits any fields itself, those are included as well.

<attribute name="primaryAccession" type="java.lang.String"/>: This creates an attribute of Protein called primaryAccession (read "primary accession"), which is a string (one or more words). This line lets every Protein object, and any child class that inherits from Protein, hold a primaryAccession value.

<reference name="sequence" referenced-type="Sequence"/>: This creates a reference named "sequence" to another data type, in this case Sequence. Only one Sequence object can be referenced this way at a time. A reverse-reference attribute may also appear here; it names the reciprocal field in the referenced type, if one exists.

<collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>: This creates a collection named CDSs. It can hold many references to CDS objects, each of which refers back to the protein through its "protein" field (CDS.protein).
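
To make the inheritance rule concrete, here is a minimal sketch that reads an abridged copy of the two class definitions quoted above and lists every field Protein ends up with. The enclosing <code>model</code> element and the helper name <code>fields_of</code> are assumptions added for illustration, as is the handling of several space-separated parents in <code>extends</code>.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the class definitions quoted above, wrapped in an assumed <model> root.
MODEL_XML = """
<model name="genomic">
  <class name="BioEntity" is-interface="true">
    <attribute name="symbol" type="java.lang.String"/>
    <attribute name="primaryIdentifier" type="java.lang.String"/>
    <reference name="organism" referenced-type="Organism"/>
  </class>
  <class name="Protein" extends="BioEntity" is-interface="true">
    <attribute name="primaryAccession" type="java.lang.String"/>
    <reference name="sequence" referenced-type="Sequence"/>
    <collection name="CDSs" referenced-type="CDS" reverse-reference="protein"/>
  </class>
</model>
"""


def fields_of(class_name, classes):
    """Collect a class's own fields plus everything inherited via 'extends'."""
    node = classes.get(class_name)
    if node is None:
        return {}
    fields = {}
    # Assumption: 'extends' may list several parents separated by spaces.
    for parent in node.get("extends", "").split():
        fields.update(fields_of(parent, classes))
    for child in node:
        fields[child.get("name")] = child.tag  # attribute, reference or collection
    return fields


classes = {c.get("name"): c for c in ET.fromstring(MODEL_XML).iter("class")}
protein_fields = fields_of("Protein", classes)
for name, kind in sorted(protein_fields.items()):
    print(f"{kind:10} {name}")
```

The output lists the inherited BioEntity fields (symbol, primaryIdentifier, organism) alongside Protein's own, which is the behaviour the "line by line" explanation describes.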

Using QueryBuilder

Querybuilder link.png

Use case: How do I find all proteins for a gene?

How are they intuitively connected? Genes are transcribed into transcripts, which contain coding sequences, which are translated into proteins. Keep this in mind while constructing a query.

  • Select "Gene" in the box under "Select a Data Type to Begin a Query"
  • Click "Select", or double click "Gene"

This list shows the Gene data available for display. You can choose fields to show in the results table, follow links to other data types, and set filters to constrain the results.

Desired results?

We want to display the symbols of each involved object, plus the gene ID. This calls for:

  1. Gene.WormBase Gene ID
  2. Gene.Gene Name
  3. Gene (some relationship(s))-> Transcript.Sequence Name
  4. Gene (some relationship(s))-> CDS.Sequence Name
  5. Gene (some relationship(s))-> Protein.Name

Building the query

click these areas

Click *SHOW* next to any of these field names to add that attribute to the resulting table as a column.

  • Show the "WormBase Gene ID" and "Gene Name"
  • Gene contains a collection of transcripts; this relationship represents transcription. Expand the relationship by clicking on the [+]



  • Follow these steps for each of the data types in the chain as illustrated:
Tscript expanded2 new2.png
CDS expanded new2.png
Protein expanded new2.png

  • Running the query at this point will give you results for all genes. If you have a smaller set in mind, like restricting the genes to egl-19 only, a constraint must be set.
  • Click "constrain" next to the "Gene.Gene Name" field (circle #2), type "egl-19" in the text box, make sure the operator is "=", then "add to query".


Constraint Dialog Box for egl-19.png

  • Your query overview should resemble this:


QB egl-19 new.png


  • Clicking "Show results" executes the completed query.
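
The query built in the steps above can also be written down as an InterMine-style query XML document. The sketch below assembles one with the standard library only. The internal path names (for example <code>Gene.transcripts.CDSs.protein</code>) are assumptions based on the model excerpt and on the labels shown in QueryBuilder; check them against the WormMine model file before relying on them.

```python
import xml.etree.ElementTree as ET


def build_path_query(views, constraints, model="genomic"):
    """Serialise output columns and constraints as a query XML string."""
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


# Assumed internal paths for the columns chosen in QueryBuilder.
views = [
    "Gene.primaryIdentifier",              # shown as "WormBase Gene ID"
    "Gene.symbol",                         # shown as "Gene Name"
    "Gene.transcripts.symbol",             # assumed path to the transcript name
    "Gene.transcripts.CDSs.symbol",        # assumed path to the CDS name
    "Gene.transcripts.CDSs.protein.name",  # assumed path to the protein name
]

# The single constraint added in the walkthrough: Gene Name = egl-19.
xml_query = build_path_query(views, [("Gene.symbol", "=", "egl-19")])
print(xml_query)
```

Writing the query out this way makes the chain Gene -> Transcript -> CDS -> Protein explicit: each additional column is one more step along the same path.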


Constraining to a list

If you have access to any lists, the constraint dialog box will offer options for them. A field can be constrained so that its value must, or must not, be a member of a given list.
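
The effect of a list constraint can be pictured with a small, purely local sketch: rows are kept only if a chosen field is, or is not, a member of the list. The function name, the row layout, and the example list are hypothetical; WormMine applies the constraint on the server, not like this.

```python
def constrain_to_list(rows, field, members, must_be_member=True):
    """Keep rows whose value for `field` is (or is not) in `members`."""
    member_set = set(members)
    return [row for row in rows if (row[field] in member_set) == must_be_member]


rows = [{"symbol": "egl-19"}, {"symbol": "unc-2"}, {"symbol": "lin-12"}]
my_list = ["egl-19", "lin-12"]  # hypothetical saved list of gene names

inside = constrain_to_list(rows, "symbol", my_list)
outside = constrain_to_list(rows, "symbol", my_list, must_be_member=False)
print([r["symbol"] for r in inside])
print([r["symbol"] for r in outside])
```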

Creating templates

Template queries are predefined (canned) queries. A template query can address a specific question, and it can also be a good starting point for refinements and for constructing queries that answer related questions. To create a template:

* Log in
* Construct a query using Query Builder
* Include at least one constraint (e.g. restrict the format of the identifier)
* Click "Start building a template query"
* Fill in Name, Title and Description, and optionally comment
* Make necessary adjustments, then "Save template"
* Admin users can make a personal template public via
  My Mine -> Templates -> Add Tags -> New tag; type "im:public" in the text box.
  Add one tag at a time.
  Tags help to group templates.
  For the template to be viewable by the public, add the tag "im:public". For the template to be grouped under the "aspect" categories on the Home page, add the two tags "im:frontpage" and "im:aspect:ASPECTNAME".
  The capitalisation of ASPECTNAME matters. Replace ASPECTNAME with one of Genomics, Proteins, Expression, Genetic Variations, Phenotypes, Gene Ontology, Strains.
  Hit "Create". (This tag should be available for the next template you create).
  The Name field text should start with your initials to identify you as the author, e.g. "XYZ Gene Homology".
  The Title field text is what users see and should include '-->', which is converted into an arrow icon in the final template list on the Home page.
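
Because the aspect name is case-sensitive, it can help to generate the tags rather than type them. The helper below is a hypothetical convenience, not part of WormMine: it returns the tags named above and rejects an aspect that does not exactly match one of the listed names.

```python
# Aspect names exactly as listed above; the comparison is case-sensitive.
ASPECTS = (
    "Genomics", "Proteins", "Expression", "Genetic Variations",
    "Phenotypes", "Gene Ontology", "Strains",
)


def template_tags(aspect=None, public=True):
    """Return the tags to add, one at a time, for a template."""
    tags = []
    if public:
        tags.append("im:public")
    if aspect is not None:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect {aspect!r}; expected one of {ASPECTS}")
        tags += ["im:frontpage", f"im:aspect:{aspect}"]
    return tags


print(template_tags("Gene Ontology"))
```

Calling <code>template_tags("genomics")</code> raises a <code>ValueError</code>, which catches the capitalisation mistake before the tag is entered.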

Creating lists

A list is a collection of identifiers for entities of the same type (e.g. genes, proteins, body parts). Lists can be compared and combined, and can be used as the starting points of queries. Sources of lists may be external (from a third-party resource) or from WormMine. To generate a list and make it public:

1. Log in.
2. Make a query (Query Builder).
3. On the Result page, top right corner, select "Create / Add to List" -> "Create New List" -> "All of Columns ...", or
   "Choose individual items from the table". "Choose individual items from the table" allows further refinement.
4. Provide for the list: a Name, an informative Description, and Tags. 
   The Name should start with your initials to identify you as the author e.g. "XYZ Gene Homology"
5. Descriptions and tags can also be edited after a list is saved.

Internal Report Page

Find the InterMine internal ID of an object via a query (you must be logged in as a superuser), then append it to this URL:

http://206.108.125.166:8080/wormmine/report.do?id=
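
A small sketch of building that link from an ID, assuming the internal ID is a positive integer. The function name and the example ID are hypothetical.

```python
from urllib.parse import urlencode

REPORT_BASE = "http://206.108.125.166:8080/wormmine/report.do"


def report_url(internal_id):
    """Build the report page URL for one InterMine internal ID."""
    object_id = int(internal_id)  # accepts "1007" as well as 1007
    if object_id <= 0:
        raise ValueError("internal ID must be a positive integer")
    return f"{REPORT_BASE}?{urlencode({'id': object_id})}"


print(report_url(1007))  # 1007 is a made-up ID for illustration
```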

Other Sites

Tutorial on making templates: http://www.yeastgenome.org/help/video-tutorials/yeastmine