Difference between revisions of "Genotype"

From WormBaseWiki
 
(27 intermediate revisions by 3 users not shown)
Line 2: Line 2:
 
==Rationale==
 
==Rationale==
  
To enable curators (disease curators and then maybe phenotype curators) to annotate directly to entire genotypes, the ?Genotype class has been proposed. The decision was made that any ?Genotype object should reflect the entire genotype of a particular worm or strain of worms, to the extent that it is known, rather than partial genotypes. The initial data model allows for direct associations to disease model annotations, whereby genotype associations to diseases (DO terms) would then be made during the Postgres curation database dump.
+
To enable curators (disease curators and then maybe phenotype curators) to annotate directly to entire genotypes, the ?Genotype class has been proposed. The decision was made that any ?Genotype object should reflect the entire genotype of a particular worm or strain of worms, to the extent that it is known, rather than partial genotypes. The initial data model allows for direct associations to disease model annotations (curated in the disease OA), whereby genotype associations to diseases (DO terms) would then be made during the Postgres curation database dump.
  
 
==Issues==
 
==Issues==
  
 
==Action items==
 
==Action items==
 
+
* For future:
 +
** Automatically generate Genotype names according to [https://wormbase.org/about/userguide/nomenclature#c--10 nomenclature guidelines] based on "Genotype_component" tag entries
 +
*** Note: zygosity will not be accounted for unless zygosity is added back in to model
 +
** Build database pipeline scripts to add automatically inferred gene associations from variation-to-gene mapping pipeline output
 +
** Add in strain references in ?Genotype model and either (A) co-opt existing "Genotype" tag in ?Strain model to point to ?Genotype objects instead of free text (would require transforming all existing free text entries into ?Genotype objects) and/or (B) supplement ?Strain model with additional "Genotype" tag to point to ?Genotype objects while possibly changing "Genotype" tag for free text to "Genotype_text"
 +
** If necessary (if use cases exist), add back in zygosity and/or maternal/paternal genotype references
  
 
== ?Genotype model ==
 
== ?Genotype model ==
Line 14: Line 19:
  
 
<pre>
 
<pre>
?Genotype Genotype_name  UNIQUE  ?Text //  e.g. “unc-1(e103);unc-2(e234)”
+
?Genotype Genotype_name  UNIQUE  ?Text //  e.g. “unc-1(e103);unc-2(e234)”; eventually to be automatically generated
 +
                Genotype_synonym  ?Text    // To capture literal representations in the literature
 
                 Genotype_component Gene ?Gene   XREF  Component_of_genotype  #Evidence
 
                 Genotype_component Gene ?Gene   XREF  Component_of_genotype  #Evidence
                                         Variation ?Variation XREF Component_of_genotype UNIQUE Text // Zygosity
+
                                         Variation ?Variation XREF Component_of_genotype
                        Rearrangement ?Rearrangement  XREF Component_of_genotype UNIQUE Text // Zygosity
+
                        Rearrangement ?Rearrangement  XREF Component_of_genotype
                        Transgene ?Transgene  XREF  Component_of_genotype UNIQUE Text // Zygosity
+
                        Transgene ?Transgene  XREF  Component_of_genotype
                        // Zygosity text entry should be one of the following:
 
                        // Homozygous, Heterozygous_with_wildtype, or Heteroallelic_combination
 
 
                        Other_component UNIQUE  ?Text  // Free text components including RNAi
 
                        Other_component UNIQUE  ?Text  // Free text components including RNAi
 
Disease_info  Models_disease  ?DO_term  XREF  Disease_model_genotype
 
Disease_info  Models_disease  ?DO_term  XREF  Disease_model_genotype
 
      Modifies_disease  ?DO_term  XREF  Disease_modifier_genotype
 
      Modifies_disease  ?DO_term  XREF  Disease_modifier_genotype
      Disease_model_annotation  Model_genotype ?Disease_model_annotation  XREF  Genotype
+
      Models_disease_in_annotation ?Disease_model_annotation  XREF  Genotype
      Modifier_genotype ?Disease_model_annotation  XREF  Modifier_genotype
+
          Modifies_disease_in_annotation ?Disease_model_annotation  XREF  Modifier_genotype
 
Species UNIQUE ?Species
 
Species UNIQUE ?Species
 
Remark ?Text                            
 
Remark ?Text                            
 
Reference ?Paper XREF Genotype
 
Reference ?Paper XREF Genotype
 
</pre>
 
</pre>
 +
 +
== Model issues ==
 +
* Model proposal, including proposed changes to other classes, on [https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing this Google Doc]
 +
* Database build will need to populate the "Gene" tag with genes inferred from the variation-to-gene mapping pipeline, if they are not already manually populated by the curator/Postgres
 +
** Genes inferred automatically by the build process should get an #Evidence hash entry of "Inferred_automatically"
 +
* The "Gene" tag will get automatically populated by the OA dumper for ?Rearrangement and ?Transgene objects
 +
** Genes referenced in ?Rearrangement objects' "Gene_inside" tag will populate the "Gene" tag in the ?Genotype object
 +
** Genes referenced in a ?Transgene object's corresponding ?Construct object's (from "Construct" and "Coinjection" tags) "Driven_by_gene", "Gene" and "3_UTR" tags will populate the "Gene" tag in the ?Genotype object
 +
* Disease assertions (and relevant paper connections) will be dumped by the Disease Model Annotation OA
  
  
 
== Genotype Postgres Tables ==
 
== Genotype Postgres Tables ==
 +
*gno_identifier
 +
*gno_curator
 +
*gno_name
 +
*gno_synonym
 +
*gno_gene
 +
*gno_variation
 +
*gno_rearrangement
 +
*gno_transgene
 +
*gno_othercomp
 +
*gno_species
 +
*gno_remark
 +
*gno_paper
 +
*gno_nodump
  
 +
== Genotype OA ==
 +
*pgid - no postgres table - the postgres ID - NOT DUMPED
 +
*ID - gno_identifier - Genotype primary ID #Note: This is automatically assigned (format: WBGenotype00000001)
 +
*Curator - gno_curator - NOT DUMPED - Curator (Dropdown)
 +
*Name - gno_name - Genotype_name - Big text
 +
*Synonym - gno_synonym - Genotype_synonym - Big Text (multiple entries bar "|" separated)
 +
*Gene - gno_gene - Gene - ?Gene (Multi-ontology)
 +
*Variation - gno_variation - Variation - ?Variation  (Multi-ontology)
 +
*Rearrangement - gno_rearrangement - Rearrangement - ?Rearrangement  (Multi-ontology)
 +
*Transgene - gno_transgene - Transgene - ?Transgene  (Multi-ontology)
 +
*Other Component - gno_othercomp - Other_component - Big text
 +
*Species - gno_species - Species - ?Species (Dropdown)
 +
*Remark - gno_remark - Remark - Big text
 +
*Paper - gno_paper - Reference - ?Paper (Multi-ontology)
 +
*NO DUMP - gno_nodump - NOT DUMPED - toggle
  
== Genotype OA ==
+
==Dumping genotypes for data upload==
*pgid - the postgres ID - NOT DUMPED
+
*Genotypes will go in for the first time for WS278
*ID - gen_identifier - Genotype primary ID #Note: This is automatically assigned
+
*Dumping script at: /home/postgres/work/citace_upload/gno_genotype/use_package.pl
*Curator - gen_curator - NOT DUMPED - Curator (Dropdown)
+
*Symlink to a curator's directory and run from there
*Name - gen_name - Name - Big text
 
*Gene - gen_gene - Gene - ?Gene (Ontology)
 
*Variation (Unknown Zygosity) - gen_variation - Variation - ?Variation  (Multi-ontology)
 
*Homozygous Variation - gen_homo_variation - Variation - ?Variation  Homozygous  (Multi-ontology)
 
*Heterozygous (w/WT) Variation - gen_hetwt_variation - Variation - ?Variation  Heterozygous_with_wildtype  (Multi-ontology)
 
*Heteroallelic Variation - gen_hetal_variation - Variation - ?Variation  Heteroallelic_combination  (Multi-ontology)
 
*Rearrangement (Unknown Zygosity) - gen_rearrangement - Rearrangement - ?Rearrangement  (Multi-ontology)
 
*Homozygous Rearrangement - gen_homo_rearrangement - Rearrangement - ?Rearrangement  Homozygous  (Multi-ontology)
 
*Heterozygous (w/WT) Rearrangement - gen_hetwt_rearrangement - Rearrangement - ?Rearrangement  Heterozygous_with_wildtype  (Multi-ontology)
 
*Heteroallelic Rearrangement - gen_hetal_rearrangement - Rearrangement - ?Rearrangement  Heteroallelic_combination  (Multi-ontology)
 
*Transgene (Unknown Zygosity) - gen_transgene - Transgene - ?Transgene  (Multi-ontology)
 
*Homozygous Transgene - gen_homo_transgene - Transgene - ?Transgene  Homozygous  (Multi-ontology)
 
*Heterozygous (w/WT) Transgene - gen_hetwt_transgene - Transgene - ?Transgene  Heterozygous_with_wildtype  (Multi-ontology)
 
*Heteroallelic Transgene - gen_hetal_transgene - Transgene - ?Transgene  Heteroallelic_combination  (Multi-ontology)
 
*Other Component - gen_othercomp - Other_component - Big text
 
*Species - gen_species - Species - ?Species (Dropdown)
 
*Remark - gen_remark - Remark - Big text
 
*Paper - gen_paper - Reference - ?Paper (Ontology)
 

Latest revision as of 16:31, 16 June 2020

Rationale

To enable curators (initially disease curators, and possibly phenotype curators later) to annotate directly to entire genotypes, the ?Genotype class has been proposed. Any ?Genotype object should reflect the entire genotype of a particular worm or strain of worms, to the extent that it is known, rather than a partial genotype. The initial data model allows direct associations to disease model annotations (curated in the disease OA); genotype associations to diseases (DO terms) would then be made during the Postgres curation database dump.

Issues

Action items

  • For future:
    • Automatically generate Genotype names according to the nomenclature guidelines (https://wormbase.org/about/userguide/nomenclature#c--10) based on "Genotype_component" tag entries
      • Note: zygosity will not be accounted for unless zygosity is added back into the model
    • Build database pipeline scripts to add automatically inferred gene associations from variation-to-gene mapping pipeline output
    • Add strain references to the ?Genotype model, and (A) co-opt the existing "Genotype" tag in the ?Strain model to point to ?Genotype objects instead of free text (this would require transforming all existing free-text entries into ?Genotype objects) and/or (B) supplement the ?Strain model with an additional "Genotype" tag that points to ?Genotype objects, possibly renaming the free-text "Genotype" tag to "Genotype_text"
    • If use cases exist, add zygosity and/or maternal/paternal genotype references back into the model
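
As a rough illustration of the name-generation item above, the following Python sketch joins components into a name of the form shown in the model comment, “unc-1(e103);unc-2(e234)”. The function name and the input shape (a list of gene/allele pairs) are assumptions made for this example, and the sketch does not attempt to cover the full nomenclature guidelines or zygosity.

```python
def genotype_name(components):
    """Build a display name such as 'unc-1(e103);unc-2(e234)'.

    `components` is a list of (gene_name, allele_name) pairs. This input
    shape is hypothetical; it is not how the ?Genotype model stores
    Genotype_component entries.
    """
    parts = []
    for gene, allele in components:
        # Write each component as gene(allele), as in the model's example.
        parts.append(f"{gene}({allele})")
    # Components are separated by semicolons in the model's example name.
    return ";".join(parts)


print(genotype_name([("unc-1", "e103"), ("unc-2", "e234")]))
```

A real implementation would also need rules for rearrangements, transgenes and free-text components, which are not sketched here.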

?Genotype model

Initial proposal:

?Genotype  Genotype_name  UNIQUE  ?Text  // e.g. “unc-1(e103);unc-2(e234)”; eventually to be automatically generated
           Genotype_synonym  ?Text  // To capture literal representations in the literature
           Genotype_component  Gene  ?Gene  XREF  Component_of_genotype  #Evidence
                               Variation  ?Variation  XREF  Component_of_genotype
                               Rearrangement  ?Rearrangement  XREF  Component_of_genotype
                               Transgene  ?Transgene  XREF  Component_of_genotype
                               Other_component  UNIQUE  ?Text  // Free text components including RNAi
           Disease_info  Models_disease  ?DO_term  XREF  Disease_model_genotype
                         Modifies_disease  ?DO_term  XREF  Disease_modifier_genotype
                         Models_disease_in_annotation  ?Disease_model_annotation  XREF  Genotype
                         Modifies_disease_in_annotation  ?Disease_model_annotation  XREF  Modifier_genotype
           Species  UNIQUE  ?Species
           Remark  ?Text
           Reference  ?Paper  XREF  Genotype

Model issues

  • Model proposal, including proposed changes to other classes, on this Google Doc (https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing)
  • Database build will need to populate the "Gene" tag with genes inferred from the variation-to-gene mapping pipeline, if they are not already manually populated by the curator/Postgres
    • Genes inferred automatically by the build process should get an #Evidence hash entry of "Inferred_automatically"
  • The "Gene" tag will get automatically populated by the OA dumper for ?Rearrangement and ?Transgene objects
    • Genes referenced in ?Rearrangement objects' "Gene_inside" tag will populate the "Gene" tag in the ?Genotype object
    • For each ?Transgene object, genes referenced in the "Driven_by_gene", "Gene" and "3_UTR" tags of its corresponding ?Construct objects (those listed under the "Construct" and "Coinjection" tags) will populate the "Gene" tag in the ?Genotype object
  • Disease assertions (and relevant paper connections) will be dumped by the Disease Model Annotation OA
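
The gene-inference rule for variations described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the dictionary standing in for the variation-to-gene mapping pipeline output, and the placeholder IDs are all hypothetical. The one detail taken from the list above is that curated genes are left as they are and genes added by the build get the "Inferred_automatically" evidence.

```python
def gene_tag_entries(curated_genes, variations, variation_to_genes):
    """Return {gene_id: evidence} for the "Gene" tag of one genotype.

    curated_genes: gene IDs already entered by the curator/Postgres.
    variations: variation IDs listed in the genotype.
    variation_to_genes: hypothetical lookup standing in for the output of
        the variation-to-gene mapping pipeline.
    """
    # Curated genes are kept as they are, with no inferred evidence.
    entries = {gene: None for gene in curated_genes}
    for variation in variations:
        for gene in variation_to_genes.get(variation, []):
            # Only add genes that were not already manually populated.
            entries.setdefault(gene, "Inferred_automatically")
    return entries


# Placeholder IDs, not real WormBase identifiers.
mapping = {"var1": ["geneA"], "var2": ["geneB"]}
print(gene_tag_entries(["geneA"], ["var1", "var2"], mapping))
```

In this example "geneA" stays a curated entry, while "geneB" is added with the "Inferred_automatically" evidence.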


Genotype Postgres Tables

  • gno_identifier
  • gno_curator
  • gno_name
  • gno_synonym
  • gno_gene
  • gno_variation
  • gno_rearrangement
  • gno_transgene
  • gno_othercomp
  • gno_species
  • gno_remark
  • gno_paper
  • gno_nodump

Genotype OA

  • pgid - no postgres table - the postgres ID - NOT DUMPED
  • ID - gno_identifier - Genotype primary ID #Note: This is automatically assigned (format: WBGenotype00000001)
  • Curator - gno_curator - NOT DUMPED - Curator (Dropdown)
  • Name - gno_name - Genotype_name - Big text
  • Synonym - gno_synonym - Genotype_synonym - Big Text (multiple entries separated by a bar "|")
  • Gene - gno_gene - Gene - ?Gene (Multi-ontology)
  • Variation - gno_variation - Variation - ?Variation (Multi-ontology)
  • Rearrangement - gno_rearrangement - Rearrangement - ?Rearrangement (Multi-ontology)
  • Transgene - gno_transgene - Transgene - ?Transgene (Multi-ontology)
  • Other Component - gno_othercomp - Other_component - Big text
  • Species - gno_species - Species - ?Species (Dropdown)
  • Remark - gno_remark - Remark - Big text
  • Paper - gno_paper - Reference - ?Paper (Multi-ontology)
  • NO DUMP - gno_nodump - NOT DUMPED - toggle
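
To make the field-to-tag mapping above concrete, here is a minimal Python sketch that turns one curated record into .ace-style text. It is not the actual dumper (which is a Perl script); the function name, the dictionary-based record, and the assumption that multi-ontology fields arrive as lists are all hypothetical. The table names, tag names, the "|" synonym separator and the NO DUMP toggle come from the list above.

```python
# Postgres table -> ?Genotype model tag, taken from the field list above.
# gno_curator and gno_nodump are not dumped, so they have no tag here.
TABLE_TO_TAG = {
    "gno_name": "Genotype_name",
    "gno_synonym": "Genotype_synonym",
    "gno_gene": "Gene",
    "gno_variation": "Variation",
    "gno_rearrangement": "Rearrangement",
    "gno_transgene": "Transgene",
    "gno_othercomp": "Other_component",
    "gno_species": "Species",
    "gno_remark": "Remark",
    "gno_paper": "Reference",
}


def dump_genotype(row):
    """Render one record (a dict keyed by table name) as .ace-style text.

    Assumed record shape: text fields are strings, multi-ontology fields
    are lists of object names. Quote escaping is not handled.
    """
    if row.get("gno_nodump"):
        return ""  # NO DUMP toggle set: skip this record entirely
    lines = [f'Genotype : "{row["gno_identifier"]}"']
    for table, tag in TABLE_TO_TAG.items():
        value = row.get(table)
        if not value:
            continue
        if table == "gno_synonym":
            # Multiple synonyms are stored in one field, separated by "|".
            values = [s.strip() for s in value.split("|") if s.strip()]
        elif isinstance(value, list):
            values = value
        else:
            values = [value]
        lines.extend(f'{tag} "{v}"' for v in values)
    return "\n".join(lines) + "\n"


# Example record; the variation names are placeholders, not real object IDs.
record = {
    "gno_identifier": "WBGenotype00000001",
    "gno_name": "unc-1(e103);unc-2(e234)",
    "gno_synonym": "unc-1(e103) unc-2(e234) | unc-1;unc-2",
    "gno_variation": ["e103", "e234"],
}
print(dump_genotype(record))
```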

Dumping genotypes for data upload

  • Genotypes will be included in the data upload for the first time for WS278
  • Dumping script at: /home/postgres/work/citace_upload/gno_genotype/use_package.pl
  • Symlink the script into a curator's directory and run it from there