Genotype

From WormBaseWiki
Revision as of 16:31, 16 June 2020 by Rkishore (talk | contribs)

Rationale

The ?Genotype class has been proposed so that curators (initially disease curators, and possibly phenotype curators later) can annotate directly to entire genotypes. Any ?Genotype object should reflect the entire genotype of a particular worm or strain of worms, to the extent that it is known, rather than a partial genotype. The initial data model allows direct associations to disease model annotations (curated in the disease OA); genotype associations to diseases (DO terms) would then be made during the Postgres curation database dump.

Issues

Action items

  • For future:
    • Automatically generate Genotype names according to nomenclature guidelines based on "Genotype_component" tag entries
      • Note: zygosity will not be accounted for unless it is added back into the model
    • Build database pipeline scripts to add automatically inferred gene associations from variation-to-gene mapping pipeline output
    • Add strain references to the ?Genotype model and either (A) co-opt the existing "Genotype" tag in the ?Strain model to point to ?Genotype objects instead of free text (this would require transforming all existing free-text entries into ?Genotype objects) and/or (B) supplement the ?Strain model with an additional "Genotype" tag that points to ?Genotype objects, possibly renaming the free-text "Genotype" tag to "Genotype_text"
    • If necessary (if use cases exist), add back in zygosity and/or maternal/paternal genotype references
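The first action item (generating names from "Genotype_component" entries) could be prototyped roughly as follows. This is only a sketch: the function name, the (gene, allele) input shape, and the alphabetical ordering are assumptions for illustration, and the actual nomenclature guidelines may require a different ordering. As noted above, zygosity is not represented.

```python
from typing import Iterable, Optional, Tuple


def generate_genotype_name(
    components: Iterable[Tuple[Optional[str], str]]
) -> str:
    """Build a name such as 'unc-1(e103);unc-2(e234)'.

    Each component is a hypothetical (gene_name, component_name) pair;
    gene_name is None for components without a gene, e.g. a rearrangement.
    """
    labels = []
    for gene, name in components:
        label = f"{gene}({name})" if gene else name
        if label not in labels:  # drop exact duplicates
            labels.append(label)
    # Assumption: alphabetical order stands in for the real guideline order.
    return ";".join(sorted(labels))
```

Calling `generate_genotype_name([("unc-2", "e234"), ("unc-1", "e103")])` gives the same string as the example in the model comment, regardless of the order in which the components were entered.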

?Genotype model

Initial proposal:

?Genotype  Genotype_name  UNIQUE  ?Text  // e.g. “unc-1(e103);unc-2(e234)”; eventually to be automatically generated
           Genotype_synonym  ?Text  // To capture literal representations in the literature
           Genotype_component  Gene  ?Gene  XREF  Component_of_genotype  #Evidence
                               Variation  ?Variation  XREF  Component_of_genotype
                               Rearrangement  ?Rearrangement  XREF  Component_of_genotype
                               Transgene  ?Transgene  XREF  Component_of_genotype
                               Other_component  UNIQUE  ?Text  // Free text components including RNAi
           Disease_info  Models_disease  ?DO_term  XREF  Disease_model_genotype
                         Modifies_disease  ?DO_term  XREF  Disease_modifier_genotype
                         Models_disease_in_annotation  ?Disease_model_annotation  XREF  Genotype
                         Modifies_disease_in_annotation  ?Disease_model_annotation  XREF  Modifier_genotype
           Species  UNIQUE  ?Species
           Remark  ?Text
           Reference  ?Paper  XREF  Genotype

Model issues

  • Model proposal, including proposed changes to other classes, on this Google Doc
  • Database build will need to populate the "Gene" tag with genes inferred from the variation-to-gene mapping pipeline, if they are not already manually populated by the curator/Postgres
    • Genes inferred automatically by the build process should get an #Evidence hash entry of "Inferred_automatically"
  • The "Gene" tag will get automatically populated by the OA dumper for ?Rearrangement and ?Transgene objects
    • Genes referenced in ?Rearrangement objects' "Gene_inside" tag will populate the "Gene" tag in the ?Genotype object
    • Genes referenced in the "Driven_by_gene", "Gene" and "3_UTR" tags of the ?Construct objects linked from a ?Transgene object (via its "Construct" and "Coinjection" tags) will populate the "Gene" tag in the ?Genotype object
  • Disease assertions (and relevant paper connections) will be dumped by the Disease Model Annotation OA
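The gene-population rules above can be sketched as follows. This is a minimal illustration, not the actual OA dumper or build code: the function names and the dictionary layout of the objects are assumptions, while the tag names and the "Inferred_automatically" label come from the list above.

```python
from typing import Dict, Iterable, Optional, Set

# Construct tags whose genes are copied to the genotype's "Gene" tag.
CONSTRUCT_GENE_TAGS = ("Driven_by_gene", "Gene", "3_UTR")


def genes_from_rearrangement(rearrangement: dict) -> Set[str]:
    """Genes listed under a rearrangement's "Gene_inside" tag."""
    return set(rearrangement.get("Gene_inside", []))


def genes_from_transgene(transgene: dict, constructs: Dict[str, dict]) -> Set[str]:
    """Genes from the constructs named in "Construct" and "Coinjection"."""
    genes: Set[str] = set()
    for link_tag in ("Construct", "Coinjection"):
        for construct_id in transgene.get(link_tag, []):
            construct = constructs.get(construct_id, {})
            for gene_tag in CONSTRUCT_GENE_TAGS:
                genes.update(construct.get(gene_tag, []))
    return genes


def add_inferred_genes(
    gene_evidence: Dict[str, Optional[str]], mapped_genes: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Add genes from variation-to-gene mapping output.

    gene_evidence maps gene ID -> evidence label (None means the gene was
    already populated by the curator/Postgres). Existing entries are kept;
    new ones are marked "Inferred_automatically".
    """
    result = dict(gene_evidence)
    for gene in mapped_genes:
        result.setdefault(gene, "Inferred_automatically")
    return result
```

Keeping the curated entries untouched in `add_inferred_genes` reflects the rule that the build only fills in genes that are not already populated.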


Genotype Postgres Tables

  • gno_identifier
  • gno_curator
  • gno_name
  • gno_synonym
  • gno_gene
  • gno_variation
  • gno_rearrangement
  • gno_transgene
  • gno_othercomp
  • gno_species
  • gno_remark
  • gno_paper
  • gno_nodump

Genotype OA

  • pgid - no postgres table - the postgres ID - NOT DUMPED
  • ID - gno_identifier - Genotype primary ID (automatically assigned; format: WBGenotype00000001)
  • Curator - gno_curator - NOT DUMPED - Curator (Dropdown)
  • Name - gno_name - Genotype_name - Big text
  • Synonym - gno_synonym - Genotype_synonym - Big text (multiple entries separated by a bar, "|")
  • Gene - gno_gene - Gene - ?Gene (Multi-ontology)
  • Variation - gno_variation - Variation - ?Variation (Multi-ontology)
  • Rearrangement - gno_rearrangement - Rearrangement - ?Rearrangement (Multi-ontology)
  • Transgene - gno_transgene - Transgene - ?Transgene (Multi-ontology)
  • Other Component - gno_othercomp - Other_component - Big text
  • Species - gno_species - Species - ?Species (Dropdown)
  • Remark - gno_remark - Remark - Big text
  • Paper - gno_paper - Reference - ?Paper (Multi-ontology)
  • NO DUMP - gno_nodump - NOT DUMPED - toggle
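The field-to-tag mapping above can be illustrated with a short sketch that renders one genotype as a .ace entry. This is not the real dumping script: the function names, the row layout (a dictionary keyed by table name, with lists for multi-valued fields) and the quote escaping are assumptions for illustration. Only the table names, tag names, the "|" synonym separator and the NOT DUMPED fields come from the list above.

```python
# (Postgres table, .ace tag) pairs for the dumped fields, in list order.
TABLE_TO_TAG = [
    ("gno_name", "Genotype_name"),
    ("gno_synonym", "Genotype_synonym"),
    ("gno_gene", "Gene"),
    ("gno_variation", "Variation"),
    ("gno_rearrangement", "Rearrangement"),
    ("gno_transgene", "Transgene"),
    ("gno_othercomp", "Other_component"),
    ("gno_species", "Species"),
    ("gno_remark", "Remark"),
    ("gno_paper", "Reference"),
]


def _quote(value) -> str:
    # Assumption: embedded double quotes are backslash-escaped.
    return '"' + str(value).replace('"', '\\"') + '"'


def genotype_to_ace(row: dict) -> str:
    """Render one genotype row as .ace text; '' if flagged NO DUMP."""
    if row.get("gno_nodump"):
        return ""
    lines = ["Genotype : " + _quote(row["gno_identifier"])]
    for table, tag in TABLE_TO_TAG:  # gno_curator is deliberately absent
        value = row.get(table)
        if not value:
            continue
        if table == "gno_synonym":
            # Synonyms are stored as one "|"-separated text field.
            values = [v.strip() for v in value.split("|") if v.strip()]
        elif isinstance(value, str):
            values = [value]
        else:
            values = list(value)
        lines.extend(f"{tag} {_quote(v)}" for v in values)
    return "\n".join(lines) + "\n"
```

A row with its NO DUMP toggle set produces no output, and the curator field never appears in the entry because it has no tag in the mapping.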

Dumping genotypes for data upload

  • Genotypes will be uploaded for the first time for WS278
  • Dumping script at: /home/postgres/work/citace_upload/gno_genotype/use_package.pl
  • Symlink the script into a curator's directory and run it from there