OA-phenotype
package dump script
tazendra /home/acedb/work/allele_phenotype/use_package.pl
use_package.pl is a Perl script that uses a Perl module ( ) to generate an error file and two .ace files (since May 2011): one for phenotypes and one for molecule-phenotype.
Constraints:
- does not dump a record when the value in Curation status is "down right disgusted"; this acts as our no-dump toggle
- does not dump a record when there is no value for phenotype; this stops non-curated NBP data from being dumped
- will dump if a phenotype is present, regardless of curator
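The constraints above can be sketched as a small filter. This is an illustrative Python sketch only, not the Perl module that use_package.pl calls; the dictionary keys (`curation_status`, `term`, `curator`) and the stored spelling of the status value are assumptions made for the example.

```python
from typing import Mapping, Optional

# Assumed normalized spelling of the no-dump status (hypothetical).
NO_DUMP_STATUS = "down_right_disgusted"


def should_dump(record: Mapping[str, Optional[str]]) -> bool:
    """Return True if a phenotype row passes the three constraints above.

    `record` is a hypothetical dict with one entry per OA field.
    """
    status = (record.get("curation_status") or "").strip().lower().replace(" ", "_")
    if status == NO_DUMP_STATUS:
        # Acts as the no-dump toggle.
        return False
    if not (record.get("term") or "").strip():
        # No phenotype: keeps non-curated NBP data out of the dump.
        return False
    # A phenotype is present: dump regardless of curator.
    return True
```

The curator is deliberately not inspected, which mirrors the third constraint.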
requested changes
5/17/2011
- add a constraint - for rearrangement objects, dump all phenotype information except molecule values (molecule data should not be annotated for rearrangements so this is just a fail safe in case it does happen).
- in addition to the varphene.ace output file, please output a separate file called mol_phene.ace, which contains:
- molecule variation phenotype
- molecule strain phenotype
- molecule transgene phenotype
2/2012
I can't tell if all 5 of these are new fields, or just some. They are all new fields. 5 new tables + fields added -- J
If they're all fields, are they all at the bottom of tab2, or just the first 2 fields ? Yes, add them all to tab 2 bottom
Are you sure you want those field names, they're really long (which is fine by me, but will take up more space for you) -- J Good point, I shortened them
- Add fields to OA - add to TAB2 at the bottom in the following order
- (NEW FIELD) rescued by - multi-ontology from transgene tables, autocomplete on transgene name app_rescuedby
- (NEW FIELD) legacy info - app_legacyinfo - data parsed from the legacy data: all entries with [celegans] from the file on tazendra and mangolassi at /home/acedb/work/allele_phenotype
- This file ? /home/acedb/work/allele_phenotype/legacy_information.txt There are no entries with "[celegans]". Do you mean lines where it says --;"[C.elegansII]-- ?
- yes
- What do you mean by parse ? Enter everything in each line into a new app_ OA line with its own pgid ? Or split on ";" and only enter stuff from the third column ? Or something else ? -- J
- Take everything in quotes starting with the third column where there is "[C. elegansII]"; in some cases there are more semicolons, and these will need to be ignored after the third column.
- Do you mean that there are 4th and 5th+ columns, or that the third column contains semicolons ? The divider is --";"-- with the doublequotes, so if you meant the latter, it should be okay if you meant you want it all including the semicolons in the third column. If there are 4th and 5th columns, let me know -- J
- Oh, there are semicolons within the quotes, so you are saying those don't count as columns? If not, then cool, we are good.
- each entry gets its own pgid
- add curation status (app_curation_status) of "down right disgusted" so lines that I have not touched do not get dumped.
- make legacy data editable text or bigtext ? -- J bigtext
- what other fields ? no app_name ? you for app_curator ? -- J
- sure, me for app_curator
- please add Jonathan Hodgkin (WBPerson261) to the person field
- parsed into app_curator WBPerson712 ; app_curation_status down_right_disgusted ; app_legacyinfo "<wbgene> | <info>" (also history tables) -- J
- (NEW FIELD) ES - drop down list, with values app_easescore
- "ES0_Impossible_to_score", "ES1_Very_difficult_to_score", "ES2_Difficult_to_score", "ES3_Easy_to_score"
- (NEW FIELD) ME - drop down list with values app_mmateff
- "ME0_Mating_not_successful", "ME1_Mating_rarely_successful", "ME2_Mating_usually_successful", "ME3_Mating_always_successful"
- (NEW FIELD) HME - drop down list with values app_hmateff
- "HME0_Mating_not_successful", "HME1_Mating_rarely_successful", "HME2_Mating_usually_successful", "HME3_Mating_always_successful"
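The legacy-info parsing rule discussed in the list above (the divider is `";"` with the double quotes, the third column is kept whole, and plain semicolons inside it do not count as columns) might look like the following. This is a Python sketch under assumptions: the exact layout of legacy_information.txt is not shown on this page, so the sample line in the test is hypothetical, and both spellings of the marker that appear in the discussion are accepted.

```python
from typing import Optional

DIVIDER = '";"'  # column divider, double quotes included
MARKERS = ("[C.elegansII]", "[C. elegansII]")  # both spellings used on this page


def parse_legacy_line(line: str) -> Optional[str]:
    """Return the third column if it starts with the C. elegans II marker.

    Splitting at most twice keeps everything after the second divider
    together, so plain semicolons inside the text are left alone.
    """
    parts = line.rstrip("\n").split(DIVIDER, 2)
    if len(parts) < 3:
        return None
    info = parts[2].strip()
    if info.endswith('"'):
        # Drop the closing quote; the opening one was part of the divider.
        info = info[:-1]
    if not info.startswith(MARKERS):
        return None
    return info
```

Each returned string would then go into its own pgid as the app_legacyinfo value, with app_curation_status set so that untouched lines are not dumped.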
CHANGES TO DUMP SCRIPT
- mating_efficiency
- constrain lines with mating_efficiency values to be NOT NULL in app_curator, app_tempname (variation), app_person OR app_paper
- what does constrain mean ? check_data button on OA ? so not in dumping script ? I can't find stuff like this in the dumping script. Unless it's a new thing, but I thought this was in the check_Data button. Did we ever wiki the dumping script ? -- J
- https://bitbucket.org/kyook/ky_wbprojects/wiki/use_package.pl we did go over the dumping script, there are some constraints (rules? you probably have another term for this) that I thought were employed, see the bit bucket page.
- that's just the script that calls the module, the module is what generates the data, and we need to go over that if we're going to change stuff -- J
- ok
- lines with mating_efficiency can be blank for phenotype (app_phenotype) Do you mean app_term ? yes, sorry
- This implies they can't be blank for other stuff, the script only does some stuff for pgids with data in app_term, does it already ever dump stuff when there isn't an app_term, and this new thing should join that, or is this the first time it will do that ? If we haven't gone over the dumping script, we should, this seems like a pretty big change -- J
- You are correct, the script already has rules in it to not dump data when there is no phenotype (app_term); however, lines with mating efficiency values (ME and/or HME) will need to escape that rule. I hope this is not a big change(!)
- well, it's not huge, but it's not just adding another line and another tag, so we really should just wiki up the module so that future changes are more clear -- J
- ok
- Don't know what the tag for non-Male mating efficiency should be (using Nonmale for now) It should be Hermaphrodite
couldn't find it here : http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/wormbase/wspec/models.wrm?root=ensembl&view=markup&pathrev=WS229 Wrong release, it should be http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/wormbase/wspec/models.wrm?root=ensembl&view=markup&pathrev=WS231
- .ace should look like this (the example is good -- J):
Variation : "WBVar00266499"
Male "ME2_Mating_usually_successful" Curator_confirmed "WBPerson712"
Male "ME2_Mating_usually_successful" Person_evidence "WBPerson261"
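A rough sketch of how the mating-efficiency paragraph above could be produced, assuming the NOT NULL rule discussed earlier (curator and variation required, plus person or paper) and the Male / Hermaphrodite tags. It is written in Python for illustration and is not the dump module; the row keys are hypothetical, and only the two evidence tags shown in the example are emitted (no tag for paper evidence is given on this page, so none is written).

```python
from typing import Mapping, Optional


def mating_efficiency_ace(row: Mapping[str, Optional[str]]) -> Optional[str]:
    """Build a Variation paragraph for ME/HME values, or None if not dumpable.

    Note that app_term is not consulted: these lines escape the
    "no phenotype, no dump" rule.
    """
    variation = row.get("variation")
    curator = row.get("curator")
    # Required: curator, variation, and person OR paper.
    if not (variation and curator and (row.get("person") or row.get("paper"))):
        return None

    evidence = [("Curator_confirmed", curator)]
    if row.get("person"):
        evidence.append(("Person_evidence", row["person"]))

    lines = [f'Variation : "{variation}"']
    for tag, key in (("Male", "mmateff"), ("Hermaphrodite", "hmateff")):
        value = row.get(key)
        if not value:
            continue
        for ev_tag, ev_value in evidence:
            lines.append(f'{tag} "{value}" {ev_tag} "{ev_value}"')

    if len(lines) == 1:
        return None  # no mating efficiency value on this row
    return "\n".join(lines) + "\n"
```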
- ease_of_scoring
- .ace should be like
Variation : "WBVar00266499"
Species "Caenorhabditis_elegans"
Phenotype "WBPhenotype:0000456" Curator_confirmed "WBPerson712"
Phenotype "WBPhenotype:0000456" Person_evidence "WBPerson261"
Phenotype "WBPhenotype:0000456" Remark "touch-insensitive" Curator_confirmed "WBPerson712"
Phenotype "WBPhenotype:0000456" Remark "touch-insensitive" Person_evidence "WBPerson261"
Phenotype "WBPhenotype:0000456" Ease_of_scoring "ES2_Difficult_to_score" Curator_confirmed "WBPerson712"
Phenotype "WBPhenotype:0000456" Ease_of_scoring "ES2_Difficult_to_score" Person_evidence "WBPerson261"
- rescued_by_transgene
- .ace should be like
Variation : "WBVar00266499"
Species "Caenorhabditis_elegans"
Phenotype "WBPhenotype:0000456" Curator_confirmed "WBPerson712"
Phenotype "WBPhenotype:0000456" Person_evidence "WBPerson261"
Phenotype "WBPhenotype:0000456" Remark "touch-insensitive" Curator_confirmed "WBPerson712"
Phenotype "WBPhenotype:0000456" Remark "touch-insensitive" Person_evidence "WBPerson261"
Phenotype "WBPhenotype:0000456" Rescued_by_Transgene "asIs432248" Curator_confirmed "WBPerson712"
Phenotype "WBPhenotype:0000456" Rescued_by_Transgene "asIs432248" Person_evidence "WBPerson712"
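The ease_of_scoring and rescued_by_transgene examples share one pattern: every Phenotype line, with or without a sub-tag, is repeated once per evidence tag. A minimal Python sketch of that pattern follows; the function name and parameters are hypothetical, and it is not taken from the dump module.

```python
from typing import Optional


def phenotype_ace(
    variation: str,
    phenotype: str,
    curator: str,
    person: str,
    remark: Optional[str] = None,
    ease_of_scoring: Optional[str] = None,
    rescued_by: Optional[str] = None,
    species: str = "Caenorhabditis_elegans",
) -> str:
    """Build a Variation paragraph in the layout of the examples above."""
    base = f'Phenotype "{phenotype}"'
    prefixes = [base]  # the bare Phenotype line always comes first
    if remark:
        prefixes.append(f'{base} Remark "{remark}"')
    if ease_of_scoring:
        prefixes.append(f'{base} Ease_of_scoring "{ease_of_scoring}"')
    if rescued_by:
        prefixes.append(f'{base} Rescued_by_Transgene "{rescued_by}"')

    lines = [f'Variation : "{variation}"', f'Species "{species}"']
    for prefix in prefixes:
        # Each line is written twice: once per evidence tag.
        lines.append(f'{prefix} Curator_confirmed "{curator}"')
        lines.append(f'{prefix} Person_evidence "{person}"')
    return "\n".join(lines) + "\n"
```

Calling it with `remark="touch-insensitive"` and `ease_of_scoring="ES2_Difficult_to_score"` gives the eight-line layout of the ease_of_scoring example.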
legacy info
It is easiest to do one final patch dump of the updated legacy info rather than try to update it through -D lines in each data dump .ace, so for now ignore this section until we have settled on the right course.
- legacy data- needs to be mapped to corresponding gene (app_wbgene)
while the table app_wbgene still exists, it is 1) not in the OA, 2) does not have WBGene objects, it has data in what looks like a bad format : WBGene00003883 (osm-1 (WBGene00003883)) or WBGene00000058 (acr-19). I don't know where this came from, but it seems bad. This is most likely from the Variation_gene.txt I create after each build so alleles can be mapped to genes. The most recent entry is :
4606 | WBGene00002245 (lag-1 (WBGene00002245)) | 2008-01-18 20:00:41.652326-08
What do you do with that file, do you run a script that does something with it ? -- J oh, then it is probably very old, that is in Jolene's time.
If you meant that we should dump whatever is in legacy pgids to existing app_wbgene, I don't see how since the legacy pgids are new, and the app_wbgene pgids are old and you can't add data to them without going to postgres directly. no, you are correct that is not what I meant.
If you know where this data came from, let me know if it's good. If it's not good, let's back it up and get rid of the table -- J
If it isn't being used then yes, by all means we should get rid of it. Can you get rid of it on mangolassi so we can see what happens? As it is an app_wbgene table, it is only looked at by the phenotype OA, correct?
Yes, I think so, but if that Variation_gene.txt is used for some scripts you run manually, or a cronjob picks it up, getting rid of it on mangolassi won't show that anything changed, because you wouldn't be dumping data there or doing any of the other stuff (cronjobs don't run on mangolassi). The second point is, if you were not thinking of the app_wbgene table with weird data, did you mean that you want a new app_wbgene table, and that you're going to be populating it manually as you edit the legacy information ? And if so, ontology or multiontology ? -- J
We can talk to Chris and Xiaodong about how variations are getting mapped to genes during the dump; this is the function that we would need in order to generate the .ace output for mating efficiency data. If that rings some bells, then great; otherwise we will need to wait for Chris and Xiaodong.
This is no longer necessary, as this problem was solved by including the WBGeneID in with the legacy data, piped before the legacy info text.
Legacy data is dumped whenever there is a value in the legacy data field.
- make sure that if lines are duplicated, the legacy value is deleted, or there will be more than one gene-legacy data entry in the .ace
- if all legacy data is taken care of, it is no longer needed, so delete all the values in the field; otherwise blanks will be dumped and an error message will pop up.
- For the .ace
- dump data when there is no "down right disgusted" tag; this is how the script works already
- put in a -D for the legacy data and dump a legacy data line with the text in the legacy data field. Need to get original legacy data from legacy_information.txt, only information starting with "[C. elegansII]"
Gene : "WBGene00006932"
-D Legacy_information "[C.elegansII] h797 : Mid-larval lethal. OA5: h351, etc. [KR]"
Legacy_information "[C.elegansII] OA5: h351, etc. [KR]"
- for lines that have a WBGene value and the "|" but no legacy information, the .ace paragraph should be:
Gene : "WBGene00001177"
-D Legacy_information "n488 : transient variable bloating Type E. ES2 (adult). NA1."
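The two legacy paragraphs above follow from the "<wbgene> | <info>" layout of app_legacyinfo: a -D line removes the original text, and a new Legacy_information line is written only when some text remains after the pipe. A hedged Python sketch, assuming the original text has already been looked up in legacy_information.txt (the function and argument names are hypothetical):

```python
def legacy_ace(field_value: str, original_text: str) -> str:
    """Build the Gene paragraph for one app_legacyinfo value.

    field_value   -- current field content, "<wbgene> | <info>"
    original_text -- original entry from legacy_information.txt
    """
    # Split on the first pipe only, so pipes inside the info text survive.
    gene, _, info = field_value.partition("|")
    gene, info = gene.strip(), info.strip()

    lines = [
        f'Gene : "{gene}"',
        f'-D Legacy_information "{original_text}"',
    ]
    if info:
        # Remaining legacy text is written back as a fresh line.
        lines.append(f'Legacy_information "{info}"')
    return "\n".join(lines) + "\n"
```

Double quotes inside the legacy text are not escaped in this sketch; real data may need that handled before writing the .ace file.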
--kjy 21:51, 9 March 2012 (UTC)
--kjy 23:33, 13 February 2012 (UTC)
Phenotype OA postgres tables (app_*)
postgres tables | |||
app_allele_status | app_func | app_molaffectedquality_hst | app_quantity_remark_hst |
app_allele_status_hst | app_func_hst | app_molecule | app_range_end |
app_anat_term | app_genotype | app_molecule_hst | app_range_end_hst |
app_anat_term_hst | app_genotype_hst | app_nature | app_range_hst |
app_anatomy | app_go_sug | app_nature_hst | app_range_start |
app_anatomy_hst | app_go_sug_hst | app_nbp | app_range_start_hst |
app_anatomyquality | app_gocomponent | app_nbp_hst | app_rearrangement |
app_anatomyquality_hst | app_gocomponent_hst | app_needsreview | app_rearrangement_hst |
app_caused_by | app_gocomponentquality | app_needsreview_hst | app_remark |
app_caused_by_hst | app_gocomponentquality_hst | app_nodump | app_rescuedby |
app_caused_by_other | app_gofunction | app_nodump_hst | app_rescuedby_hst |
app_caused_by_other_hst | app_gofunction_hst | app_not | app_reviewedby |
app_child_of | app_gofunctionquality | app_not_complement | app_reviewedby_hst |
app_child_of_hst | app_gofunctionquality_hst | app_not_complement_hst | app_rnai_brief |
app_cold_degree | app_goprocess | app_not_hst | app_rnai_brief_hst |
app_cold_degree_hst | app_goprocess_hst | app_obj_remark | app_strain |
app_cold_sens | app_goprocessquality | app_obj_remark_hst | app_strain_hst |
app_cold_sens_hst | app_goprocessquality_hst | app_paper | app_sug_ref |
app_communitycurator | app_haplo | app_paper_hst | app_sug_ref_hst |
app_communitycurator_hst | app_haplo_hst | app_paper_remark | app_suggested |
app_communitycuratoremail | app_heat_degree | app_paper_remark_hst | app_suggested_definition |
app_communitycuratoremail_hst | app_heat_degree_hst | app_parentstrain | app_suggested_definition_hst |
app_complements | app_heat_sens | app_parentstrain_hst | app_suggested_hst |
app_complements_hst | app_heat_sens_hst | app_pat_effect | app_temperature |
app_control_isolate | app_hmateff | app_pat_effect_hst | app_temperature_hst |
app_control_isolate_hst | app_hmateff_hst | app_pathogen | app_term |
app_controlstrain | app_intx_desc | app_pathogen_hst | app_term_hst |
app_controlstrain_hst | app_intx_desc_hst | app_penetrance | app_transgene |
app_curator | app_laboratory | app_penetrance_hst | app_transgene_hst |
app_curator_hst | app_laboratory_hst | app_percent | app_treatment |
app_delivered | app_legacyinfo | app_percent_hst | app_treatment_hst |
app_delivered_hst | app_legacyinfo_hst | app_person | app_unregpaper |
app_easescore | app_lifestage | app_person_hst | app_unregpaper_hst |
app_easescore_hst | app_lifestage_hst | app_phen_remark | app_unregtransgene |
app_filereaddate | app_lifestagequality | app_phen_remark_hst | app_unregtransgene_hst |
app_filereaddate_hst | app_lifestagequality_hst | app_phenotype | app_unregvariation |
app_finalname | app_mat_effect | app_phenotype_hst | app_unregvariation_hst |
app_finalname_hst | app_mat_effect_hst | app_picture | app_variation |
app_finished_hst | app_mmateff | app_picture_hst | app_variation_hst |
app_flaggenereg | app_mmateff_hst | app_preparation_hst | app_wbgene |
app_flaggenereg_hst | app_molaffected | app_quantity | app_wbgene_hst |
app_flaggeneticintxn | app_molaffected_hst | app_quantity_hst | |
app_flaggeneticintxn_hst | app_molaffectedquality | app_quantity_remark |
Phenotype OA TAB Summary
- Updated for newest OA format 8-30-2015 (CG)
- Line format: Field Name - postgres_tablename - NOT_DUMPED - ?WBClass/Entry-type (OA Field-type)
- Notes
TAB 1
- pgid - the postgres ID - NOT DUMPED
- Curator - app_curator - ?Person (Dropdown)
- Pub - app_paper - ?Paper (Ontology)
- Person - app_person - ?Person (Ontology)
- Note: Person providing evidence for the annotation (if not published)
- Phenotype - app_term - ?Phenotype (Ontology)
- NOT - app_not - (Toggle)
- Note: triggers dump of Phenotype_not_observed in #Phenotype_info (Toggle)
- Phenotype Remark - app_phen_remark - Text (Big Text)
- Variation - app_variation - ?Variation (Ontology)
- Transgene - app_transgene - ?Transgene (Ontology)
- Strain - app_strain - ?Strain (Ontology)
- Rearrangement - app_rearrangement - ?Rearrangement (Ontology)
- Object Remark - app_obj_remark - NOT DUMPED - Text (Text)
- Allele Status - app_allele_status - NOT DUMPED - Allele status (Dropdown)
- Note: options are: new_gene_assignment, lost, other
- Caused By Gene - app_caused_by - ?Gene (Ontology)
- Caused By Other - app_caused_by_other - Text (Text)
TAB 2
- Phenotype - app_term - ?Phenotype (Ontology)
- NOT - app_not - (Toggle)
- Note: triggers dump of Phenotype_not_observed in #Phenotype_info (Toggle)
- Phenotype Remark - app_phen_remark - Text (Big Text)
- Suggested - app_suggested - NOT DUMPED - Text (Text)
- Note: field to suggest new phenotype term. Will replace whatever term(s) are currently in the Phenotype field once it is approved
- Suggested Definition - app_suggested_definition - NOT DUMPED - Text (Big Text)
- Note: definition of new suggested phenotype term
- Child Of - app_child_of - NOT DUMPED - ?Phenotype (Ontology)
- Note: list of parent phenotype term(s) for suggested phenotype
- GO Process - app_goprocess - ?GO_term (Multi-ontology)
- GO P Quality - app_goprocessquality - PATO term (Ontology)
- GO Function - app_gofunction - ?GO_term (Multi-ontology)
- GO F Quality - app_gofunctionquality - PATO term (Ontology)
- GO Component - app_gocomponent - ?GO_term (Multi-ontology)
- GO C Quality - app_gocomponentquality - PATO term (Ontology)
TAB 3
- Phenotype - app_term - ?Phenotype (Ontology)
- NOT - app_not - (Toggle)
- Note: triggers dump of Phenotype_not_observed in #Phenotype_info (Toggle)
- Phenotype Remark - app_phen_remark - Text (Big Text)
- Anatomy - app_anatomy - ?Anatomy (Multi-ontology)
- Anatomy Quality - app_anatomyquality - PATO term (Multi-ontology)
- Life Stage - app_lifestage - ?Life_stage (Multi-ontology)
- Life Stage Quality - app_lifestagequality - PATO term (Multi-ontology)
- Molecule Affected - app_molaffected - ?Molecule (Multi-ontology)
- Mol Aff Quality - app_molaffectedquality - PATO term (Multi-ontology)
- Affected By Molecule - app_molecule - (in-line with Phenotype) Molecule - ?Molecule (Multi-ontology)
- Affected By Pathogen - app_pathogen - ?Species (Multi-ontology)
TAB 4
- Phenotype - app_term - ?Phenotype (Ontology)
- NOT - app_not - (Toggle)
- Note: triggers dump of Phenotype_not_observed in #Phenotype_info (Toggle)
- Phenotype Remark - app_phen_remark - Text (Big Text)
- Allele Nature - app_nature -
- Functional Change - app_func -
- Temperature - app_temperature - Temperature - ?Text (Text)
- Treatment - app_treatment - Treatment description as text (Big text)
- Control Isolate - app_control_isolate - NOT DUMPED - Text (Text)
- Penetrance - app_penetrance - (Incomplete, Complete, High, Low) (Dropdown)
- Penetrance Remark - app_percent - Text describing penetrance (Big text)
- Cold Sensitive - app_cold_sens - (in-line with Phenotype) Temperature_sensitive - Toggle
- Cold Sensitive Degree - app_cold_degree - (in-line with Phenotype) Temperature_sensitive - Toggle
- Heat Sensitive - app_heat_sens - (in-line with Phenotype) Temperature_sensitive - Toggle
- Heat Sensitive Degree - app_heat_degree - (in-line with Phenotype) Temperature_sensitive - Toggle
- Maternal Effect - app_mat_effect - Tag for maternal effect (Dropdown)
- Note: options are: Strictly_maternal, With_maternal_effect, Maternal
- Paternal Effect - app_pat_effect - (Toggle)
- Haploinsufficient - app_haplo - (Toggle)
TAB 5
- Phenotype - app_term - ?Phenotype (Ontology)
- NOT - app_not - (Toggle)
- Note: triggers dump of Phenotype_not_observed in #Phenotype_info (Toggle)
- Phenotype Remark - app_phen_remark - Text (Big Text)
- Picture - app_picture - ?Picture (Ontology)
- Parent Strain - app_parentstrain - ?Strain (Multi-ontology)
- Genotype - app_genotype - Genotype (Big text)
- Control Strain - app_controlstrain - Text (Text)
- Rescued By - app_rescuedby - ?Transgene (Multi-ontology)
- Complements - app_complements - NOT DUMPED - ?Variation (Multi-ontology)
- Does Not Complement - app_not_complement - NOT DUMPED - ?Variation (Multi-ontology)
- Flag Genetic Intxn - app_flaggeneticintxn - NOT DUMPED (Toggle)
- Genetic Intx Desc - app_intx_desc - NOT DUMPED - Text (Big text)
- Flag Gene Reg - app_flaggenereg - NOT DUMPED (Toggle)
TAB 6
- Phenotype - app_term - ?Phenotype (Ontology)
- NOT - app_not - (Toggle)
- Note: triggers dump of Phenotype_not_observed in #Phenotype_info (Toggle)
- Phenotype Remark - app_phen_remark - Text (Big Text)
- NBP - app_nbp - NOT DUMPED - Text (Big text)
- NBP / File Date - app_filereaddate - NOT DUMPED - Text (Text)
- Laboratory Evidence - app_laboratory - ?Laboratory (Multi-ontology)
- Legacy Info - app_legacyinfo - NOT DUMPED - Text (Big text)
- ES - app_easescore - Ease of scoring (Dropdown)
- ME - app_mmateff - Dumped in ?Variation object - Male mating efficiency (Dropdown)
- HME - app_hmateff - Dumped in ?Variation object - Hermaphrodite mating efficiency (Dropdown)
- Community Curator - app_communitycurator - ?Person (Ontology)
- Community Curator Email - app_communitycuratoremail - Text (Text)
- Unregistered Paper - app_unregpaper - NOT DUMPED - Text (Text)
- Unregistered Variation - app_unregvariation - NOT DUMPED - Text (Text)
- NO DUMP - app_nodump - NOT DUMPED (Toggle)
- Needs Review - app_needsreview - NOT DUMPED (Toggle)