Difference between revisions of "Detailed Documentation of Form and Scripts"
From WormBaseWiki
Latest revision as of 15:11, 18 February 2014
Currently, on mangolassi the ccc.cgi and other scripts and files are here:
azurebrd/public_html/cgi-bin/forms/ccc
- accession
- This is a file that maps WB WBPaper IDs to PMIDs.
- For WB, this file is generated each time the search is performed.
- ccc_celegans_2013only
- I believe these are old test files that can be deleted.
- ccc.cgi
- This is the code for the curation form.
- See the User_Guide_for_Curators for documentation of how the form works from a user's perspective.
- See the ccc.cgi documentation for more specific details about the code.
- ccc.js
- This is the ccc form javascript code.
- If we want to change the number of characters needed to begin autocomplete, we can do that here.
- c_elegans.WS234.xrefs.txt and c_elegans.WS236.xrefs.txt
- These are files generated with each WB build that were used to create the WB gpi file.
- generate_gpi.pl
- This is the script that will be used to manually generate a new gpi file for WB.
- jquery
- This directory contains...
- notes
- This file contains a short bit about mapping WBGene IDs to UniProtKB accessions for the gpi file.
- scripts
- This directory contains:
- accession file - This is a mapping file that contains the mappings between WBPaper IDs and PMIDs as well as TAIR doc IDs and PMIDs. Note that we need PMIDs to send annotations to Protein2GO.
- create_ccc_pgcuration.pl - This Perl script creates two postgres tables for a given Textpresso source file:
- ccc_sentenceclassification: ccc_mod text, ccc_file text, ccc_paper text, ccc_section text, ccc_sentnum text, ccc_sentenceclassification text, ccc_comment text, ccc_curator text, ccc_timestamp text
- ccc_sentenceannotation: ccc_mod text, ccc_file text, ccc_paper text, ccc_section text, ccc_sentnum text, ccc_geneproduct text, ccc_component text, ccc_goterm text, ccc_evidencecode text, ccc_with text, ccc_alreadycurated text, ccc_comment text, ccc_valid text, ccc_ptgoid text, ccc_curator text, ccc_timestamp
- create_ccc_pgindices.pl - This script creates the following two tables for each Textpresso source file:
- ccc_geneprodindex - this table contains the list of gene products mentioned in the sentence, mapped to a MOD ID and a UniProtKB ID.
- Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
- ccc_componentindex - this table lists, for each sentence, the cellular components that matched the Textpresso cellular component category.
- Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
- gpi files - these files contain gene names, synonyms, and MOD and UniProtKB identifiers. For file format specifications, see GO's gpi file format specification (http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format).
- dictyBase_07032013.gpi - Should this file also be renamed to dicty_gpi for simplicity?
- TAIR1_gpi
- worm_gpi - This file name needs to be changed from ws238_gpi, but I don't have permission to do this.
- meh - this looks like a test file for ccc_geneprodindex for TAIR.
- old_tables - this is a file that lists the names of the tables used for the previous version of the CCC curation forms.
- out - this looks like another test file for ccc_geneprodindex for TAIR.
- populate_ccc_pg_indices.pg.tair1 and populate_ccc_pg_indices.pg.worm1 - these look like the output files from populate_ccc_pg_indices.pl for TAIR and WB. Note that for TAIR there are some sentences that were not processed properly.
- populate_ccc_pg_indices.pl - this script populates the two tables ccc_componentindex and ccc_geneprodindex.
- Inputs needed:
- Textpresso source files
- Mapping file of PMID to MOD Accession for WB and TAIR only (so far). dictyBase docIDs are the same as the PMIDs.
- This is available from a textpresso-dev URL that needs to be updated with each search run.
- gpi files - see above. Need to establish where the updated files will be located (i.e., where I can put them) and also update the names of the variables in the script to be more generic. This script maps the gene product names and/or synonyms used to MOD and all UniProtKB IDs.
- This script takes the raw Textpresso output and generates a human-readable version of the sentences. It also creates mappings of the gene product names or synonyms to IDs and generates a file with paper titles and abstracts for display on the curation form.
- source
- This directory contains directories for each MOD that has a CCC implementation.
- In each MOD's directories are the source files from Textpresso searches and the pmid_data file that maps PMIDs to MOD paper identifiers as well as paper titles and abstracts.
- test.html -
- ws234_tablemaker_info - the results of the tablemaker query for gene names and status for creating a gpi file.
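
The name-to-ID mapping that populate_ccc_pg_indices.pl performs with the gpi files can be illustrated with a short sketch. This is not the Perl script's actual code; it is a minimal Python illustration that assumes a tab-separated gpi layout with comment lines starting with "!", and pipe-separated synonyms and cross-references. The function name build_name_index and the default column positions are assumptions made for this example, and the sample identifiers are made up. Check the column positions against the gpi specification for the file version in use.

```python
from collections import defaultdict


def build_name_index(gpi_lines, id_col=1, symbol_col=2, synonym_col=4, xref_col=8):
    """Map each gene product symbol or synonym to (MOD ID, UniProtKB IDs) pairs.

    Column positions are zero-based and are assumptions for this sketch;
    adjust them to match the gpi version of the file being read.
    """
    index = defaultdict(set)
    needed = max(id_col, symbol_col, synonym_col, xref_col) + 1
    for line in gpi_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        # Pad short rows so that missing trailing columns read as empty.
        cols += [""] * (needed - len(cols))
        mod_id = cols[id_col]
        # Keep every UniProtKB cross-reference, since one gene product
        # name may map to several UniProtKB IDs.
        uniprot_ids = tuple(
            sorted(x for x in cols[xref_col].split("|") if x.startswith("UniProtKB:"))
        )
        names = [cols[symbol_col]] + cols[synonym_col].split("|")
        for name in names:
            if name:
                index[name].add((mod_id, uniprot_ids))
    return dict(index)


# Made-up example row; the identifiers below are placeholders, not real records.
sample = [
    "!gpi-version: 1.1",
    "WB\tWBGene99999999\tabc-1\t\tABC1|Y00A0.0\tprotein\ttaxon:6239\t\tUniProtKB:X00001|UniProtKB:X00002",
]
name_index = build_name_index(sample)
```

Looking up a name or synonym found in a sentence, for example name_index.get("ABC1"), then returns the MOD ID together with all UniProtKB IDs listed for that gene product. Storing a set per name lets the same synonym point at more than one gene product, which a curator would need to disambiguate.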
Back to WormBase