Detailed Documentation of Form and Scripts


Latest revision as of 15:11, 18 February 2014

Currently, on mangolassi the ccc.cgi and other scripts and files are here:

azurebrd/public_html/cgi-bin/forms/ccc

  • accession
    • This is a file that maps WB WBPaper IDs to PMIDs.
    • For WB, this file is generated each time the search is performed.
  • ccc_celegans_2013only
    • I believe these are old test files that can be deleted.
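  • ccc.cgi
    • This is the code for the curation form.
    • See the User_Guide_for_Curators for documentation of how the form works from a user's perspective.
    • See the ccc.cgi documentation for more specific details about the code.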
  • ccc.js
    • This is the JavaScript code for the CCC form.
    • If we want to change the number of characters needed to begin autocomplete, we can do that here.
  • c_elegans.WS234.xrefs.txt and c_elegans.WS236.xrefs.txt
    • These are files generated with each WB build that were used to create the WB gpi file.
  • generate_gpi.pl
    • This is the script that will be used to manually generate a new gpi file for WB.
  • jquery
    • This directory contains...
  • notes
    • This file contains a short bit about mapping WBGene IDs to UniProtKB accessions for the gpi file.
  • scripts
    • This directory contains:
      • accession file - This file maps WBPaper IDs to PMIDs and TAIR doc IDs to PMIDs. Note that we need PMIDs to send annotations to Protein2GO.
      • create_ccc_pgcuration.pl - This Perl script creates the following two Postgres tables for a given Textpresso source file:

ccc_sentenceclassification
  ccc_mod text,
  ccc_file text,
  ccc_paper text, 
  ccc_section text,
  ccc_sentnum text, 
  ccc_sentenceclassification text,
  ccc_comment text,
  ccc_curator text, 
  ccc_timestamp text
ccc_sentenceannotation
  ccc_mod text,
  ccc_file text,
  ccc_paper text,
  ccc_section text,
  ccc_sentnum text,
  ccc_geneproduct text,
  ccc_component text,
  ccc_goterm text,
  ccc_evidencecode text,
  ccc_with text,
  ccc_alreadycurated text,
  ccc_comment text,
  ccc_valid text,
  ccc_ptgoid text,
  ccc_curator text,
  ccc_timestamp
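The two table layouts above can be sketched as follows. This is only an illustration and not the code in create_ccc_pgcuration.pl: it uses Python's built-in sqlite3 module as a stand-in for Postgres, the helper names are hypothetical, and it assumes ccc_timestamp in ccc_sentenceannotation is a text column like the others, since no type is listed for it.

```python
import sqlite3

# Column names copied from the two table listings above.
CLASSIFICATION_COLUMNS = [
    "ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum",
    "ccc_sentenceclassification", "ccc_comment", "ccc_curator",
    "ccc_timestamp",
]
ANNOTATION_COLUMNS = [
    "ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum",
    "ccc_geneproduct", "ccc_component", "ccc_goterm", "ccc_evidencecode",
    "ccc_with", "ccc_alreadycurated", "ccc_comment", "ccc_valid",
    "ccc_ptgoid", "ccc_curator",
    "ccc_timestamp",  # assumption: text, the listing gives no type
]


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement in which every column is text."""
    column_sql = ",\n  ".join(f"{name} text" for name in columns)
    return f"CREATE TABLE {table} (\n  {column_sql}\n)"


def create_curation_tables(conn):
    """Create both curation tables on the given database connection."""
    conn.execute(
        create_table_sql("ccc_sentenceclassification", CLASSIFICATION_COLUMNS)
    )
    conn.execute(
        create_table_sql("ccc_sentenceannotation", ANNOTATION_COLUMNS)
    )


# In-memory database so the sketch runs without a Postgres server.
conn = sqlite3.connect(":memory:")
create_curation_tables(conn)
```

Keeping the column lists as data makes it easy to compare them against the listings above when a column is added or renamed.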
  • create_ccc_pgindices.pl - This script creates the following two tables for each Textpresso source file:
    • ccc_geneprodindex - this table lists the gene products mentioned in each sentence, each mapped to a MOD ID and a UniProtKB ID.
      • Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
    • ccc_componentindex - this table lists, for each sentence, the cellular components that matched the Textpresso cellular component category.
      • Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
  • gpi files - these files contain gene names, synonyms, and MOD and UniProtKB identifiers. For file format specifications, see GO's gpi file format specification: http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format
    • dictyBase_07032013.gpi - Should this file also be renamed to dicty_gpi for simplicity?
    • TAIR1_gpi
    • worm_gpi - This file is currently named ws238_gpi and needs to be renamed, but I don't have permission to do this.
  • meh - this looks like a test file for ccc_geneprodindex for TAIR.
  • old_tables - this is a file that lists the names of the tables used for the previous version of the CCC curation forms.
  • out - this looks like another test file for ccc_geneprodindex for TAIR.
  • populate_ccc_pg_indices.pg.tair1 and populate_ccc_pg_indices.pg.worm1 - these look like the output files of populate_ccc_pg_indices.pl for TAIR and WB. Note that for TAIR there are some sentences that were not processed properly.
  • populate_ccc_pg_indices.pl - this script populates the two tables ccc_componentindex and ccc_geneprodindex.
    • Inputs needed:
      • Textpresso source files
      • Mapping file of PMID to MOD Accession for WB and TAIR only (so far). dictyBase docIDs are the same as the PMIDs.
        • This is available from a textpresso-dev URL that needs to be updated with each search run.
      • gpi files - see above. We need to establish where the updated files will be located (i.e., where I can put them) and also update the names of the variables in the script to be more generic. This script maps the gene product names and/or synonyms used to MOD and all UniProtKB IDs.
    • This script takes the raw Textpresso output and generates a human-readable version of the sentences. It also maps the gene product names or synonyms to IDs and generates a file with paper titles and abstracts for display on the curation form.
  • source
    • This directory contains directories for each MOD that has a CCC implementation.
    • Each MOD's directory contains the source files from Textpresso searches and the pmid_data file, which maps PMIDs to MOD paper identifiers and also holds paper titles and abstracts.
  • test.html -
  • ws234_tablemaker_info - the results of the tablemaker query for gene names and status for creating a gpi file.
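
The name-to-ID mapping described above for populate_ccc_pg_indices.pl might look roughly like the sketch below. It is not the script's actual logic. It assumes the tab-separated column order given in GO's gpi specification (DB, object ID, symbol, name, synonyms, type, taxon, parent, xrefs, properties), with "!" comment lines and "|" between multiple values; check this against the real gpi files. The function name and the sample row are made up for illustration.

```python
def build_gpi_lookup(gpi_text):
    """Map each gene product symbol or synonym to its MOD and UniProtKB IDs.

    Returns a dict: name -> list of {"mod_id": ..., "uniprot_ids": [...]}.
    A name can map to several entries if it is shared between objects.
    """
    lookup = {}
    for line in gpi_text.splitlines():
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short rows so the column indexing below is always safe.
        fields += [""] * (10 - len(fields))
        db, object_id, symbol, _full_name, synonyms = fields[:5]
        xrefs = fields[8]
        uniprot_ids = [
            xref for xref in xrefs.split("|") if xref.startswith("UniProtKB:")
        ]
        entry = {"mod_id": f"{db}:{object_id}", "uniprot_ids": uniprot_ids}
        names = [symbol] + synonyms.split("|")
        for name in names:
            if name:
                lookup.setdefault(name, []).append(entry)
    return lookup


# Made-up sample row (not real data), preceded by a header comment line.
SAMPLE_GPI = (
    "!gpi-version: 1.1\n"
    "EX\tEXGene0001\tabc-1\tHypothetical protein\tabc1|ABC-1\tprotein\t"
    "taxon:0\t\tUniProtKB:X00001|UniProtKB:X00002\t\n"
)

lookup = build_gpi_lookup(SAMPLE_GPI)
```

Storing a list per name keeps every UniProtKB ID available, which matches the note above that names are mapped to the MOD ID and all UniProtKB IDs.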




