Difference between revisions of "Detailed Documentation of Form and Scripts"

Latest revision as of 15:11, 18 February 2014

Currently, on mangolassi the ccc.cgi and other scripts and files are here:

azurebrd/public_html/cgi-bin/forms/ccc

accession
- This is a file that maps WB WBPaper IDs to PMIDs.
- For WB, this file is generated each time the search is performed.
ccc_celegans_2013only
- I believe these are old test files that can be deleted.
ccc.cgi
- This is the code for the curation form.
- See the User_Guide_for_Curators for documentation of how the form works from a user's perspective.
- See the ccc.cgi documentation for more specific details about the code.
ccc.js
- This is the ccc form javascript code.
- If we want to change the number of characters needed to begin autocomplete, we can do that here.
c_elegans.WS234.xrefs.txt and c_elegans.WS236.xrefs.txt
- These are files generated with each WB build that were used to create the WB gpi file.
generate_gpi.pl
- This is the script that will be used to manually generate a new gpi file for WB.
jquery
- This directory contains...
notes
- This file contains a short bit about mapping WBGene IDs to UniProtKB accessions for the gpi file.
scripts
- This directory contains:
  - accession file - This is a mapping file that contains the mappings between WBPaper IDs and PMIDs as well as TAIR doc IDs and PMIDs. Note that we need PMIDs to send annotations to Protein2GO.
  - create_ccc_pgcuration.pl - This Perl script creates the following two Postgres tables for a given Textpresso source file:

ccc_sentenceclassification
  ccc_mod text,
  ccc_file text,
  ccc_paper text, 
  ccc_section text,
  ccc_sentnum text, 
  ccc_sentenceclassification text,
  ccc_comment text,
  ccc_curator text, 
  ccc_timestamp text,

ccc_sentenceannotation
  ccc_mod text,
  ccc_file text,
  ccc_paper text,
  ccc_section text,
  ccc_sentnum text,
  ccc_geneproduct text,
  ccc_component text,
  ccc_goterm text,
  ccc_evidencecode text,
  ccc_with text,
  ccc_alreadycurated text,
  ccc_comment text,
  ccc_valid text,
  ccc_ptgoid text,
  ccc_curator text,
  ccc_timestamp
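
The two table layouts above can be sketched as follows. This is only an illustration, not the code in create_ccc_pgcuration.pl: it uses Python with an in-memory SQLite database as a stand-in for Postgres, and it assumes that the last ccc_timestamp column of ccc_sentenceannotation is text like the others, since the listing gives no type for it. The helper names build_ddl and create_tables are made up for this example.

```python
import sqlite3

# Column names are copied from the listing above; every column is stored as text.
# Assumption: the final ccc_timestamp of ccc_sentenceannotation is also text
# (the listing does not give a type for it).
COMMON = ["ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum"]
TABLES = {
    "ccc_sentenceclassification": COMMON + [
        "ccc_sentenceclassification", "ccc_comment", "ccc_curator", "ccc_timestamp",
    ],
    "ccc_sentenceannotation": COMMON + [
        "ccc_geneproduct", "ccc_component", "ccc_goterm", "ccc_evidencecode",
        "ccc_with", "ccc_alreadycurated", "ccc_comment", "ccc_valid",
        "ccc_ptgoid", "ccc_curator", "ccc_timestamp",
    ],
}


def build_ddl(table, columns):
    """Return a CREATE TABLE statement with one text column per name."""
    cols = ",\n  ".join(f"{c} text" for c in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n)"


def create_tables(conn):
    """Create both curation tables on the given connection."""
    for table, columns in TABLES.items():
        conn.execute(build_ddl(table, columns))
    conn.commit()


# SQLite in memory is used here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
create_tables(conn)
```

Printing build_ddl("ccc_sentenceclassification", TABLES["ccc_sentenceclassification"]) shows the generated statement, which can be compared against the listing.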

create_ccc_pgindices.pl - This script creates the following two tables for each Textpresso source file:
- ccc_geneprodindex - this table contains the list of gene products mentioned in the sentence mapped to a MOD ID and a UniProtKB ID.
  - ccc_mod ccc_file ccc_paper ccc_section ccc_sentnum ccc_'table' ccc_timestamp
- ccc_componentindex - this table lists, for each sentence, the cellular components that matched the Textpresso cellular component category
  - ccc_mod ccc_file ccc_paper ccc_section ccc_sentnum ccc_'table' ccc_timestamp
gpi files - these files contain gene names, synonyms, and MOD and UniProtKB identifiers. For file format specifications, see GO's gpi file format specification: http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format

- dictyBase_07032013.gpi - Should this file also be renamed to dicty_gpi for simplicity?
- TAIR1_gpi
- worm_gpi - This file name needs to be changed from ws238_gpi, but I don't have permission to do this.
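
As a rough illustration of how a gpi file can be turned into a lookup from gene names and synonyms to identifiers, here is a minimal Python sketch. It is not the code used by the CCC scripts. It assumes a tab-separated layout with the MOD object ID in column 1, the symbol in column 2, pipe-separated synonyms in column 4, and pipe-separated cross-references (including UniProtKB IDs) in column 8; check these positions against the gpi version actually in use and adjust the column arguments if they differ. The function name load_gpi_names and all identifiers in the sample line are made up.

```python
import csv
import io


def load_gpi_names(handle, id_col=0, symbol_col=1, synonym_col=3, xref_col=7):
    """Map each symbol or synonym to a list of (MOD ID, [UniProtKB IDs]) pairs.

    The column positions are assumptions about the gpi layout (0-based).
    """
    names = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank lines and header/comment lines, which start with "!".
        if not row or row[0].startswith("!"):
            continue
        row = row + [""] * 10  # pad so short rows do not raise IndexError
        mod_id = row[id_col]
        uniprot = [
            xref.split(":", 1)[1]
            for xref in row[xref_col].split("|")
            if xref.startswith("UniProtKB:")
        ]
        labels = [row[symbol_col]] + row[synonym_col].split("|")
        for label in labels:
            if label:
                names.setdefault(label, []).append((mod_id, uniprot))
    return names


# Made-up sample line; the IDs and names are placeholders, not real records.
sample = io.StringIO(
    "!gpi-version: 1.1\n"
    "WBGene00000000\tabc-1\t\txyz-2|Y00A0.0\tprotein\ttaxon:6239\t\tUniProtKB:P00000\t\n"
)
names = load_gpi_names(sample)
```

Each name maps to a list because the same symbol or synonym may appear on more than one line of a gpi file.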
meh - this looks like a test file for ccc_geneprodindex for TAIR.
old_tables - this is a file that lists the names of the tables used for the previous version of the CCC curation forms.
out - this looks like another test file for ccc_geneprodindex for TAIR.
populate_ccc_pg_indices.pg.tair1 and populate_ccc_pg_indices.pg.worm1 - these look like the output files of populate_ccc_pg_indices.pl for TAIR and WB. Note that for TAIR there are some sentences that were not processed properly.
populate_ccc_pg_indices.pl - this script populates the two tables ccc_componentindex and ccc_geneprodindex
- Inputs needed:
  - Textpresso source files
  - Mapping file of PMID to MOD Accession for WB and TAIR only (so far). dictyBase docIDs are the same as the PMIDs.
    - This is available from a textpresso-dev URL that needs to be updated with each search run.
  - gpi files - see above. Need to establish where the updated files will be located (i.e., where I can put them) and also update the names of the variables in the script to be more generic. This script maps the gene product names and/or synonyms used to MOD and all UniProtKB IDs.
- This script takes the raw Textpresso output and generates a human-readable version of the sentences, as well as creating mappings of the gene product names or synonyms to IDs and generating a file with paper titles and abstracts for display on the curation form.
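
The paper ID handling described in the inputs above (a mapping file for WB and TAIR, and dictyBase docIDs that are already PMIDs) could look roughly like this in Python. This is a sketch under assumptions, not the logic of populate_ccc_pg_indices.pl: the mapping file is assumed to hold one tab-separated pair per line (MOD paper ID, then PMID), the MOD label "dictyBase" is assumed to be how that MOD is named in the data, and the function names and sample IDs are made up.

```python
def load_accession(lines):
    """Read MOD paper ID to PMID pairs from tab-separated lines (assumed layout)."""
    mapping = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        doc_id, pmid = parts[0], parts[1]
        mapping[doc_id] = pmid
    return mapping


def resolve_pmid(mod, doc_id, mapping):
    """Return the PMID for a document, or None if no mapping is known."""
    if mod == "dictyBase":
        # dictyBase docIDs are the same as the PMIDs, so no lookup is needed.
        return doc_id
    return mapping.get(doc_id)


# Made-up placeholder IDs, used only to show the call pattern.
mapping = load_accession(["WBPaper00000000\t00000000\n", "\n"])
wb_pmid = resolve_pmid("WB", "WBPaper00000000", mapping)
dicty_pmid = resolve_pmid("dictyBase", "12345", mapping)
```

Returning None for an unmapped paper keeps the missing PMID visible to the caller, which matters because PMIDs are needed to send annotations to Protein2GO.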
source
- This directory contains directories for each MOD that has a CCC implementation.
- In each MOD's directories are the source files from Textpresso searches and the pmid_data file that maps PMIDs to MOD paper identifiers as well as paper titles and abstracts.
test.html -
ws234_tablemaker_info - the results of the tablemaker query for gene names and status for creating a gpi file.

Back to WormBase

