Difference between revisions of "WormBase"

Revision as of 19:22, 6 August 2013


1 Search frequency
2 Which papers?
3 Search according to SVM classification
4 Search Categories
5 Search Filters
6 File Format - File Name
7 Sample File Format - Sentence Format
8 Sentence File Location
9 Mapping Files Location
10 Updates to form code
11 Archived Info - Old script explanation - Juancarlos
12 Archived Info - Old Curation Form

Search frequency

Monthly, on the 1st of the month

Which papers?

Search papers that were brought into Textpresso during the previous month. For example, the August run searches all papers that were incorporated into Textpresso in July.
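The "previous month" window can be computed from the run date. The following is a minimal sketch, not the production script; the function name previous_month_window is hypothetical, and it assumes the window covers whole calendar months.

```python
from datetime import date, timedelta


def previous_month_window(run_date):
    """Return (first_day, last_day) of the calendar month before run_date.

    Hypothetical helper: assumes papers are selected by whole calendar month.
    """
    first_of_this_month = run_date.replace(day=1)
    # Stepping back one day from the 1st always lands in the previous month,
    # which also handles the January -> December year rollover.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


if __name__ == "__main__":
    start, end = previous_month_window(date(2013, 8, 1))
    print(start, end)  # 2013-07-01 2013-07-31
```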

Search according to SVM classification

Search all papers that the Other_expr pattern SVM predicts as positive (low, medium, or high).

Search Categories

Four categories:
- localization_cell_components_2011-02-11
- protein_celegans
- localization_verbs_082208
- localization_experimental_082208

Search Filters

Remove all sentences with a Textpresso sentence score of 30 or higher.
Other filtering steps may be introduced in the future (e.g., specific proteins like DAF-16 or sentences that also contain words like mutant or RNAi).
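The score filter could be sketched as follows. This is an illustration under stated assumptions, not the actual ccc script: it assumes the Textpresso sentence score is the number after the "SSC:" prefix at the start of each sentence line (as in the sample sentence format on this page), and the function names are hypothetical. Dropping lines without a parsable score is also an assumption.

```python
import re

# Assumption: each sentence line begins with "SSC:<integer score>".
SCORE_PATTERN = re.compile(r"^SSC:(\d+)\b")
SCORE_CUTOFF = 30  # sentences scoring 30 or higher are removed


def keep_sentence(line, cutoff=SCORE_CUTOFF):
    """Return True if the line has a score strictly below the cutoff."""
    match = SCORE_PATTERN.match(line)
    if match is None:
        # Assumption: lines without a recognizable score are dropped.
        return False
    return int(match.group(1)) < cutoff


def filter_sentences(lines, cutoff=SCORE_CUTOFF):
    """Keep only the sentence lines that pass the score filter."""
    return [line for line in lines if keep_sentence(line, cutoff)]
```

With this sketch, a line starting with SSC:7 or SSC:29 is kept, while a line starting with SSC:30 is removed.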

File Format - File Name

Date Script was Run_MOD_Type of Textpresso Search

For example: 20130801_WB_ccc
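A file name in this pattern can be assembled as sketched below. The function name results_file_name is hypothetical, and the YYYYMMDD date layout is inferred from the example 20130801_WB_ccc rather than stated elsewhere on this page.

```python
from datetime import date


def results_file_name(run_date, mod, search_type):
    """Build '<date>_<MOD>_<search type>' with the date as YYYYMMDD.

    Hypothetical helper; the date layout is inferred from '20130801_WB_ccc'.
    """
    return f"{run_date:%Y%m%d}_{mod}_{search_type}"


if __name__ == "__main__":
    print(results_file_name(date(2013, 8, 1), "WB", "ccc"))  # 20130801_WB_ccc
```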

Sample File Format - Sentence Format

SSC:7 PMID:23263989:references:276 ZFP-1 chromosomes 771 FIG 4 <protein_celegans>ZFP-1</protein_celegans> : : <localization_experimental_082208>GFP</localization_experimental_082208> <localization_verbs_082208>localizes</localization_verbs_082208> to <localization_cell_components_2011-02-11>chromosomes</localization_cell_components_2011-02-11> in maturing oocytes and is <localization_experimental_082208>widely</localization_experimental_082208> <localization_verbs_082208>expressed</localization_verbs_082208> 772 at all developmental <localization_experimental_082208>stages</localization_experimental_082208> .
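A line in this format can be split into its header fields and tagged terms with a short parser. The sketch below is an assumption-heavy illustration, not the form code: it guesses that the header is "SSC:<score> PMID:<pmid>:<section>:<number>", the field names "section" and "position" are hypothetical labels for the last two header parts, and any text between the header and the first tag is left uninterpreted.

```python
import re
from collections import defaultdict

# Assumed header layout: "SSC:<score> PMID:<pmid>:<section>:<number> <rest>"
HEADER = re.compile(
    r"^SSC:(?P<score>\d+)\s+"
    r"PMID:(?P<pmid>\d+):(?P<section>[^:\s]+):(?P<position>\d+)\s+"
    r"(?P<rest>.*)$"
)
# Matches <category>text</category>; the backreference requires the
# closing tag to carry the same category name as the opening tag.
TAGGED = re.compile(r"<([^<>/\s]+)>(.*?)</\1>")


def parse_sentence_line(line):
    """Parse one sentence line into header fields and terms per category."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed layout")
    terms = defaultdict(list)
    for category, text in TAGGED.findall(match.group("rest")):
        terms[category].append(text)
    return {
        "score": int(match.group("score")),
        "pmid": match.group("pmid"),
        "section": match.group("section"),    # hypothetical field name
        "position": int(match.group("position")),  # hypothetical field name
        "terms": dict(terms),
    }
```

Applied to the sample sentence above, this sketch would report score 7, PMID 23263989, and group the tagged words by category, for example ZFP-1 under protein_celegans and chromosomes under localization_cell_components_2011-02-11.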

Sentence File Location

On textpresso-dev: (The test sentences were here: http://textpresso-dev.caltech.edu/ccc_results/celegans/ccc_celegans_2013only)
On tazendra: /home/azurebrd/public_html/cgi-bin/forms/ccc/source/worm

Mapping Files Location

PMID to MOD Accession: http://textpresso-dev.caltech.edu/ccc_results/accession
- This file maps PubMed identifiers to MOD paper IDs. A script runs alongside the ccc scripts each time to regenerate this universal mapping file.

gpi files: /home/azurebrd/public_html/cgi-bin/forms/ccc/scripts
- The gpi files map MOD protein names to MOD and UniProtKB identifiers.
- Current WB gpi file is named ws234_gpi.

Updates to form code

In script: /home/azurebrd/public_html/cgi-bin/forms/ccc/scripts/populate_ccc_pg_indices.pl

Update name of worm gpi file
- Script currently expects: $gpi_files{'worm'} = 'ws234_gpi';
- When the new gpi file is uploaded, it will be called ws238_gpi, and the script will need to be updated to match.

The userid for sending annotations to Protein2GO needs to be updated to remove the "test:" prefix
- From my notes: line 341 in the code will need to change to remove the test prefix from the userid for Protein2GO
- push @ptgoFields, "userid=test:$ptgoUser";

Archived Info - Old script explanation - Juancarlos

The script runs on Wednesdays at 2am and looks for a new SVM file. If a new SVM file is available, the Textpresso search is performed; if not, the search is skipped for that week.

The script that transfers results from textpresso to tazendra is :

 /home/postgres/public_html/cgi-bin/data/ccc_gocuration/get_newset.pl

called by :

 /home/postgres/work/pgpopulation/textpresso/wrapper.sh

On textpresso, there have been no new matches since Oct 12.

The full results are on textpresso-dev at :

 /data2/srv/textpresso-dev.caltech.edu/www/docroot/azurebrd/ccc_datafiles/good_sentences_file.*

The new matches are at :

 /data2/srv/textpresso-dev.caltech.edu/www/docroot/azurebrd/ccc_datafiles/recent_sentences_file.*

They're comparing to svm results from :

 http://caprica.caltech.edu/celegans/svm_results/Juancarlos/otherexpr

You can check the recent good_sentences_file.* and see if any of those sentences should be in the SVM results, or you can look at the full text of the SVM results and see if any of those should be in the good_sentences_file.* If you don't have a textpresso-dev account, you can ask Michael, and he'll ask the ITS people. I log on with my ITS account.

If sentences should be in the good_sentences_file.* and aren't, check that the categories contain what they should, and let me know which category isn't matching as expected.

The good_sentences_file.* is generated by :

 /home/azurebrd/work/get_kimberly_go_gene_component_verb_localization/get_go_gene_component.pl

Now located here?

 /home/postgres/work/textpresso/kimberly/get_go_gene_component.pl

The category files being used are in :

 /data2/data-processing/data/celegans/Data/indices/body/semantic/categories

files :

 protein_celegans 
 localization_cell_components_082208 
 localization_verbs_082208 
 localization_other_120107

Archived Info - Old Curation Form

Relevant postgres tables:

ccc_gene_comp_go

