Textpresso

From WormBaseWiki

Textpresso for WormBase Curation

This page serves as a platform for coordinating Textpresso's automated curation efforts for WormBase. The first two sections cover feature requests for the general curation tools and for the Textpresso website. They are followed by the curation pipelines for each data type. Each data type section has the following subsections: Responsible Curator, Pipeline Description, Postgres Query, Requests, False Positive Cases and False Negative Cases.

  • Responsible Curator is the person (or group of people) who is e-mailed when a paper is flagged for that particular data type.
  • Pipeline Description is for a detailed description of the Textpresso pipeline for that data type. Details would include the type of search performed, the papers included or excluded, how often the search is run, etc.
  • Postgres query can be used to retrieve all data for a particular data field that has been entered into the curator first pass postgres tables. To use it, copy the query and paste it into the query field on the Postgres link.
  • Requests is for posting additions or changes to procedures in the pipeline for that particular data type.
  • False Positive Cases is for posting false positive search results, with a brief explanation of why the error may have occurred.
  • False Negative Cases serves the same purpose for false negative results.
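The Postgres queries listed for the data types below all follow the same pattern. As a minimal sketch, assuming only the naming convention visible in those queries (a table `cfp_<field>` that also has a column named `cfp_<field>`), the hypothetical helper `first_pass_query` builds the query string for one field. It does not connect to Postgres; its output is meant to be pasted into the query field as described above.

```python
import re


def first_pass_query(field: str) -> str:
    """Build the SELECT used on this page for one first pass data field.

    Assumes, as the queries on this page suggest, that each field is stored
    in a table named cfp_<field> that also has a column named cfp_<field>.
    """
    # Only allow the lowercase field names used on this page, so the string
    # can be pasted into the query field without risk of SQL injection
    if not re.fullmatch(r"[a-z]+", field):
        raise ValueError(f"unexpected field name: {field!r}")
    table = f"cfp_{field}"
    return f"SELECT * FROM {table} WHERE {table} IS NOT NULL;"


if __name__ == "__main__":
    # Print the four species queries listed further down this page
    for field in ("celegans", "cnonbristol", "nematode", "nonnematode"):
        print(first_pass_query(field))
```

Restricting field names to lowercase letters matches every field on this page and keeps the helper from ever emitting arbitrary SQL.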


Curation Tools and Tool Feature Requests

  1. Sentence_Saver_for_Category_Seed

Textpresso Website Feature Requests

Paper Type and Textpresso Searches

  • There are several different types of papers in WormBase, not all of which are actively curated. Therefore, we can exclude some paper types from the Textpresso curation pipelines. Excluded types may include:
    • Reviews
    • Meeting and Worm Breeder's Gazette Abstracts
    • WormBook and other book chapters
    • Papers marked 'for functional annotation only' (also see no_curatable section below)
    • Comments
    • News
  • Papers with an invalid wpa are also excluded.
  • Papers marked as no_curatable may also be excluded. Note, however, that no_curatable does not always refer to a specific paper type; it describes the information contained in the paper, so some research articles have been labeled no_curatable. The label therefore partly reflects which data types were being curated when the paper was first passed. Because no_curatable is not a very informative label, it would be helpful to classify these papers more precisely where possible. Possible approaches include:
    • adding new data types to the first pass curation form, if particular types of experiments or studies are frequent among the no_curatable papers;
    • identifying the no_curatable papers that were used for functional annotation only before the curation form allowed them to be marked as such, and marking them correctly;
    • determining the species discussed in the no_curatable papers.
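The exclusion rules above can be expressed as a single filter. This is a minimal sketch, not the actual WormBase pipeline code: the `Paper` record, its field names and the type labels in `EXCLUDED_TYPES` are all hypothetical stand-ins for whatever the paper tables really store.

```python
from dataclasses import dataclass

# Hypothetical labels for the excluded paper types listed above; the strings
# actually stored in the WormBase paper tables may differ
EXCLUDED_TYPES = frozenset({
    "review",
    "meeting_abstract",
    "gazette_abstract",
    "book_chapter",
    "comment",
    "news",
})


@dataclass
class Paper:
    paper_id: str
    paper_type: str
    valid_wpa: bool = True
    functional_annotation_only: bool = False
    no_curatable: bool = False


def include_in_textpresso(paper: Paper, skip_no_curatable: bool = True) -> bool:
    """Return True if a paper should enter the Textpresso curation pipelines."""
    if not paper.valid_wpa:
        return False
    if paper.paper_type in EXCLUDED_TYPES or paper.functional_annotation_only:
        return False
    # no_curatable describes content rather than type, so excluding it is optional
    if skip_no_curatable and paper.no_curatable:
        return False
    return True


if __name__ == "__main__":
    papers = [
        Paper("WBPaper_A", "research_article"),
        Paper("WBPaper_B", "review"),
        Paper("WBPaper_C", "research_article", no_curatable=True),
    ]
    print([p.paper_id for p in papers if include_in_textpresso(p)])
```

The `skip_no_curatable` switch reflects the point above that no_curatable is a statement about content rather than paper type, so a pipeline may reasonably choose to keep those papers.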

Actively curated data types

C. elegans antibody

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_antibody WHERE cfp_antibody IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Integrated transgenes

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_transgene WHERE cfp_transgene IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Small scale RNAi

  • Responsible data curator: Gary
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_rnai WHERE cfp_rnai IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


GO cellular component

  • Responsible data curator: Kimberly
  • Pipeline Description:

Each week, new papers added to the corpus are searched with four categories: CCC_Cellular_Components, CCC_Assay_Terms, CCC_Verbs and protein (C. elegans).

Results are presented in a curation CGI, the Cellular Component Curation Form.

Curation results are stored in GO curation tables in postgres and visible in the Ontology Annotator as soon as they are entered into postgres.
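The page does not say how the four categories are combined. The sketch below assumes, purely for illustration, that a sentence counts as positive only if it contains at least one term from every category, matched case-insensitively as whole words. The category names come from the text above; the terms and sentences are invented toy data, and the real Textpresso matching may work differently.

```python
import re
from typing import Dict, List, Set, Tuple

Categories = Dict[str, Set[str]]


def category_hits(sentence: str, categories: Categories) -> Dict[str, Set[str]]:
    """Return, per category, the terms that occur in the sentence as whole words."""
    text = sentence.lower()
    hits: Dict[str, Set[str]] = {}
    for name, terms in categories.items():
        found = {
            term for term in terms
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        }
        if found:
            hits[name] = found
    return hits


def positive_sentences(
    sentences: List[str], categories: Categories
) -> List[Tuple[str, Dict[str, Set[str]]]]:
    """Keep sentences that match every category (an assumed rule, see above)."""
    results = []
    for sentence in sentences:
        hits = category_hits(sentence, categories)
        if len(hits) == len(categories):
            results.append((sentence, hits))
    return results


if __name__ == "__main__":
    # Toy term lists; the real categories are much larger
    categories = {
        "CCC_Cellular_Components": {"nucleus", "mitochondria"},
        "CCC_Assay_Terms": {"gfp", "antibody staining"},
        "CCC_Verbs": {"localizes", "localized"},
        "protein (C. elegans)": {"UNC-22", "LIN-3"},
    }
    sentences = [
        "UNC-22::GFP localizes to the nucleus.",
        "LIN-3 is expressed in the nucleus.",
    ]
    for sentence, hits in positive_sentences(sentences, categories):
        print(sentence, hits)
```

Keeping the per-category hits alongside each sentence is what would let a form populate its protein and cellular component boxes from the same search.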

  • Curation Tool Features

Each positive sentence is displayed next to three selection boxes.

The first box lists all proteins mentioned in the sentence, the second lists the terms that matched the Cellular Components category, and the third is for entering the corresponding GO annotation.

Curation scenarios

New curation


Old curation, not in relationship index

If a previously curated protein is returned, but that protein's curation is not recorded in the relationship index, then the relevant information can be added to the index by:

1) selecting the protein in column 1

2) selecting the cellular component term in column 2

3) entering the corresponding GO term in column 3

4) selecting the Already curated radio button

5) clicking on Submit!
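The five steps above amount to validating three selections and storing one entry. The following is a minimal sketch under stated assumptions: `IndexEntry`, `record_already_curated` and the `"already_curated"` status string are hypothetical names, and the relationship index is modelled as an in-memory set rather than the real postgres tables. The only external fact relied on is that GO identifiers have the form `GO:` followed by seven digits.

```python
import re
from dataclasses import dataclass
from typing import List, Set

GO_ID = re.compile(r"GO:\d{7}")  # GO identifiers are "GO:" plus seven digits


@dataclass(frozen=True)
class IndexEntry:
    sentence_id: str
    protein: str          # column 1
    component_term: str   # column 2
    go_term: str          # column 3
    status: str           # radio button choice, e.g. "already_curated"


def record_already_curated(index: Set[IndexEntry], sentence_id: str,
                           proteins: List[str], matched_terms: List[str],
                           protein: str, term: str, go_term: str) -> IndexEntry:
    """Mirror steps 1-5: check the three selections, then add the entry."""
    if protein not in proteins:
        raise ValueError(f"{protein} is not among the proteins in this sentence")
    if term not in matched_terms:
        raise ValueError(f"{term} did not match the Cellular Components category")
    if not GO_ID.fullmatch(go_term):
        raise ValueError(f"{go_term} is not a GO identifier")
    entry = IndexEntry(sentence_id, protein, term, go_term, "already_curated")
    index.add(entry)  # a set keeps repeated submissions from creating duplicates
    return entry


if __name__ == "__main__":
    index: Set[IndexEntry] = set()
    # GO:0005634 is the GO term for nucleus
    record_already_curated(index, "s1", ["UNC-22"], ["nucleus"],
                           "UNC-22", "nucleus", "GO:0005634")
    print(len(index))
```

Checking the first two selections against the sentence's own protein and term lists mirrors the form, where those columns can only offer values found in the sentence.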

  • Postgres query to see data:
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Newly created allele

  • Responsible data curator: Jolene/Mary Ann/Margaret
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_extvariation WHERE cfp_extvariation IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Email addresses

  • Responsible data curator: Cecilia
  • Pipeline Description:
  • Postgres query to see data:
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Content of corpus

  • Responsible data curator: ?
  • Pipeline Description:
  • Postgres query to see data:
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Alleles, transgenes, rearrangements

No longer relevant; this section should probably be removed.

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_nocuratable WHERE cfp_nocuratable IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Gene-gene interaction

  • Responsible data curator: Andrei
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_geneint WHERE cfp_geneint IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Gene product interaction

  • Responsible data curator: Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_geneprod WHERE cfp_geneprod IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Mass spectrometry

  • Responsible data curator: Gary Williams
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_massspec WHERE cfp_massspec IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Transgenes used as tissue marker

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_marker WHERE cfp_marker IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Gene structure correction

  • Responsible data curator: Sanger and St. Louis
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_structcorr WHERE cfp_structcorr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Sequence mutant alleles

  • Responsible data curator: Mary Ann/Margaret
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_seqchange WHERE cfp_seqchange IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


New SNPs

  • Responsible data curator: St. Louis
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_newsnp WHERE cfp_newsnp IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Species (C. elegans, C. elegans other than Bristol, Nematode other than C. elegans, non-nematode species)

  • Responsible data curator: Kimberly (pre-postgres paper entry); Karen (post-postgres paper entry)
  • Pipeline Description:
  • Postgres query to see data:
    • SELECT * FROM cfp_celegans WHERE cfp_celegans IS NOT NULL;
    • SELECT * FROM cfp_cnonbristol WHERE cfp_cnonbristol IS NOT NULL;
    • SELECT * FROM cfp_nematode WHERE cfp_nematode IS NOT NULL;
    • SELECT * FROM cfp_nonnematode WHERE cfp_nonnematode IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Genes studied in this paper

  • Responsible data curator:
  • Pipeline Description: Currently, when a paper's abstract is processed via the paper editor, a script scans the abstract for known genes and automatically links them to the paper. Genes missed by the script must be entered manually by curators, either through the WBPaper editor CGI or through the curation first pass form.
  • Postgres query to see data: SELECT * FROM cfp_genestudied WHERE cfp_genestudied IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:
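The abstract scan described above can be pictured as a dictionary lookup over gene names. This is a minimal sketch, not the existing script: the `known_genes` mapping, its placeholder IDs and the function name `genes_in_abstract` are hypothetical, and matching is assumed to be case-insensitive so that protein names such as UNC-22 also count.

```python
import re
from typing import Dict, Set


def genes_in_abstract(abstract: str, known_genes: Dict[str, str]) -> Set[str]:
    """Return the IDs of known genes whose names occur in the abstract."""
    found: Set[str] = set()
    for name, gene_id in known_genes.items():
        # The lookarounds stop "unc-2" from matching inside "unc-22" or "unc-2a"
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        # Case-insensitive, so protein names such as UNC-22 also match
        if re.search(pattern, abstract, flags=re.IGNORECASE):
            found.add(gene_id)
    return found


if __name__ == "__main__":
    # Placeholder IDs, not real WormBase gene identifiers
    known_genes = {"unc-22": "GENE_1", "lin-3": "GENE_2", "unc-2": "GENE_3"}
    abstract = "We show that UNC-22 and lin-3 act in the same pathway."
    print(sorted(genes_in_abstract(abstract, known_genes)))
```

Genes that a scan like this misses, for example those mentioned only by sequence name or by a synonym absent from the dictionary, are the ones curators would still need to add by hand.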

Gene mapping data

  • Responsible data curator: Mary Ann
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_mappingdata WHERE cfp_mappingdata IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Overexpression phenotypes

  • Responsible data curator: Jolene, Erich, Gary
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_overexpr WHERE cfp_overexpr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Tissue or cell site of action

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_siteaction WHERE cfp_siteaction IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Molecular function of a gene product

  • Responsible data curator: Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_genefunc WHERE cfp_genefunc IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Relevance to a human disease

  • Responsible data curator: Ranjana
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_humdis WHERE cfp_humdis IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Phenotype analysis

  • Responsible data curator: Jolene, Gary, Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_newmutant WHERE cfp_newmutant IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Mosaic analysis

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_mosaic WHERE cfp_mosaic IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Time of (gene) action

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_timeaction WHERE cfp_timeaction IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


New expression pattern for a gene

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_otherexpr WHERE cfp_otherexpr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Alteration in gene expression by genetic or other treatment

  • Responsible data curator: Xiaodong
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_genereg WHERE cfp_genereg IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Regulatory sequence features

  • Responsible data curator: Xiaodong, Sanger
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_seqfeat WHERE cfp_seqfeat IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Position frequency matrix (PFM) or position weight matrix

  • Responsible data curator: Xiaodong, Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_matrices WHERE cfp_matrices IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Cell function

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_cellfunc WHERE cfp_cellfunc IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Microarray

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_microarray WHERE cfp_microarray IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Supplemental Material

  • Responsible data curator: Daniel
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_supplemental WHERE cfp_supplemental IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


None of the aforementioned data types are in this research article (i.e. no_curatable)

  • Responsible data curator: Kimberly, Karen
  • Pipeline Description: Data is stored in 'comments' in Postgres and e-mailed to Kimberly and Karen when a paper is flagged in the curator first pass form.
  • Postgres query to see data: SELECT * FROM cfp_nocuratable WHERE cfp_nocuratable IS NOT NULL;
  • Requests:
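The e-mail step for no_curatable papers could look roughly like the following. This is a minimal sketch using Python's standard `email.message.EmailMessage`; the addresses, the subject wording and the function name `nocuratable_notice` are hypothetical, the recipients simply follow the curators listed for this data type, and the message is only built, not sent.

```python
from email.message import EmailMessage

# Hypothetical addresses; the real curator addresses are not on this page
NOCURATABLE_CURATORS = ["kimberly@example.org", "karen@example.org"]


def nocuratable_notice(paper_id: str, comment: str,
                       sender: str = "firstpass@example.org") -> EmailMessage:
    """Build the e-mail sent when a paper is flagged as no_curatable."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(NOCURATABLE_CURATORS)
    msg["Subject"] = f"{paper_id} flagged no_curatable in curator first pass"
    # The body carries the same free-text comment that is stored in Postgres
    msg.set_content(comment or "(no comment entered)")
    return msg


if __name__ == "__main__":
    print(nocuratable_notice("WBPaper_A", "Only describes a new imaging method.").as_string())
```

Building the message separately from sending it (for example with `smtplib`) keeps the flag logic testable without a mail server.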


Not Actively Curated

Chemicals

  • Responsible data curator: (Karen)
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_chemicals WHERE cfp_chemicals IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Functional complementation

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_funccomp WHERE cfp_funccomp IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Protein analysis in vitro

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_invitro WHERE cfp_invitro IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Domain analysis

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_domanal WHERE cfp_domanal IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Covalent modification

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_covalent WHERE cfp_covalent IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Structural information

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_structcorr WHERE cfp_structcorr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Phylogenetic data

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_phylogenetic WHERE cfp_phylogenetic IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Other bioinformatics analysis

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_othersilico WHERE cfp_othersilico IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Automation Methods

The WormBase Curation Automation via Textpresso page lists the proposed automation method and the developer in charge for each data type.

--MMuller 16:14, 5 June 2009 (EDT)