Textpresso

From WormBaseWiki
Revision as of 17:09, 9 June 2009 by Vanaukenk (talk | contribs)

Textpresso for WormBase Curation

This page serves as a platform to coordinate the automated curation efforts of Textpresso for WormBase. The first two sections cover general tool and website feature requests. The curation pipelines for each data type are listed after them. Each data type section has the subsections Responsible Curator, Pipeline Description, Postgres query, Requests, False Positive Cases, and False Negative Cases.

  • Responsible Curator is the person (or group of people) who is e-mailed a flag for that particular data type.
  • Pipeline Description is for a detailed description of the Textpresso pipeline for that data type. Details would include the type of search performed, the papers included or excluded, how often the search is run, etc.
  • Postgres query can be used to retrieve all data for a particular data field that has been entered into the curator first pass postgres tables. To use it, copy the query and paste it into the query field on the Postgres link.
  • Requests is for posting additions or changes to procedures in the pipeline for that particular data type.
  • False Positive Cases is for posting false positive search results, with a brief description of why the curator thinks they happened.
  • False Negative Cases is for posting false negative search results, with a brief description of why the curator thinks they were missed.
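
The Postgres query entries on this page all follow one pattern: select every row from a first pass table where the column of the same name is not empty. The Python sketch below builds that query from a table name and runs it against an in-memory SQLite database standing in for Postgres. The `joinkey` column, the sample rows, and the `build_query` helper are assumptions made for illustration and are not taken from the actual WormBase schema or tools.

```python
import sqlite3


def build_query(table: str) -> str:
    """Build the first pass query for a table whose data column shares its name."""
    # The name is interpolated into SQL, so accept simple identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM {table} WHERE {table} IS NOT NULL;"


# In-memory SQLite stand-in for the Postgres first pass tables.
# The joinkey column and all rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cfp_antibody (joinkey TEXT, cfp_antibody TEXT)")
conn.executemany(
    "INSERT INTO cfp_antibody VALUES (?, ?)",
    [
        ("paper_1", "checked"),
        ("paper_2", None),  # no data entered, so the query skips this row
        ("paper_3", "antibody described in methods"),
    ],
)

rows = conn.execute(build_query("cfp_antibody")).fetchall()
for row in rows:
    print(row)
```

On the real system the same query text is pasted into the query field on the Postgres link, as described above; the SQLite database here only makes the sketch self-contained.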

Curation Tools and Tool Feature Requests

Textpresso Website Feature Requests

Paper Type and Textpresso Searches

  • There are several different types of papers in WormBase, not all of which are actively curated. Therefore, we can exclude some paper types from the curation pipelines. Excluded types include:
    • Reviews
    • Meeting and Worm Breeder's Gazette Abstracts
    • WormBook chapters
    • Papers marked 'for functional annotation only'
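
As a rough illustration of the exclusion rule above, the following Python sketch filters a list of papers before they would enter a curation pipeline. The type labels, the `functional_annotation_only` flag, and the record layout are hypothetical; the real paper records in WormBase may be structured differently.

```python
# Hypothetical labels for the excluded paper types listed above.
EXCLUDED_TYPES = {
    "review",
    "meeting_abstract",
    "gazette_abstract",
    "wormbook",
}


def is_in_pipeline(paper: dict) -> bool:
    """Return True if a paper should be passed to the curation pipelines."""
    # Papers marked 'for functional annotation only' are always skipped.
    if paper.get("functional_annotation_only", False):
        return False
    return paper.get("type") not in EXCLUDED_TYPES


# Made-up sample records for illustration.
papers = [
    {"id": "paper_1", "type": "article"},
    {"id": "paper_2", "type": "review"},
    {"id": "paper_3", "type": "article", "functional_annotation_only": True},
    {"id": "paper_4", "type": "wormbook"},
]

to_search = [p["id"] for p in papers if is_in_pipeline(p)]
print(to_search)
```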

Actively curated data types

C. elegans antibody

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_antibody WHERE cfp_antibody IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Integrated transgenes

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_transgene WHERE cfp_transgene IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Small scale RNAi

  • Responsible data curator: Gary
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_rnai WHERE cfp_rnai IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


GO cellular component

  • Responsible data curator: Kimberly
  • Pipeline Description:
  • Postgres query to see data:
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Newly created allele

  • Responsible data curator: Jolene/Mary Ann/Margaret
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_extvariation WHERE cfp_extvariation IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Email addresses

  • Responsible data curator: Cecilia
  • Pipeline Description:
  • Postgres query to see data:
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Content of corpus

  • Responsible data curator: ?
  • Pipeline Description:
  • Postgres query to see data:
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Alleles, transgenes, rearrangements

No longer relevant; should this section be removed(?)

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_nocuratable WHERE cfp_nocuratable IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Gene-gene interaction

  • Responsible data curator: Andrei
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_geneint WHERE cfp_geneint IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Gene product interaction

  • Responsible data curator: Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_geneprod WHERE cfp_geneprod IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Mass spectrometry

  • Responsible data curator: Gary Williams
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_massspec WHERE cfp_massspec IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Transgenes used as tissue marker

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_marker WHERE cfp_marker IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Gene structure correction

  • Responsible data curator: Sanger and St. Louis
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_structcorr WHERE cfp_structcorr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Sequence mutant alleles

  • Responsible data curator: Mary Ann/Margaret
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_seqchange WHERE cfp_seqchange IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


New SNPs

  • Responsible data curator: St. Louis
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_newsnp WHERE cfp_newsnp IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Species (C. elegans, C. elegans other than Bristol, Nematode other than C. elegans, non-nematode species)

  • Responsible data curator: Kimberly (pre-postgres paper entry); Karen (post-postgres paper entry)
  • Pipeline Description:
  • Postgres query to see data:
    • SELECT * FROM cfp_celegans WHERE cfp_celegans IS NOT NULL;
    • SELECT * FROM cfp_cnonbristol WHERE cfp_cnonbristol IS NOT NULL;
    • SELECT * FROM cfp_nematode WHERE cfp_nematode IS NOT NULL;
    • SELECT * FROM cfp_nonnematode WHERE cfp_nonnematode IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Genes studied in this paper

  • Responsible data curator:
  • Pipeline Description: Currently, Textpresso scans the abstracts for known genes, and any genes it finds are automatically linked to the paper. Genes not caught by Textpresso must be entered manually by curators, through either the WBPaper editor CGI or the curation first pass form.
  • Postgres query to see data: SELECT * FROM cfp_genestudied WHERE cfp_genestudied IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:
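
The abstract scan described in the pipeline description above can be pictured with a minimal Python sketch. This is not Textpresso's actual implementation; it is a simple whole-token lookup against a set of known gene names, and the `find_known_genes` helper, the token pattern, and the sample abstract are assumptions made for illustration.

```python
import re


def find_known_genes(abstract: str, known_genes: set[str]) -> set[str]:
    """Return the known gene names that occur as whole tokens in the abstract."""
    # A token starts with a letter or digit and may contain dots and hyphens,
    # so names such as "daf-2" stay in one piece.
    tokens = set(re.findall(r"[A-Za-z0-9][A-Za-z0-9.\-]*", abstract))
    # Drop punctuation picked up at the end of a sentence ("daf-16." -> "daf-16").
    tokens = {t.rstrip(".-") for t in tokens}
    return tokens & known_genes


known = {"daf-2", "daf-16", "lin-12"}
abstract = "Loss of daf-2 extends lifespan in a manner that requires daf-16."

linked = find_known_genes(abstract, known)
print(sorted(linked))
```

Matching whole tokens rather than substrings keeps a name like daf-2 from being reported when the text only mentions daf-21. Genes written in a form that is not in the known set would be missed by a lookup like this, which is the kind of case curators then enter by hand.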


Gene mapping data

  • Responsible data curator: Mary Ann
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_mappingdata WHERE cfp_mappingdata IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Overexpression phenotypes

  • Responsible data curator: Jolene, Erich, Gary
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_overexpr WHERE cfp_overexpr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Tissue or cell site of action

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_siteaction WHERE cfp_siteaction IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Molecular function of a gene product

  • Responsible data curator: Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_genefunc WHERE cfp_genefunc IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Relevance to a human disease

  • Responsible data curator: Ranjana
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_humdis WHERE cfp_humdis IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Phenotype analysis

  • Responsible data curator: Jolene, Gary, Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_newmutant WHERE cfp_newmutant IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Mosaic analysis

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_mosaic WHERE cfp_mosaic IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Time of (gene) action

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_timeaction WHERE cfp_timeaction IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


New expression pattern for a gene

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_otherexpr WHERE cfp_otherexpr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Alteration in gene expression by genetic or other treatment

  • Responsible data curator: Xiaodong
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_genereg WHERE cfp_genereg IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Regulatory sequence features

  • Responsible data curator: Xiaodong, Sanger
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_seqfeat WHERE cfp_seqfeat IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Position frequency matrix (PFM) or position weight matrix

  • Responsible data curator: Xiaodong, Erich
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_matrices WHERE cfp_matrices IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Cell function

  • Responsible data curator: Raymond
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_cellfunc WHERE cfp_cellfunc IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Microarray

  • Responsible data curator: Wen
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_microarray WHERE cfp_microarray IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Supplemental Material

  • Responsible data curator: Daniel
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_supplemental WHERE cfp_supplemental IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


None of the aforementioned data types are in this research article (i.e. Nocuratable)

  • Responsible data curator: Kimberly, Karen
  • Pipeline Description: Data is stored in ‘comments’ in Postgres and sent via e-mail to Kimberly and me when a paper is flagged from the curator first pass form.
  • Postgres query to see data: SELECT * FROM cfp_nocuratable WHERE cfp_nocuratable IS NOT NULL;
  • Requests:


Not Actively Curated

Chemicals

  • Responsible data curator: (Karen)
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_chemicals WHERE cfp_chemicals IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Functional complementation

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_funccomp WHERE cfp_funccomp IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Protein analysis in vitro

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_invitro WHERE cfp_invitro IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Domain analysis

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_domanal WHERE cfp_domanal IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:

Covalent modification

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_covalent WHERE cfp_covalent IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Structural information

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_structcorr WHERE cfp_structcorr IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Phylogenetic data

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_phylogenetic WHERE cfp_phylogenetic IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


Other bioinformatics analysis

  • Responsible data curator:
  • Pipeline Description:
  • Postgres query to see data: SELECT * FROM cfp_othersilico WHERE cfp_othersilico IS NOT NULL;
  • Requests:
  • False Positive Cases:
  • False Negative Cases:


--MMuller 16:14, 5 June 2009 (EDT)