First-pass schedule, instructions, automation

From WormBaseWiki

Revision as of 22:50, 13 February 2009

First-Pass Rotation

First-Pass Curator | Two-week period | Task | Goal
Karen | 2/16-3/1 | Organize FP curation for others with Andrei's help, set up immediate and long-term goals, reorganize the FP form | Ease curation for others
Ranjana | 3/2-3/15
Raymond | 3/16-3/29
Jolene | 3/30-4/12
Gary | 4/13-4/26
Xiadong | 4/27-5/10
Erich | 5/11-5/24
Wen | 5/25-6/7
Kimberly | 6/8-6/21

Assigned Tasks

Who | When | Task | Goal
Andrei | | Analyze the author fill-out form | Provide a summary of what works and what does not, and seek improvements to the procedure.
Andrei | | Work out the correspondence of fields between the author and curator forms. |
Juancarlos | | Waiting for consensus on what to put in the author and curator fields for FP. | Raymond suggests: have a first-pass form for curators that shows the author's submission for the curator's approval (e.g. a tick) before that information is sent, along with whatever the curator enters, for data extraction. If the first-pass curator disapproves the author's input (by not ticking), the author's input will not be processed further or sent, but it will still be stored as is in the database. In the next phase, results from automated first pass via Textpresso could be treated the same way as the author's. The ultimate goal is to maximize the number of fields that need no curator approval. Sources of first-pass curation should be clearly distinguished by a person ID (Textpresso will be assigned one). Juancarlos is okay with this. However, while I'm leaning towards assigning the author response to a PersonID, if this is not going to be used as evidence anywhere, the corresponding email is possibly the more accurate evidence, since the recipient may pass it on to someone who is not the WBPerson that email is assigned to. In that case Textpresso wouldn't need a person ID. I'm still leaning toward using IDs, though; I'm just not sure they reflect the right thing if we ever want this in WB. Juancarlos also needs to know how curators want to enter data. For any given Paper-Field, would curators want to be able to enter unlimited entries and mark them invalid instead of deleting them? Or would you prefer the current system, where there is a single box into which everyone puts all their data? Would you care about the history of deleted entries?
Juancarlos | done | Resolve duplicates | There are many papers on the first-pass list that have already been first-passed. Most of these papers are duplicates with two WBPaperID assignments. Is there a way to resolve this?
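Raymond's proposal in the Juancarlos row above reduces to a simple rule:
- Every first-pass entry carries the person ID of its source.
- Everything is stored.
- Only curator entries, plus entries from other sources that a curator has ticked, are sent on for data extraction.

Below is a minimal Python sketch under those assumptions. The class name, field names, and person IDs are hypothetical and do not reflect the actual Postgres tables behind the first-pass form.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class FirstPassEntry:
    # Hypothetical record for one first-pass entry (not the real schema).
    paper_id: str                    # e.g. "WBPaper00032232"
    data_type: str                   # e.g. "rnai"
    value: str                       # "yes" (checkbox) or free text
    source_person: str               # person ID of the source: author, curator, or Textpresso
    approved: Optional[bool] = None  # curator tick; None means not yet reviewed


def entries_to_forward(entries: List[FirstPassEntry],
                       curator_ids: Set[str]) -> List[FirstPassEntry]:
    """Return the entries that would be sent on for data extraction.

    Curator entries go through as-is. Entries from any other source
    (author submissions, later Textpresso output) go through only if a
    curator ticked them. Nothing is deleted: unapproved entries simply
    remain in the stored list.
    """
    return [
        e for e in entries
        if e.source_person in curator_ids or e.approved is True
    ]
```

Keeping the approval flag on the stored entry, rather than copying approved data elsewhere, preserves the author's input "as is" even when it is not forwarded. That matches the proposal's intent.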

Unassigned tasks and comments that need more discussion

  • We do have a record of data type curation for each paper; is there some way to combine the first-pass curation with the curation status form?

- The curation status form gets flagged data from the first-pass form (I don't think I understand the question). -- Juancarlos

  • How do false positives work?

- The words "false positive" get appended (or prepended; I forget and can't find an example) to the text field. -- Juancarlos

  • Curator preference for FP remarks: can people deal with not getting detailed notes about their data type and where it appears in the paper, or should this be a mandatory part of first pass?
  • Discrepancy between FP papers and total papers curated. For some data types, curators get to a paper before the FP curator does, so it would be good to know that a curator has already touched it. Is there a way to mark a paper in the FP list as curated for a specific data type but not for others?

- I thought they showed as "RNAi only" or something like that in the checkout section. -- Juancarlos

Papers for Firstpass are found on the WBPaperEditor page:

Pick a paper and access the curation form


Alternatively:

  • Access curation.cgi from the WBPaperID page itself
  • Select the WBPaperID from the left column to go to the paper page. ONLY SELECT PAPERS FROM WBPaper00030000 AND LATER.
  • Select first-pass curate
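If you pick papers from a list programmatically, the cutoff in the steps above amounts to a numeric comparison on the digits after "WBPaper". Paper number 30000 and later are eligible.

Below is a minimal Python sketch. It assumes IDs are "WBPaper" followed only by digits, as in WBPaper00032232 elsewhere on this page. The function name is hypothetical and is not part of curation.cgi.

```python
import re

# Assumption: eligibility depends only on the numeric part of the ID,
# so zero-padding does not matter.
FIRST_PASS_CUTOFF = 30000

_WBPAPER_RE = re.compile(r"WBPaper(\d+)")


def eligible_for_first_pass(paper_id: str) -> bool:
    """Return True if paper_id is a WBPaper ID numbered at or above the cutoff."""
    m = _WBPAPER_RE.fullmatch(paper_id)
    if m is None:
        raise ValueError(f"not a WBPaper ID: {paper_id!r}")
    return int(m.group(1)) >= FIRST_PASS_CUTOFF
```

Comparing integers rather than strings avoids surprises from IDs written with a different number of leading zeros.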

Note: the paper PDF can be accessed from the paper page, along with supplemental materials.

Either route takes you to curation.cgi (see below).

The firstpass page curation.cgi

  • At the top of the page are links to:
    • The tazendra site map (other forms)
    • Documentation for the set up, paths from, and changes to the curation.cgi form
    • Guidelines for the form, which include some summary information about the fields and features; written by Raymond in 2001, they need some updating.
  • For each data type you can check the box or enter text:
    • Check the box = "yes" is entered into the field
    • Enter text = the text is recorded
  • By default, an e-mail is sent if the data type flag box contains a "yes" (from a check) or text.
  • When done, preview the submission by selecting "Preview!"
  • If it is acceptable, select "New Entry!"
  • First pass flags will be sent.
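The e-mail rule in the list above ("yes" from a check, or any text, triggers a flag) can be sketched in Python as follows. It assumes each data type on the form gives a (checkbox, text box) pair. If both are filled in, the sketch keeps the typed text; that precedence is an assumption, not documented behaviour of curation.cgi. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def flag_value(checked: bool, text: str) -> Optional[str]:
    """Value recorded for one data type, or None if nothing triggers a flag.

    A checked box records "yes"; typed text is recorded as entered.
    Assumption: typed text takes precedence over the checkbox.
    """
    text = text.strip()
    if text:
        return text
    if checked:
        return "yes"
    return None


def flags_to_send(form: Dict[str, Tuple[bool, str]]) -> Dict[str, str]:
    """Map data type -> recorded value for every data type that triggers an e-mail.

    `form` maps a data type name to (checkbox_checked, text_box_contents).
    """
    flagged: Dict[str, str] = {}
    for data_type, (checked, text) in form.items():
        value = flag_value(checked, text)
        if value is not None:
            flagged[data_type] = value
    return flagged
```

For example, a checked RNAi box plus "table 1" typed under Transgene yields two flags. An empty, unchecked Microarray box yields none.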

Adding new gene paper connections

add info

For what and to whom a flag is sent

"flagged" = WBPapers that have been flagged for that data type but have not yet been curated or assessed for curation status; "flagged-done" = WBPapers that have been flagged and curated. These papers can be used as a source of verified examples of curation flags.

Data type Description Examples Curator Flagged
Gene Symbol (main/other/sequence) (flagged-done): New symbol for known locus or new locus defined. Alerts curators to newly cloned genes so that they can update a previously existing concise description, if needed. "UNC-80 = F25C8.3"; "We refer to the long F59F5.2/.8 transcript as glo-3 and the F59F5.2 transcript as glo-3 short (Figure 5A)."; "tbc-1=F20D1.2 table S3"; "nre-1 (uncloned)"; "we have designated K02B9.1 as meg-1 and K02B9.2 as meg-2 (meg, maternal-effect germ-cell defective)." genenames@wormbase.org, vanauken@its.caltech.edu
Mapping Data (flagged): Three-factor interval mapping (genetic only, no SNP interval mapping); mapping with Df, breakpoints, Dp "in suppl: In three-factor mapping with unc-11(e47) dpy-5(e61) the...defect mapped between the two genes (9/16 Dpy non-Unc and 4/14 Unc non-Dpy were chemotaxis defective)"; "glo-3(kx90), glo-3(kx94), and glo-3(zu446) were complemented by nDf19...similar to nearby genes vab-3 and daf-12"; "fig.1 ...breakpoint was not mapped precisely but idDf3 completely removes sequences of fem-1, drp-1..."; "mapping data in suppl methods: We mapped sa321 by its daf-7 suppression phenotype to the unc-62 dpy-11 interval (unc-62 (4/10) sa321 (6/10) dpy-11), which excludes scd-2. Therefore, sa321 is not allelic with scd-2." genenames@wormbase.org
Gene Function (flagged-done): Discussion of new function of a gene "A Novel Role for the SMG-1 Kinase in Lifespan and Oxidative Stress Resistance in Caenorhabditis elegans"; "Here we describe the isolation of mutations in two adjacent, divergently transcribed open reading frames (eri-6 and eri-7) that fail to complement...The ERI-6/7 protein is a superfamily I helicase that both negatively regulates the exogenous RNAi pathway and functions in an endogenous RNAi pathway." emsch@its.caltech.edu
Gene Regulation on Expression Level (flagged-done): Gene expression level change in a genetic background, i.e., GFP (or any other reporter) expression increases or decreases; expression level change under certain chemical/physical conditions, e.g., drugs, heat shock, etc.; change in expression location in a genetic background (misexpression), e.g., nucleus, cytoplasm, etc.; no change in gene expression in a genetic background, if mentioned in the paper. Basically, the paper reports changes or lack of changes in gene expression level or pattern due to genetic background, chemical, temperature or other experimental treatment. "Figure 5. Loss of hcf-1 Promotes the DAF-16 Transcriptional Regulation of Several Target Genes"; "To investigate whether SMG-1 controls DAF-16 sub-cellular localization...DAF-16::GFP was localized in both the cytoplasm and the nucleus in all tissues of worms after smg-1 inactivation...(Figure 3)...daf-2 RNAi induced DAF-16::GFP nuclear accumulation (Figure 3)...SMG-1 does not regulate DAF-16 activity through its sequestration into the cytoplasm"; "Interestingly, the eri-6/7 transsplicing reporter is more highly and broadly expressed when RNAi is attenuated by inactivation of the Argonaute gene rde-1 (Supplementary Fig. 12)...transgenes are generally expressed at higher levels when RNAi is defective. Quantitative RT–PCR revealed that eri-6/7 mRNA levels are not increased in animals lacking RDE-1 (data not shown)." xdwang@its.caltech.edu
Expression Data (flagged-done): Temporal and spatial (e.g. tissue, subcellular, etc.) distribution of genes in a wild-type background. Include: reporter gene analysis, antibody staining, in situ hybridization, RT-PCR, Western, Northern. Exclude: all markers (antibodies or reporter genes) used to label certain tissues or subcellular structures. This flag also alerts curators to cross-check the CCC semi-automation for missed papers. "Figure 2. HCF-1 Is a Ubiquitously Expressed Nuclear Protein"; "eri-7 promoter::gfp and eri-6 promoter::rfp transgenes are expressed in overlapping patterns, with co-expression in hypodermal cells and two pairs of sensory head neurons (ASK and ASI; Fig. 2c). The eri-7 promoter also expresses in the somatic gonad." wchen@its.caltech.edu, vanauken@its.caltech.edu
Marker: Reporters or antibodies used to mark certain tissues, subcellular structures, or life stages. wchen@its.caltech.edu, vanauken@its.caltech.edu
Microarray (flagged-done): Any microarray data "yes" wchen@its.caltech.edu
RNAi (flagged-done): Phenotypes/results are discussed for less than 100 RNAi experiments "yes"; "Supplementary Table 3: lsy-12 phenotypic data" garys@its.caltech.edu
Large-Scale RNAi (flagged-done): Phenotypes/results are discussed for more than 100 RNAi experiments "yes" raymond@its.caltech.edu
Transgene (flagged-done): Reagent: integrated transgenes only. Textpresso will extract all the Is or In transgene names; please look for transgenes without official names, for example when authors say they used microbombardment but did not provide official names, or when they provide a strain name and you know the strain contains a transgene. Ex transgenes will only be curated when they are used in other experiments, such as phenotype assays or Expr_pattern, so there is no need to flag them. "Textpresso : WBPaper00032232 muIs71 muIs84 rwIs3 xrIs87 -- 20081012 ..."; "table 1" wchen@its.caltech.edu
Overexpression (flagged): Overexpression of something results in a phenotype "High expression of the constitutively activated SCD-2(neu*) receptor caused 100% of animals to arrest in the first larval stage (L1), with no overt anatomical defects"; "Fig. 6. CRIP overexpression also affects epithelial shape" emsch@its.caltech.edu, garys@its.caltech.edu (add Variation phenotype people)
Structure Information (flagged): NMR structure, functional domain info for a protein "removal of the first 50aa causes mislocalization of the protein"; "fig.1, fig.8"; "yes, in paper and supplemental materials"; "FIGURE 5. The let-7 sequence is required for formation of the M1 and M2 complexes."; "functional domains of gld-2, fig 3"; "Fig. 3. Comparison of RDE-4 binding properties to variants lacking, or with mutations in, dsRBM1."
Functional Complementation (flagged): Functional redundancy, rescue by overexpression of extragenic sequence "exc-2 is rescued by exc-9 overexpression"; "...meg-2 functions redundantly with meg-1. Each transgene can rescue the sterility of meg-1(vr10) or meg-1(vr11)... The partial rescue of meg-1 mutants by GFPTMEG-2 suggests that extra copies of MEG-2 can compensate for the absence of MEG-1, implying that the overall level of MEG-1 and MEG-2 is important for proper germline development."; "In transformation experiments, we identified a cosmid, C26G6, which could rescue the egl-32 phenotype. Subclones of this cosmid containing the predicted open reading frame (ORF) T08G11.2 can also rescue egl-32 (Figures 3a & b). Several lines of evidence, however, suggest that egl-32 does not encode T08G11.2, but rather that they are interacting loci"
in vitro Protein Analysis (flagged): e.g. kinase assay "Fig. 3. Agonist pharmacology of the L-AChR"; "yes"; "fig.1"; "Fig. 4. In vitro reconstitution of siRNA production using C. elegans extracts and recombinant RDE-4 variants"
Mosaic Analysis (flagged-done): e.g. extra-chromosomal transgene loss in a particular cell lineage abolishes mutant rescue "p.580"; "yes"; "fig2"; "daf-16 mosaic" raymond@its.caltech.edu
Site of Action (flagged-done): e.g. anatomy(tissue/cell)-specific expression rescues mutant phenotype; RNAi in rrf-1 background determines that the gene acts in the germ line "By contrast, a gcy-28.d cDNA rescued chemotaxis when expressed under a panneuronal promoter (Figure 5C), under the AWC-selective odr-3 promoter (Figure 5D), or under the AWCON-selective str-2 promoter (Figure 5E). The results with gcy-28.c suggest that gcy-28 may have roles in multiple neurons; nevertheless, the full rescue with gcy-28.d suggests that gcy-28 can function cell-autonomously in AWCON."; "Whereas mutant smn-1(ok355) animals expressing muscle-directed smn-1 showed only weak phenotypic rescue in a subset of animals, pan-neuronal expression of smn-1 produced stronger rescue effects (Fig. 7A)." raymond@its.caltech.edu
Extract Antibody (flagged-done): New or used antibodies created by labs; skip antibodies bought from companies "in methods"; wchen@its.caltech.edu
Covalent Modification (flagged): e.g. a phosphorylation site is studied via mutagenesis and in vitro assays "Analysis of cDNA sequences derived from the eri-7 pre-mRNA stabilized in rnp-5 RNAi-treated animals revealed adenosine to guanosine transitions at four positions located within the direct repeat (Fig. 2d). These transitions are indicative of adenosine to inosine editing of the eri-7 5'-UTR by an adenosine deaminase (ADAR)24"
Extract Allele: Automated. How does this work? Where is this information stored? Who does it go to?
Mutant Phenotype (flagged-done): Phenotype is reported for a variation. "Figure 1. hcf-1 Modulates Lifespan...Figure S2. The hcf-1(ok559) Mutant Shows Reduced Brood Size and Increased Embryonic Lethality"; "...smg-1(r861) null mutants are associated with a fully penetrant protruding vulva phenotype (94%; n= 205)"; "Figure 4 ERI-6/7 is required for endogenous RNAi" emsch@its.caltech.edu, garys@its.caltech.edu, kyook@its.caltech.edu, jolenef@its.caltech.edu
Non-N2_phenotype: Phenotypes of non-N2 strains or non-C. elegans species kyook@its.caltech.edu
Sequence Change (flagged): Mutation was sequenced "the hcf-1(pk924) mutant has a large deletion that should result in a frame shift leading to an early stop codon, and likely represents a null mutant (Figure S1) [35]."; "table 2: unnamed mutations"; "FIGURE 5. Predicted structure of the glo-3 gene and encoded protein. (A) The structure of the glo-3 gene and the location and phenotypic class of mutations are shown"; "lin-15 mutation p.834" genenames@wormbase.org
Gene Interactions (flagged-done): "e.g. daf-16(mu86) suppresses daf-2(e1370), daf-16(RNAi) suppresses daf-2(RNAi)"; "hcf-1, daf-16;daf-18, smg-1"; "eri-6, eri-7"; "gene interactions large-scale, table"; "Epistasis analysis in figure 2 and table 2" emsch@its.caltech.edu
Gene Product Interaction (flagged-done): protein-protein, RNA-protein, DNA-protein interactions, Y2H, etc., "Table S4. PDZ domain specificity prediction"; "Figure 5. Loss of hcf-1 Promotes the DAF-16 transcriptional Regulation of Several Target Genes...Figure 6. HCF-1 Forms a Protein Complex with DAF-16 in C. elegans..."; "Interestingly, the eri-6/7 transsplicing reporter is more highly and broadly expressed when RNAi is attenuated by inactivation of the Argonaute gene rde-1 (Supplementary Fig. 12), suggesting that the eri-6/7 is a target of RNAi..."; "Fig. 1. Identification of genes that regulate clec-85::gfp expression" emsch@its.caltech.edu
Sanger Gene Structure Correction (flagged; need to make sure Sanger is responsible for the clone): Gene structure differs from the one in WormBase, e.g. different splice site, SL1 instead of SL2, etc. "...JA:F59F5.2 targets two predicted genes, F59F5.2 and F59F5.8. Two previously isolated cDNAs, yk1328a05 and yk571h2, suggested that F59F5.2 and F59F5.8 are a single gene that is alternatively spliced to produce two transcripts. We independently isolated F59F5.2/.8 cDNAs from both adult and embryonic stages, supporting the conclusion that F59F5.2 and F59F5.8 are a single gene (see MATERIALS AND METHODS). We refer to the long F59F5.2/.8 transcript as glo-3 and the F59F5.2 transcript as glo-3 short (Figure 5A)." worm-bug@sanger.ac.uk
St. Louis Gene Structure Correction (flagged; need to make sure St. Louis is responsible for the clone): Gene structure differs from the one in WormBase, e.g. different splice site, SL1 instead of SL2, etc. "C41D11.1 and C41D11.7 fig.1"; "We confirmed two of three forms by isolation of cDNAs (gcy-28.a and gcy-28.c, corresponding to T01A4.1a and T01A4.1c, respectively) and identified a fourth splice form, T01A4.1d (gcy-28.d), which fuses the upstream predicted gene T01A4.2 to T01A4.1 (Figure 4A)."; "page 3: By contrast, 235 of the indels occurred in regions with no perfectly aligning Solexa read, suggesting a possible error in the C. elegans reference genome," wormticket@watson.wustl.edu
Sequence Features (flagged): DNA/RNA elements required for gene expression: promoters, introns, UTRs, etc. "Figure 5. Notch/glp-1 3' UTR elements repress reporter mRNA translation in arrested female gonads."; "Deletion analysis of the lag-2 promoter during dauer indicated that a small fragment located 1.3 kb upstream of the lag-2 translational start site was sufficient for dauer-specific expression in IL2 neurons (Fig. 2A)... We refer to these different classes of binding sites as A and B (Fig. 2B)." emsch@its.caltech.edu, worm-bug@sanger.ac.uk, stlouis@wormbase.org
Mass Spec (flagged-done): keywords: LCMS, COSY, NMR, mass spec, HRMS gw3@sanger.ac.uk, worm-bug@sanger.ac.uk
RETIRE? Cell Name (flagged-done): New cell name is mentioned raymond@its.caltech.edu
Cell/Anatomy Function (flagged-done): Function of any anatomical part (e.g. cell) is mentioned that has not been flagged already for mosaic analysis, site of action, or ablation data "Figure 3 and Supp table 2 have information on predicted navigation circuit." raymond@its.caltech.edu
Ablation Data (flagged-done): cell or anatomical unit was ablated using a laser or by other means (e.g. by expressing a cell-toxic protein) "Killing AWCON in wild-type animals abolished the net migration toward butanone (Figure 3A; Wes and Bargmann, 2001" raymond@its.caltech.edu
Extract New SNP (flagged): Reagent: new SNP not already in WB "page 3: By contrast, 235 of the indels occurred in regions with no perfectly aligning Solexa read, suggesting a possible error in the C. elegans reference genome," dblasiar@watson.wustl.edu, tbieri@watson.wustl.edu
Extract SNP Verified by St. Louis (flagged): Authors report reference genome is incorrect "fig.2: two SNPs mentioned"; any SNP mentioned; "For nekl-1, cDNA clones indicated that the predicted gene annotation was incorrect." dblasiar@watson.wustl.edu, tbieri@watson.wustl.edu
Supplemental Material (flagged-done): If supplemental material was not already downloaded. yes qwang@its.caltech.edu
Chemicals (flagged): Chemicals used in assays (what about mutagens?) paraquat; butanone, benzaldehyde; aldicarb
Human Diseases (flagged): Genes discussed are homolog/ortholog of a human disease gene "The INhibitor of Growth (ING) family of type II tumor suppressors are encoded by five genes in mammals and three genes in C. elegans...Here we identify and characterize ing-3, the C. elegans gene with the highest sequence identity to the human ING3 gene."; "Deletion of smn-1, the Caenorhabditis elegans orthologue of the spinal muscular atrophy gene, results in locomotor dysfunction and reduced lifespan"; "alpha-synuclein"; "In this paper, we report on the identification of a HAP1 Caenorhabditis elegans homolog called T27A3.1. T27A3.1 shows conservation with rat and human HAP1, as well as with Milton, a Drosophila HAP1 homolog."; "VAP proteins (human VAPB/ALS8, Drosophila VAP33, and C. elegans VPR-1) are homologous proteins with an amino-terminal major sperm protein (MSP)... A point mutation (P56S) in the MSP domain of human VAPB is associated with Amyotrophic lateral sclerosis (ALS)" ranjana@its.caltech.edu
Comment (flagged): e.g. "no curatable"; "review"; "the paper is used for functional annotations" (stays in Postgres)
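The recipient column of the table above works as a lookup from data type to e-mail addresses. Once the flagged data types for a paper are known, they can be regrouped so that each address receives one list.

Below is a minimal Python sketch. The mapping is a hypothetical excerpt: it covers only a few rows, with addresses copied from the table. The RNAi helper encodes the table's 100-experiment threshold. The table does not say which flag applies at exactly 100, so treating 100 as small-scale is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical excerpt of the table's recipient column; a real mapping
# would list every data type.
RECIPIENTS: Dict[str, List[str]] = {
    "Gene Symbol": ["genenames@wormbase.org", "vanauken@its.caltech.edu"],
    "Mapping Data": ["genenames@wormbase.org"],
    "Expression Data": ["wchen@its.caltech.edu", "vanauken@its.caltech.edu"],
    "RNAi": ["garys@its.caltech.edu"],
    "Large-Scale RNAi": ["raymond@its.caltech.edu"],
    "Human Diseases": ["ranjana@its.caltech.edu"],
    "Comment": [],  # stays in Postgres; no recipient listed
}


def rnai_data_type(n_experiments: int) -> str:
    """Choose the RNAi flag from the number of RNAi experiments discussed.

    Table: fewer than 100 -> "RNAi", more than 100 -> "Large-Scale RNAi".
    Assumption: exactly 100 is treated as small-scale.
    """
    return "RNAi" if n_experiments <= 100 else "Large-Scale RNAi"


def group_by_recipient(flagged_types: Iterable[str]) -> Dict[str, List[str]]:
    """Regroup flagged data types into one list per e-mail address.

    Raises KeyError for a data type missing from RECIPIENTS, so a flag is
    never silently dropped.
    """
    outbox: Dict[str, List[str]] = defaultdict(list)
    for data_type in flagged_types:
        for address in RECIPIENTS[data_type]:
            outbox[address].append(data_type)
    return dict(outbox)
```

For example, flagging Gene Symbol and Mapping Data produces one message for genenames@wormbase.org listing both data types, plus one for vanauken@its.caltech.edu listing only Gene Symbol.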


Working with Textpresso

Order in which to work with the Textpresso group to automate data type flagging (still to be worked out with the Textpresso group):

  • Gene Regulation
  • Expression
  • RNAi via pattern recognition
  • Antibody via pattern recognition
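This page already spells out one extraction that works by pattern recognition: the integrated (Is/In) transgene names that, according to the Transgene row in the data-type table, Textpresso extracts. That gives a concrete sense of what flagging "via pattern recognition" can look like.

Below is a minimal Python regex sketch of that kind of extraction. It is not Textpresso's actual implementation. The pattern assumes names are a lowercase prefix of 1 to 3 letters, then "Is" or "In", then digits (e.g. muIs71, rwIs3). Real nomenclature may need a broader or stricter pattern.

```python
import re
from typing import List

# Approximate pattern for integrated transgene names (assumption: a 1-3 letter
# lowercase prefix, then "Is" or "In", then digits). Extrachromosomal "Ex"
# arrays are deliberately not matched.
TRANSGENE_RE = re.compile(r"\b[a-z]{1,3}(?:Is|In)\d+\b")


def find_integrated_transgenes(text: str) -> List[str]:
    """Return candidate transgene names, de-duplicated, in order of first appearance."""
    seen: List[str] = []
    for name in TRANSGENE_RE.findall(text):
        if name not in seen:
            seen.append(name)
    return seen
```

Running it on the Textpresso example string from the Transgene row returns muIs71, muIs84, rwIs3 and xrIs87. The paper ID and date in that string do not match.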


kjy 19:43, 9 February 2009 (EST)