Difference between revisions of "Antibody"

From WormBaseWiki

Latest revision as of 19:58, 9 April 2021


Antibody Model

?Antibody Summary UNIQUE ?Text #Evidence
          Other_name Text
          Gene ?Gene XREF Antibody #Evidence
          Isolation Original_publication UNIQUE ?Paper
                    No_original_reference Text //proposed by Wen
                    Person ?Person
                    Location ?Laboratory
                    Clonality UNIQUE Polyclonal UNIQUE Text
                                     Monoclonal UNIQUE Text
                    Antigen UNIQUE Peptide UNIQUE Text
                                   Protein UNIQUE Text
                                   Other_antigen UNIQUE Text
                    Animal UNIQUE Rabbit
                                  Other_animal UNIQUE Text
          Historical_gene ?Gene Text
          Possible_pseudonym ?Antibody XREF Possible_pseudonym_of
          Possible_pseudonym_of ?Antibody XREF Possible_pseudonym
          Antibody_for_disease ?DO_term XREF Associated_antibody #Evidence
          Expr_pattern ?Expr_pattern XREF Antibody_info
          Interactor ?Interaction
          Reference ?Paper XREF Antibody
          Remark ?Text #Evidence

Model change proposed to associate human disease with antibody (08/31/2015):

Antibody_for_disease	?DO_term  	XREF  	Associated_antibody 	#Evidence

Construct_for_disease	?DO_term	XREF	Associated_construct	#Evidence

Transgene_for_disease	?DO_term	XREF	Associated_transgene	#Evidence

Reagent_info	Associated_antibody	?Antibody	XREF      Antibody_for_disease	
		Associated_construct	?Construct	XREF	  Construct_for_disease	
   		Associated_transgene	?Transgene	XREF      Transgene_for_disease

Antibody curation SOPs

Please note: we curate only non-commercial antibodies that target C. elegans proteins.

Finding antibody papers

-Antibody papers are flagged using the text string match method (SVM also flags antibody papers, but I don't use that method for curation purposes).

1. go to curation status form: http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi

2. choose "Curation Statistics Options Page"

3. check "antibody" and "str flags cur_strdata"

4. the number following "STR positive not validated" is the number of papers that need to be curated.

5. clicking on the number gives a list of the papers that need to be curated

New string matching pipeline

In April 2021 we rewrote the antibody string matching pipeline. See GitHub repo: https://github.com/WormBase/entity-extraction-antibody

The string matching pipeline runs as a daily cronjob and processes 50 papers per day. A record of the processed papers is kept at textpressocentral.org:/data/antibody_str/processed.txt.

  • The previous string matching pipeline written by Juancarlos was located on textpresso-dev under this directory: /home/azurebrd/work/get_wen_anti-protein

The script containing the match strings and exclusion words is get_antiprotein.pl.20111025. The relevant section is below:

my @staryes = qw( anti\-* );

my @reject = qw( Ascaris human Drosophila mammalian murine anti-HA anti-FLAG anti-Flag anti-His anti-GST
anti-Xpress anti-V5 anti-HPC4 anti-Myc anti-myc anti-phophotyrosine anti-serotonin anti-5HT anti-5-HT
anti-HRP anti-GABA anti-ubiquitin anti-GFP anti-actin anti-FMRFamide anti-RFamide anti-MBP anti-TMG anti-VSV
anti-H3K4me3 );
my @starreject = qw( anti\-*\-galactosidase );

my @one = qw( preparation prepared prepare production purification generation generate generated produce 
produced purify purified raised );
my @two = qw( antiserum antibody antibodies antisera );
  • 04/09/2021: The first iteration of the new string matching pipeline will include all instances of anti-<C. elegans gene name> and will include a combination of the keywords below:

my @one = qw( preparation prepared prepare production purification generation generate generated produce produced purify purified raised );
my @two = qw( antiserum antibody antibodies antisera );

  • Daniela will evaluate results
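The matching rule described above can be illustrated with a minimal Python sketch. This is not the code in the entity-extraction-antibody repository. It assumes that a sentence is flagged when it contains an anti-<gene name> string, or when it contains at least one word from each of the two keyword lists. The function names and the list of gene names passed in are hypothetical.

```python
import re

# Keyword lists copied from the Perl @one and @two arrays above.
PREPARATION_WORDS = set(
    "preparation prepared prepare production purification generation "
    "generate generated produce produced purify purified raised".split()
)
ANTIBODY_WORDS = set("antiserum antibody antibodies antisera".split())


def find_anti_gene_mentions(sentence, gene_names):
    """Return the anti-<gene name> strings found in a sentence (case-insensitive)."""
    hits = []
    for gene in gene_names:
        pattern = r"\banti-" + re.escape(gene) + r"\b"
        hits.extend(m.group(0) for m in re.finditer(pattern, sentence, re.IGNORECASE))
    return hits


def has_keyword_combination(sentence):
    """True if the sentence has a word from each keyword list (assumed rule)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & PREPARATION_WORDS) and bool(words & ANTIBODY_WORDS)


def flag_sentence(sentence, gene_names):
    """Flag on an anti-<gene name> mention or a keyword combination (assumed OR rule)."""
    return bool(find_anti_gene_mentions(sentence, gene_names)) or has_keyword_combination(sentence)
```

For example, flag_sentence("Antibodies were raised in rabbits.", ["epg-2"]) is true because of the keyword combination, while a sentence mentioning only anti-GFP is not flagged when GFP is not in the gene list.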

Antibody curation controlled vocabulary

Antibody control vocabulary

Remark "Commercial Antibody."
Remark "Tissue Specific Antibody Marker."
Summary "Rabbit polyclonal antibody against XXX recombinant protein."
Summary "Rabbit polyclonal peptide antibody against XXX."
Summary "Mouse monoclonal peptide antibody against XXX."

Antibody curation guideline

WormBase requires the following information for Antibody:

1. Antibody Name: for consistency, use [WBPaperID]::anti-GENENAME, with the gene name in capitals, e.g. [WBPaper00036348]::anti-EPG-2. Append _1, _2, etc. if several antibodies are made for the same gene.

2. Original_publication where the antibody was first reported. For antibodies that are published for the first time, list the original publication and mark the antibody as "Original_publication" antibody (these are good and valid antibody objects in WormBase). Add the paper in reference as well.

3. Target gene (abc-1, xyz-1 ...), clonality (polyclonal or monoclonal) and animal (rabbit, mouse ...)

4. Antigen used to generate antibody (peptide or protein sequence)

5. If the antibody is from another paper, find the original antibody object and add the reference to it.

6. If the antibody has no original reference, create a new antibody object and mark it as "No_original_reference". If you suspect the antibody is the same as another one that was previously published, enter the "Possible_pseudonym" field.

7. If it is a new antibody, it usually belongs to the lab where corresponding author is.
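The naming convention in item 1 can be written down as a small helper. This is only a sketch in Python; the function name antibody_name and its parameters are hypothetical and are not part of any WormBase tool.

```python
def antibody_name(paper_id, gene_name, index=None):
    """Build '[WBPaperID]::anti-GENENAME', with an optional _1, _2, ... suffix.

    paper_id  -- e.g. "WBPaper00036348"
    gene_name -- e.g. "epg-2" (upper-cased in the result)
    index     -- optional number, used when a paper reports several
                 antibodies against the same gene
    """
    name = f"[{paper_id}]::anti-{gene_name.upper()}"
    if index is not None:
        name += f"_{index}"
    return name


print(antibody_name("WBPaper00036348", "epg-2"))
print(antibody_name("WBPaper00036348", "epg-2", 2))
```

The first call reproduces the example from the guideline, [WBPaper00036348]::anti-EPG-2.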

Alignment to Alliance data

April 2021

  • Assign WB ids to antibody object
    • Make sure that all the antibody instances used in other classes will be changed as well
Current classes connected to ?Antibody are:

Tag | build config source db
?DO_term XREF Associated_antibody #Evidence - citace
?Expr_pattern XREF Antibody_info - citace
?Gene XREF Antibody #Evidence - All but no Antibody data in geneace
?Interaction - citace
?Laboratory - citace
?Paper - citace
?Paper XREF Antibody - citace
?Person - citace

Probably only Expression and Interaction need to have the data changed in the OA (possibly disease as well, double check with Ranjana)

  • move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation. Use prefixes,

e.g.: Possible pseudonym: ... Other_animal:

  • Antigen field: currently separated into Protein, Peptide, and Other_antigen (e.g.: homogenate of early C. elegans embryos, sperm). Propose to use just one antigen field to capture antigen info.
  • request model changes to Hinxton:


Possible_pseudonym ?Antibody XREF Possible_pseudonym_of
          Possible_pseudonym_of ?Antibody XREF Possible_pseudonym
Other_animal UNIQUE Text

Add: Public_name

Antibody dumper

  • the dumper is located on tazendra:
    • the module and use_package.pl are at:
      • /home/postgres/work/citace_upload/antibody/get_antibody_ace.pm
      • /home/postgres/work/citace_upload/antibody/use_package.pl
    • use_package.pl is symlinked at:
      • /home/acedb/xiaodong/antibody/use_package.pl
  • the dumper checks for dead genes in the 'Gene' field and invalid papers in the 'Original publication' and 'Reference' fields, and writes the results to the err.out file.
  • the cronjob for antibody dumping for upload was cancelled

[The dump script is located on tazendra: /home/acedb/wen/phenote-antibody/dump_antibody_ace.pl]


dump out file: ./dump_antibody_ace.pl

file name: antibody.ace

I usually change the file name: cp antibody.ace antibody.ace.date_of_dump

then copy file to spica: scp antibody.ace.20110503 citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Xiaodong/.
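The dated file name used in the copy step can be generated rather than typed. The sketch below is in Python and assumes the YYYYMMDD suffix seen in the scp example above (antibody.ace.20110503); the function name dated_dump_name is hypothetical.

```python
from datetime import date


def dated_dump_name(dump_date, base="antibody.ace"):
    """Return the dump file name with a YYYYMMDD suffix, e.g. antibody.ace.20110503."""
    return f"{base}.{dump_date.strftime('%Y%m%d')}"


print(dated_dump_name(date(2011, 5, 3)))
```

Passing date.today() gives the name for a dump made today.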

Handling Dead Genes During Dump Process

The dumper script will now (as of May, 2013) run an automatic check for dead genes in any gene field. Any genes that are considered dead that are referenced in an Interaction object in the OA will be handled in the following manner:

1) If there is a replacement for the gene (i.e. the gene has merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ACE file, the replacement gene will fill the original gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now with the updated gene reference) will be printed with an "Inferred_automatically" tag after the gene. So, for example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:

Gene  "WBGene00001234"


Gene  "WBGene00002345"  Inferred_automatically
Historical_gene  "WBGene00001234"  "Note: This object originally referred to WBGene00001234.
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has 
replaced WBGene00001234 accordingly."

Also, since Antibodies, Transgenes, Expression patterns, Variations are mapped to an interactor where possible (or else they are dumped as "Unaffiliated"), this mapping will now occur to only the newest genes that the interactor refers to.

2) If there is no replacement for the gene (Dead or Suppressed), we would dump the following:

Gene  "WBGene00001234"
Historical_gene  "WBGene00001234"  "Note: This object originally referred to a gene
 (WBGene00001234) that is now considered dead. Please interpret with discretion."


Gene  "WBGene00001234"
Historical_gene  "WBGene00001234"  "Note: This object originally referred to a gene
 (WBGene00001234) that has been suppressed. Please interpret with discretion."

and lastly,

3) If the gene has undergone a split, such genes will be dumped as:

Gene  "WBGene00001234"
Historical_gene  "WBGene00001234"  "Note: This object originally referred to a gene 
(WBGene00001234) that is now considered split. Please interpret with discretion."

and also printed out in the error output file of the dumping script for a curator to go back and manually change according to best judgement.
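The three cases above can be summarised in a short Python sketch. It is not the Perl dumper itself (get_antibody_ace.pm); it only mirrors the output described here. The function name dump_gene_lines is hypothetical, the status labels follow the ones used in the dumper notes (merged, dead, suppressed, split), and the wording of the error message for split genes is an assumption.

```python
def dump_gene_lines(gene_id, status=None, replacement=None):
    """Return (ace_lines, error_messages) for one gene reference.

    status is None for a live gene, or one of
    'merged', 'dead', 'suppressed', 'split'.
    """
    errors = []
    if status is None:
        # Normal gene: just tag + value.
        return [f'Gene  "{gene_id}"'], errors

    if status == "merged":
        if replacement is None:
            raise ValueError("a merged gene needs a replacement gene")
        note = (
            f"Note: This object originally referred to {gene_id}. "
            f"{gene_id} is now considered dead and has been merged into {replacement}. "
            f"{replacement} has replaced {gene_id} accordingly."
        )
        lines = [
            f'Gene  "{replacement}"  Inferred_automatically',
            f'Historical_gene  "{gene_id}"  "{note}"',
        ]
        return lines, errors

    reasons = {
        "dead": "is now considered dead",
        "suppressed": "has been suppressed",
        "split": "is now considered split",
    }
    if status not in reasons:
        raise ValueError(f"unknown gene status: {status}")
    note = (
        f"Note: This object originally referred to a gene ({gene_id}) "
        f"that {reasons[status]}. Please interpret with discretion."
    )
    if status == "split":
        # Split genes are also reported so a curator can fix them by hand.
        errors.append(f"{gene_id} has been split; manual review needed")
    return [f'Gene  "{gene_id}"', f'Historical_gene  "{gene_id}"  "{note}"'], errors
```

Calling dump_gene_lines("WBGene00001234", "merged", "WBGene00002345") returns the replacement gene with Inferred_automatically plus the Historical_gene line, and an empty error list.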

Gene Examples:
A split gene: WBGene00012507
A merged gene: WBGene00007524
A dead gene: WBGene00007814
A suppressed gene: WBGene00015490


Changed the postgres tables for ---05/22/2011

reference -> paper

location -> laboratory

The cronjob was cancelled. The dump is now done manually, after checking the err.out file for each dump. - 06/04/2012

email from Juancarlos related to cronjob --- 06/06/2011

Set the cronjob to run every Thursday :

0 2 * * thu /home/acedb/xiaodong/oa_antibody_dumper/dump_antibody_ace.pl

It puts the file at :


So you can see it at :


If you need to run it manually, just paste into the shell :


Then log onto spica, cd into the directory where you want it, remove the existing antibody.ace file, and do :

wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/antibody.ace"

Dumper change for historical_gene tag:-05/22/2013

-model change: refer to Chris's document: https://docs.google.com/a/wormbase.org/document/d/1nnuQY9OfV2VsBORj01ocoC985-5kOgxYCGOAZDeeKLQ/edit

-dumper change via Skype with J: [5/22/13 1:57:44 PM] j chan: use_package.pl -> /home/postgres/work/citace_upload/antibody/use_package.pl

[5/22/13 1:59:04 PM] j chan: http://mangolassi.caltech.edu/~postgres/cgi-bin/oa/ontology_annotator.cgi

[5/22/13 2:08:06 PM] j chan: gin_dead

[5/22/13 2:08:19 PM] j chan: Dead -> dead

[5/22/13 2:08:27 PM] j chan: merged_into WBGene -> merged

[5/22/13 2:08:31 PM] j chan: split_into -> split

[CG added 10-21-2013] cgrove: Suppressed -> suppressed

[5/22/13 2:09:06 PM] j chan: looping through the genes where something happened to make sure they don't also point at something else

[5/22/13 2:09:16 PM] j chan: abp_gene

[5/22/13 2:09:47 PM] j chan: merged -> historical_gene + remark AND gene <gene> Inferred_automatically

[5/22/13 2:09:56 PM] j chan: dead -> historical_gene + remark

[5/22/13 2:10:05 PM] j chan: split -> historical_gene + remark AND error message

[CG added 10-21-2013] cgrove: Suppressed -> historical_gene + remark

[5/22/13 2:10:12 PM] j chan: normal ones -> just tag + value

-tested with genes:
A split gene: WBGene00012507
A merged gene: WBGene00007524
A dead gene: WBGene00007814
A suppressed gene: WBGene00015490

-migrated to tazendra on the same day

Dumper change for disease reagent:-09/21/2015

-antibody model change refer to above

-we want to add a new tab called 'Tab2' for antibody OA, add two new fields:

-New tables abp_humandoid abp_diseasepaper made by J 09/22/2015

-when dumping, write these two fields on one line, as: Antibody_for_disease "DOID:0060372" Paper_evidence "WBPaper00045516"
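As a small illustration of the one-line format, here is a Python sketch. The function name disease_line is hypothetical; the two arguments stand for the values stored in abp_humandoid and abp_diseasepaper.

```python
def disease_line(doid, paper_id):
    """Format the disease ID and its paper evidence as a single .ace line."""
    return f'Antibody_for_disease "{doid}" Paper_evidence "{paper_id}"'


print(disease_line("DOID:0060372", "WBPaper00045516"))
```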

Diseases with antibody reagents as of 03/22/2016

WBPaper00004603 OSM-5 polycystic kidney disease ( DOID:898 ) [cgc4603]::anti-OSM-5

WBPaper00004614 DYB-1 Duchenne muscular dystrophy ( DOID:11723 ) [cgc4614]::anti-DYB-1

WBPaper00005324 HLH-8 Saethre-Chotzen syndrome ( DOID:14768 ) [WBPaper00005324]::anti-HLH-8

WBPaper00005585 PQE-1 Huntington's disease ( DOID:12858 ) [cgc5585]::anti-PQE-1_a, b, c

WBPaper00005722 TOR-2 dystonia ( DOID:543 ) [cgc5722]::anti-TOR-2

WBPaper00006152 TOR-2 dystonia ( DOID:543 ) [WBPaper00006152]::anti-TOR-2

WBPaper00024206 WRN-1 Werner syndrome ( DOID:5688 ) [WBPaper00024206]::anti-WRN-1

WBPaper00024364 ATX-2 autosomal dominant cerebellar ataxia ( DOID:1441 ) [cgc6898]::anti-ATX-2_a, [cgc6898]::anti-ATX-2_b

WBPaper00025052 ALR-1 non-syndromic X-linked intellectual disability ( DOID:0050776 ) [WBPaper00025052]::anti-ALR-1

WBPaper00025083 TOR-2 Parkinson's disease ( DOID:14330 ) [cgc5722]::anti-TOR-2

WBPaper00029200 LIS-1 lissencephaly ( DOID:0050453 ) [WBPaper00029200]::anti-LIS-1

WBPaper00029440 SPAS-1 hereditary spastic paraplegia ( DOID:2476 ) [WBPaper00029440]::anti-SPAS-1

WBPaper00030773 SUT-1 tauopathy ( DOID:680 ) [WBPaper00030773]::anti-SUT-1

WBPaper00031344 DYC-1 Duchenne muscular dystrophy ( DOID:11723 ) [WBPaper00031344]::anti-DYC-1

WBPaper00031430 LAD-2 hereditary spastic paraplegia ( DOID:2476 ) [WBPaper00031430]::anti-LAD-2_a, [WBPaper00031430]::anti-LAD-2_b

WBPaper00032363 ELP-1 Duchenne muscular dystrophy ( DOID:11723 ) [WBPaper00032363]::anti-ELP-1

WBPaper00032910 ASPM-1 microcephaly ( DOID:10907 ) [WBPaper00032910]::anti-ASPM-1

WBPaper00032968 ALP-1 myopathy ( DOID:423 ) , cardiomyopathy ( DOID:0050700 ) [WBPaper00032968]::anti-ALP-1_a, [WBPaper00032968]::anti-ALP-1_a

WBPaper00032979 SUT-2 tauopathy ( DOID:680 ) b

WBPaper00035062 HYLS-1 hydrolethalus syndrome ( DOID:0050779 ) [WBPaper00035062]::anti-HYLS-1

WBPaper00035587 WRN-1 Werner syndrome ( DOID:5688 ) [WBPaper00035587]::anti-WRN-1_b, [WBPaper00035587]::anti-WRN-1_a

WBPaper00036018 FRG-1 facioscapulohumeral muscular dystrophy ( DOID:11727 ) [WBPaper00036018]::anti-FRG-1

WBPaper00040518 LEM-2 Emery-Dreifuss muscular dystrophy ( DOID:11726 ) [cgc4339]::anti-LEM-2_b

WBPaper00040684 WRN-1 Werner syndrome ( DOID:5688 ) [WBPaper00040684]::anti-WRN-1

WBPaper00041456 GLO-2 Hermansky-Pudlak syndrome ( DOID:3753 ) [WBPaper00041456]::anti-GLO-2.a, [WBPaper00041456]::anti-GLO-2.b

WBPaper00041513 ZYG-8 dyslexia ( DOID:4428 ) , intellectual disability ( DOID:1059 ), lissencephaly ( DOID:0050453 ) [WBPaper00041513]::anti-ZYG-8

WBPaper00042056 ZYX-1 Duchenne muscular dystrophy ( DOID:11723 ) [WBPaper00042056]::anti-ZYX-1.a, [WBPaper00042056]::anti-ZYX-1.b, [WBPaper00042056]::anti-ZYX-1.c

WBPaper00042360 DNJ-27 Alzheimer's disease ( DOID:10652 ) ,Huntington's disease ( DOID:12858 ) ,Parkinson's disease ( DOID:14330 ) ,neurodegenerative disease ( DOID:1289 ) [WBPaper00042360]::anti-DNJ-27

WBPaper00045518 HIM-6 Bloom syndrome ( DOID:2717 ) [WBPaper00045518]::anti-HIM-6

WBPaper00046032 SAS-1 orofaciodigital syndrome ( DOID:4501 ) [WBPaper00046032]::anti-SAS-1


The antibody object [cgc7055]::anti-POP-1 had 3 references attached to it. All of them referred to the monoclonal mouse antibody P4G4 [cgc2998]::anti-POP-1 published in Lin 1998. Daniela moved the references to the [cgc2998]::anti-POP-1 object and deleted [cgc7055]::anti-POP-1 (May 31st, 2017). She checked citace to see which objects contained [cgc7055]::anti-POP-1 and there was only Expr3465. Daniela changed the antibody connection in that expression object to [cgc2998]::anti-POP-1.


On July 25th 2018 Juancarlos backed up and then deleted the data in the tfp_tables as they were not used.

The output of http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen is still populating the cur_strdata tables on the curation status form.


1. Antibody paper first pass:

  • Antibody papers are identified via a script written by Juancarlos. Here are the first pass results for antibody curation:


  • The results are in an HTML page that lists paper names and the antibodies associated with them.

2. A few steps need to be done before starting curation:

  • save the file on desktop as 'anti_protein_20110113.txt' (add the date in file name)
  • open a terminal
    • cd to curation, then my_acefiles, then antibody_curation
    • cp /Users/xiaodongwang/Desktop/anti_protein_20110113.txt . (copy the file into antibody_curation directory)
    • [OBSOLETE STEP: scp anti_protein_20110113.txt anti_protein.txt (copy the contents in to anti_protein.txt as input file 1 when run TextpressoABFinder)]
    • run Yuling's new script (written on 3/12/2012) to get the new antibody papers:
      • ./find_new.pl anti_protein_20120320.txt WBAbPaperList.ace AbCurationLog.txt > aaa
        • basically, the script subtracts the papers listed in 'WBAbPaperList.ace' and 'AbCurationLog.txt' from the input file and outputs the new papers into file 'aaa'
  • copy and paste new papers into 'Ab_curation_spreadsheet.xlsx' of my own curation log in antibody curation folder
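The subtraction done by find_new.pl amounts to a set difference over paper IDs. The Python sketch below is not Yuling's script. It assumes that papers can be recognised in all three files by IDs of the form WBPaper followed by 8 digits, which may not match the real file formats, and the function names are hypothetical.

```python
import re

# Assumption: papers appear as WBPaper + 8 digits in every input file.
PAPER_ID = re.compile(r"WBPaper\d{8}")


def paper_ids(text):
    """Collect the distinct WBPaper IDs mentioned in a block of text."""
    return set(PAPER_ID.findall(text))


def find_new_papers(flagged_text, *curated_texts):
    """Return flagged paper IDs that appear in none of the curated texts."""
    curated = set()
    for text in curated_texts:
        curated |= paper_ids(text)
    return sorted(paper_ids(flagged_text) - curated)


flagged = "WBPaper00000001 anti-ABC-1\nWBPaper00000002 anti-XYZ-1\nWBPaper00000003 anti-DEF-1"
ace_list = 'Paper : "WBPaper00000001"'
log = "WBPaper00000003 curated"
print(find_new_papers(flagged, ace_list, log))
```

With the toy inputs shown, only WBPaper00000002 is left as a new paper.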

3. when curation is finished, copy papers from spreadsheet to 'AbCurationLog.txt' file:

Curators need to document the status of every paper from 'Ab_curation_spreadsheet.xlsx' to the curation log file AbCurationLog.txt, so that the same paper will not appear again next time.

AbCurationLog.txt is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation

4. A few important documents:

  • AbCurationLog.txt -- The curator maintains a curation log file for all the antibody papers that have already been curated. It is located at: Users/xiaodongwang/curation/my_acefiles/antibody_curation
    • this file needs to be updated every time by appending newly curated paper (may copy paper list from excel file and paste into the file)
  • WBAbPaperList.ace -- Antibody papers curated before Textpresso time

OBSOLETE STEP (3/20/2012): 3. The curation log file listed above only documents papers that were curated after the Textpresso first pass was applied. Antibodies curated before that are kept in this file: WBAbPaperList.ace

OBSOLETE STEP (3/20/2012): 4. There is a script written by Wen that screens file 1 and filters out the papers in files 2 and 3 (which were already curated), then gives the new paper list. The script is called: TextpressoAbFinder.pl

OBSOLETE STEP (3/20/2012):[

Here is how to use the TextpressoAbFinder.pl

(wen@athena:~/TextPresso/TextPressoAb$ ./TextpressoAbFinder.pl)

/Users/xiaodongwang/curation/my_acefiles/antibody_curation/ ./TextpressoAbFinder.pl

This script checks the result of Textpresso, compares it with the antibody paper list dumped from citace, and looks for antibody papers that were not curated.

Input file 1: anti_protein.txt -- all antibody papers found by Textpresso

Input file 2: WBAbPaperList.ace -- Antibody papers curated before Textpresso time

Input file 3: CurationLog/AbCurationLog.txt -- Antibody curation log.

Output file 1: NewAbPaper.txt -- New antibody papers

    • I can change the output file 1 name to NewAbPaper_20110113 in script.
    • two places need to be changed each time

Output file 2: TPAbFalsePositive.txt -- All false positive antibody papers.

1789 papers flagged by Textpresso, 1734 curated, 55 need to be checked. Among the papers not curated, 30 have the anti-XXX pattern and 25 have no anti-XXX pattern. 1626 papers curated in citace, 1347 found by Textpresso, 279 not found by Textpresso. Recall is 0.828413284132841. 518 papers identified by Textpresso are false positives. Precision is 0.710452766908888.

5. The result of the script is NewAbPaper.txt. This is the list of antibody papers that need to be curated. I copy this file onto my desktop and keep my own Excel file in desktop/curation forms/antibody curation/AB_curation_spreadsheet.xlsx, with a separate sheet for each date.

6. Add curated papers to AbCurationLog.txt under /Users/xiaodongwang/curation/my_acefiles/antibody_curation, so that these papers can be subtracted from NewAbPaper.txt next time. ]
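The recall and precision figures quoted in the obsolete TextpressoAbFinder.pl output above can be reproduced from the counts given there. The Python sketch below assumes recall is computed as (curated papers found by Textpresso) / (all curated papers) and precision as (flagged papers minus false positives) / (flagged papers). These formulas are inferred from the numbers, not taken from the script.

```python
def recall(found_curated, total_curated):
    """Fraction of curated papers that the text search also found."""
    return found_curated / total_curated


def precision(flagged, false_positives):
    """Fraction of flagged papers that are not false positives."""
    return (flagged - false_positives) / flagged


print(round(recall(1347, 1626), 6))      # 0.828413
print(round(precision(1789, 518), 6))    # 0.710453
```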