Pictures

From WormBaseWiki



Picture Data Model

////////////////////////////////////////////////////////////////////////////////////

?Picture      Description ?Text
              Source ?Text
              Image_lineage Crop_picture ?Picture XREF Cropped_from
                            Cropped_from ?Picture XREF Crop_picture
              Pick_me_to_call Text Text
              Expr_pattern ?Expr_pattern XREF Picture
              Remark ?Text #Evidence
              Cellular_component      ?GO_term      XREF Pictures  
              Anatomy_term ?Anatomy_term XREF Picture 
              Acknowledgments Template Text
                              Journal_name Text
                              Publication_year Text
                              Article_URL ?Database ?Database_field ?Accession_number
                              Publisher_URL ?Database ?Database_field ?Accession_number
                              Person_name Text
              Reference ?Paper XREF Picture
            
///////////////////////////////////////////////////////////////////////////////////

Picture Curation

The immediate goal of picture curation is to obtain images of gene expression data from the literature and from individual laboratories, and to display them on the WormBase gene expression page.

  • We want to display images related to the temporal or spatial (e.g., tissue, subcellular, etc.) localization of any gene in a wild-type background, from different data types:
    • Reporter gene analysis
    • Antibody staining
    • In situ hybridization
    • RT-PCR
    • Western or Northern blot data

Pipeline

In the early phases of curation, pictures will be taken from open access journals (e.g. PLoS, BMC, BioMed Central Ltd). During open access image curation, other publishers will be contacted to obtain copyright permissions.

The images should be saved and stored according to the following guidelines. The example shown below refers to a PLoS Biology paper but the rules of handling the pictures are universal and not "paper specific".


Overview

This is a mock-up of the expression page for gene K07C11.4. We would like panels B and F to be highlighted, together with the figure caption describing the expression of the gene, AND to be able to access the original figure by clicking the "See original figure" button.

PictureH.png

Downloading and saving the images

Pictures are downloaded in TIFF format from the original paper.

PictureA.png


Pictures are saved with their original name in order to minimize editing by the curator. In this case the file is called “journal.pbio.0020352.g006”. The files are directly converted into JPEG, because TIFF is not suitable as a web display format. Avoid using special characters such as ' * / in the file name.
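The naming rule can be checked before a file is stored. The sketch below is a hypothetical helper, not part of any WormBase tool: it rejects names containing the characters ' * / and derives the JPEG file name from the original name. It assumes the converted file gets a `.jpg` extension and only handles naming; the TIFF-to-JPEG conversion itself is still done in an image editor.

```python
# Characters the guidelines ask curators to avoid in picture file names.
FORBIDDEN_CHARS = set("'*/")

# Extensions treated as the downloaded TIFF original (assumption).
TIFF_EXTENSIONS = {"tif", "tiff"}


def is_safe_name(name: str) -> bool:
    """Return True if the file name contains none of the forbidden characters."""
    return not (FORBIDDEN_CHARS & set(name))


def jpeg_name(original: str) -> str:
    """Keep the publisher's file name and give it a .jpg extension.

    Names such as "journal.pbio.0020352.g006" contain dots, so only a real
    TIFF extension is replaced; anything else gets ".jpg" appended.
    """
    if not is_safe_name(original):
        raise ValueError(f"forbidden character in file name: {original!r}")
    stem, dot, ext = original.rpartition(".")
    if dot and ext.lower() in TIFF_EXTENSIONS:
        return stem + ".jpg"
    return original + ".jpg"


print(jpeg_name("journal.pbio.0020352.g006.tif"))  # journal.pbio.0020352.g006.jpg
```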

The file is saved in a directory named after the WB paper ID. E.g.: WBPaper00024505, meaning that picture “journal.pbio.0020352.g006” has been downloaded from WBPaper00024505.


PictureB.png

These two identifiers together, WBPaper00024505_journal.pbio.0020352.g006, will be the UNIQUE IDENTIFIER of the object that we call Picture object 1 (WBPicture0000000001). The ID WBPicture0000000001 will be the NAME of the object (?Picture) in the Picture Data Model.

The path WBPaper00024505_journal.pbio.0020352.g006 will define the SOURCE of the object in the Picture Data Model.
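Because the Source value is just the paper directory name and the file name joined with an underscore, it can be built and taken apart again with a few lines of code. This is a hypothetical sketch (the function names are invented for illustration), assuming paper IDs always have the form WBPaper plus eight digits, as in the examples on this page.

```python
import re

# WB paper IDs are "WBPaper" followed by eight digits, e.g. WBPaper00024505.
PAPER_ID = re.compile(r"WBPaper\d{8}")


def make_source(paper_id: str, file_stem: str) -> str:
    """Join the paper directory name and the picture file name into a Source value."""
    if not PAPER_ID.fullmatch(paper_id):
        raise ValueError(f"not a WB paper ID: {paper_id!r}")
    return f"{paper_id}_{file_stem}"


def split_source(source: str) -> tuple[str, str]:
    """Split a Source value back into (paper directory, file name).

    Paper IDs contain no underscore, so the first "_" is the separator even
    when the file name itself contains underscores (e.g. cropped panels).
    """
    paper_id, sep, file_stem = source.partition("_")
    if not sep or not PAPER_ID.fullmatch(paper_id):
        raise ValueError(f"unexpected Source value: {source!r}")
    return paper_id, file_stem


src = make_source("WBPaper00024505", "journal.pbio.0020352.g006")
print(src)                # WBPaper00024505_journal.pbio.0020352.g006
print(split_source(src))  # ('WBPaper00024505', 'journal.pbio.0020352.g006')
```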

Now look at the picture above. On our WormBase expression pattern page we don't want to display the whole picture, because it contains information not pertinent to the expression data. We therefore need to CROP the two panels depicting expression of the gene in the wild type: we want only panels B and F.

Each panel is cropped from the original picture in Photoshop, and the files are saved as “journal.pbio.0020352.g006_B” and “journal.pbio.0020352.g006_F” in the same directory as before: WBPaper00024505.

PictureC.png


These will be Picture object 2 (WBPicture0000000002) and Picture object 3 (WBPicture0000000003), respectively.


To summarize so far:

Picture object 1: WBPicture0000000001: WBPaper00024505_journal.pbio.0020352.g006

Picture object 2: WBPicture0000000002: WBPaper00024505_journal.pbio.0020352.g006_B

Picture object 3: WBPicture0000000003: WBPaper00024505_journal.pbio.0020352.g006_F

where WBPicture0000000001 corresponds to the NAME of the object in the Picture Data Model and WBPaper00024505_journal.pbio.0020352.g006 corresponds to the SOURCE of the object in the Picture Data Model.
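The summary follows a fixed pattern: one object for the whole figure, then one per cropped panel, with the panel letter appended to the file name. A minimal sketch reproducing it, assuming the 10-digit zero-padded ID format (WBPicture0000000001) used in the sample curation results further down; the function name is hypothetical, and sequential numbering is for illustration only.

```python
def picture_objects(paper_id: str, figure: str, panels: list[str], first: int = 1):
    """Return (Name, Source) pairs: the parental figure first, then each cropped panel.

    Numbering is sequential from `first` purely for illustration; real IDs
    are assigned when the object is created.
    """
    stems = [figure] + [f"{figure}_{panel}" for panel in panels]
    return [
        (f"WBPicture{first + i:010d}", f"{paper_id}_{stem}")
        for i, stem in enumerate(stems)
    ]


for name, source in picture_objects("WBPaper00024505", "journal.pbio.0020352.g006", ["B", "F"]):
    print(name, source)
```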

Question to the web team: is it OK to keep the file names as proposed? -> Yes (answer from TH, October 6th).


At the same time, the text file associated with the entire figure (WBPicture0000000001) is saved with the same name as the figure, journal.pbio.0020352.g006, with a .doc extension. In this way we can be sure which figure legend goes with which picture. The .doc file itself is not used in picture curation, since the figure legend will be entered in the Description tag of the Picture Data Model.


PictureE1.png


Special case: what do I do when a single panel refers to multiple genes? In the example below, panel B displays the expression of three different genes. We simply name the pictures Fig3_B1, Fig3_B2 and Fig3_B3.


PictureG1.png
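The special-case rule is simply a numeric suffix added after the panel letter. A hypothetical helper that generates those names, using the figure and panel labels from the example above:

```python
def subpanel_names(figure: str, panel: str, n_genes: int) -> list[str]:
    """Name one picture per gene shown in a single panel: <figure>_<panel>1 ... <panel>n."""
    if n_genes < 1:
        raise ValueError("a panel must show at least one gene")
    return [f"{figure}_{panel}{i}" for i in range(1, n_genes + 1)]


print(subpanel_names("Fig3", "B", 3))  # ['Fig3_B1', 'Fig3_B2', 'Fig3_B3']
```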

Let's go one step further...

Picture lineage

Picture object 1 is our PARENTAL IMAGE; it is displayed only when the user clicks the “See original figure” link. Picture objects 2 and 3 are our DAUGHTER IMAGES, which are displayed on the gene expression page. See the mock page below for a visual example:


PictureD1.png


We want to keep the lineage relationship in order to know how each image should be handled; in other words, which image should be displayed on the expression pattern page and which should be displayed next to the "See original figure" link. For that purpose, the Picture Data Model has the Image_lineage tag.


PictureK.png
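In the model, Crop_picture and Cropped_from are linked by XREF, so recording a crop on one object also fills in the reverse tag on the other. The sketch below imitates that behaviour with a plain Python class; it illustrates the data relationship only, does not talk to AceDB, and the class and function names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    """In-memory stand-in for a ?Picture object's Image_lineage tags (illustrative only)."""
    name: str
    crop_picture: set[str] = field(default_factory=set)  # Crop_picture: daughters
    cropped_from: set[str] = field(default_factory=set)  # Cropped_from: parent(s)


def add_crop(parent: Picture, daughter: Picture) -> None:
    """Record a crop on both objects, as the XREF between the two tags does in AceDB."""
    parent.crop_picture.add(daughter.name)
    daughter.cropped_from.add(parent.name)


whole = Picture("WBPicture0000000001")
panel_b = Picture("WBPicture0000000002")
panel_f = Picture("WBPicture0000000003")
add_crop(whole, panel_b)
add_crop(whole, panel_f)
print(sorted(whole.crop_picture))  # ['WBPicture0000000002', 'WBPicture0000000003']
print(panel_b.cropped_from)        # {'WBPicture0000000001'}
```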


There are cases in which parental image = daughter image. See picture below.


PictureL.png

Question to the web team: in this case, is the proposed Picture Data Model sufficient to determine whether this picture should be displayed as PARENTAL or DAUGHTER? Answer: Yes.
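One way the display side could use the lineage tags is sketched below. The rules are an assumption inferred from the description above, not the web team's actual code: a picture with Crop_picture entries is a parental image, a picture with Cropped_from is a daughter image, and a picture with neither is treated as both. The role labels are hypothetical.

```python
def display_roles(crop_picture: set[str], cropped_from: set[str]) -> set[str]:
    """Infer where a picture is shown from its Image_lineage tags (assumed rules)."""
    if not crop_picture and not cropped_from:
        # No lineage: the whole figure is both parental and daughter image.
        return {"expression_page", "see_original_figure"}
    roles = set()
    if crop_picture:   # has crops, so it is the original (parental) figure
        roles.add("see_original_figure")
    if cropped_from:   # is a crop, so it is a daughter shown on the expression page
        roles.add("expression_page")
    return roles


print(sorted(display_roles({"WBPicture0000000002"}, set())))  # ['see_original_figure']
print(sorted(display_roles(set(), set())))  # ['expression_page', 'see_original_figure']
```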


Picture size and format

All the pictures should be in JPEG format, if possible.

The picture size for thumbnails shown on the main gene expression page should be 200x200 pixels.

The picture size for the full view should be 600x600 pixels.

The original file can be as big as needed.

NB: the 200x200 and 600x600 pixel sizes will not distort the pictures; they only constrain the maximum size of the thumbnail or the full image.

PictureM1.png
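As the note says, the sizes are bounding boxes rather than target dimensions. A minimal sketch of the arithmetic, assuming the picture is scaled by a single factor so that its longer side fits the box and that smaller pictures are not enlarged; the tools named in this section (ThumbsUp, Photoshop) do the actual resizing.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so neither side exceeds max_side, keeping the aspect ratio.

    Pictures already smaller than the box are left as they are (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1200, 900, 600))  # (600, 450)  full view
print(fit_within(1200, 900, 200))  # (200, 150)  thumbnail
```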


Generating 200x200px thumbnails

Thumbnails are generated using the freeware "ThumbsUp" (v4.4), a simple drag-and-drop utility that creates thumbnails for a batch of pictures and supports all image formats of Mac OS X and QuickTime (including PDF documents).<ref>http://www.macupdate.com/info.php/id/11898/thumbsup</ref>

Automation was tried with Photoshop (automated image processor) and Mac OS X (Automator -> create thumbnail images). With the Photoshop automation it is NOT possible to save the thumbnails in the same folder; with Mac OS X Automator it is not possible to create thumbnails larger than 128px. ThumbsUp can generate 200x200 thumbnails in the same folder as the original files.

The file name for a thumbnail is the same as that of the original picture, with a _thumb suffix.
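A hypothetical helper for the naming rule, assuming the suffix goes before the file extension (the text does not say where ThumbsUp places it) and that only common image extensions count as extensions, since the original names themselves contain dots.

```python
# Extensions recognised as image files (assumption).
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "tif", "tiff", "gif"}


def thumb_name(filename: str) -> str:
    """Add the _thumb suffix to a picture file name, before its extension if it has one."""
    stem, dot, ext = filename.rpartition(".")
    if dot and ext.lower() in IMAGE_EXTENSIONS:
        return f"{stem}_thumb.{ext}"
    return f"{filename}_thumb"


print(thumb_name("journal.pbio.0020352.g006_B.jpg"))  # journal.pbio.0020352.g006_B_thumb.jpg
```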

Generating 600x600px full view

600x600 images are generated with Photoshop (Scripts -> Image Processor) and stored in a separate folder called nameofthejournal_600, for example PLoS_600. The sub-folder structure is the same as for the originals. OICR should take the 600x600 full-view files from here.
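Because the _600 folder mirrors the sub-folder layout of the originals, the full-view location of any picture can be derived from its original path. A sketch, assuming the _600 folder sits next to the journal folder (e.g. Desktop/WormBase/PLoS and Desktop/WormBase/PLoS_600; the Desktop/WormBase/PLoS location comes from the conversion steps at the end of this page). The function name is hypothetical.

```python
from pathlib import PurePosixPath


def full_view_path(original: str, originals_root: str, journal: str) -> PurePosixPath:
    """Mirror a file under the originals folder into the sibling <journal>_600 folder."""
    root = PurePosixPath(originals_root)
    relative = PurePosixPath(original).relative_to(root)  # e.g. WBPaper.../file.jpg
    return root.parent / f"{journal}_600" / relative


print(full_view_path(
    "Desktop/WormBase/PLoS/WBPaper00024505/journal.pbio.0020352.g006_B.jpg",
    "Desktop/WormBase/PLoS",
    "PLoS",
))
# Desktop/WormBase/PLoS_600/WBPaper00024505/journal.pbio.0020352.g006_B.jpg
```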

Picture Data Model Proposal

////////////////////////////////////////////////////////////////////////////////////

?Picture      Description ?Text
              Source ?Text
              Image_lineage Crop_picture ?Picture XREF Cropped_from
                            Cropped_from ?Picture XREF Crop_picture
              Pick_me_to_call Text Text
              Expr_pattern ?Expr_pattern XREF Picture
              Remark ?Text #Evidence
              Cellular_component      ?GO_term      XREF Pictures  
              Anatomy_term ?Anatomy_term XREF Picture 
              Acknowledgments Template Text
                              Journal_name Text
                              Publication_year Text
                              Article_URL ?Database ?Database_field ?Accession_number
                              Publisher_URL ?Database ?Database_field ?Accession_number
                              Person_name Text
              Reference ?Paper XREF Picture
            
///////////////////////////////////////////////////////////////////////////////////

Picture Data Model step by step explanation

Picture: name of the picture object, e.g. WBPicture0000000001.

Description: figure legend.

Source: the actual picture name. This is the name of the path leading to the picture file: it includes the name of the directory the picture comes from AND the name of the picture file, e.g. WBPaper00024505_journal.pbio.0020352.g006.

Image_lineage: the picture object lineage. Large figures are cropped into sections when they represent different data. We want to maintain the picture lineage: by clicking the "See original figure" button we want to access the entire image.

Pick_me_to_call: an XACE command to call the image. We will keep the Pick_me_to_call tag because over 1000 objects in the database use it.

Expr_pattern: for linking to Expr_pattern data. This is the Expr_pattern object associated with the picture.

Remark: for curator notes.

Cellular_component: links to a GO term, e.g. if a picture depicts sub-cellular localization.

Anatomy_term: links the picture object directly to an Anatomy_term object.

Acknowledgments

e.g.: WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, <http://www.genetics.org/cgi/content/full/166/1/151>. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America <http://www.genetics-gsa.org/> . In the sentence there are 4 variables: "WormBase wishes to thank the journal <Journal_name> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright (<Publication_year>) with permission from <Publisher_URL>."


            * Acknowledgments Template Text
                           *    Journal_name Text
                           *    Publication_year Text
                           *    Article_URL ?Database ?Database_field ?Accession_number
                           *    Publisher_URL ?Database ?Database_field ?Accession_number
                           *    Person_name Text


where

Template: the template sentence, e.g. "WormBase wishes to thank the journal <Journal_name> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright (<Publication_year>) with permission from <Publisher_URL>." The template sentence will change according to what publishers need, but the tags populating it will always be the ones listed below.

Journal_name: self-explanatory.

Publication_year: self-explanatory.

Article_URL: the URL pointing to the paper citation.

Publisher_URL: the URL pointing to the publisher's homepage.

Person_name: used if the picture is given by a person/lab.

Reference: the source of the picture, e.g. WBPaper12345678.
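The Acknowledgments text is a template whose <Tag> placeholders are filled from the tags listed above. A minimal sketch of that substitution, assuming placeholders are always written as <Tag_name>; the values are the ones from the Genetics example earlier on this page. How the web display actually fills the template is not specified here.

```python
import re

TEMPLATE = (
    "WormBase wishes to thank the journal <Journal_name> for permission to reproduce "
    "figures from this article. Please note that this material may be protected by "
    "copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright "
    "(<Publication_year>) with permission from <Publisher_URL>."
)

PLACEHOLDER = re.compile(r"<([A-Za-z_]+)>")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every <Tag> placeholder with its value; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        tag = match.group(1)
        if tag not in values:
            raise KeyError(f"no value for <{tag}>")
        return values[tag]

    return PLACEHOLDER.sub(substitute, template)


print(fill_template(TEMPLATE, {
    "Journal_name": "Genetics",
    "Article_URL": "http://www.genetics.org/cgi/content/full/166/1/151",
    "Publication_year": "2004",
    "Publisher_URL": "http://www.genetics-gsa.org/",
}))
```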

Draft OA for picture curation

Name "" // this will be the picture ID -> generated automatically upon entry. We should have a "duplicate" button which generates a new ID. The object ID for the name reflects the postgres ID (pgid). Actually, the way the code is laid out, duplicate cannot assign a new picture ID; it has to duplicate the existing object ID. OK, no problem, I will do as Karen does. I was thinking of the way that date_last_updated changes in the GO config, but even there duplicates copy the old date, sorry =( But if the picture ID will always be the postgres ID, you can change the number based on the number in the pgid field. When Karen creates a new molecule object (the only other config that creates IDs automatically), she still has to change the name too. What should the IDs look like? They should look like WBPicture0000000001.
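If the number in the picture ID is always the postgres ID, as suggested above, converting between the two is a matter of zero-padding. A sketch assuming the WBPicture + 10-digit format given above; the function names are hypothetical and this is not the OA's actual code.

```python
import re

PICTURE_ID = re.compile(r"WBPicture(\d{10})")


def picture_id_from_pgid(pgid: int) -> str:
    """Format a postgres ID as a picture ID, e.g. 1 -> WBPicture0000000001."""
    if not 1 <= pgid < 10**10:
        raise ValueError(f"pgid out of range for a 10-digit picture ID: {pgid}")
    return f"WBPicture{pgid:010d}"


def pgid_from_picture_id(picture_id: str) -> int:
    """Recover the postgres ID from a picture ID."""
    match = PICTURE_ID.fullmatch(picture_id)
    if match is None:
        raise ValueError(f"not a picture ID: {picture_id!r}")
    return int(match.group(1))


print(picture_id_from_pgid(1))                       # WBPicture0000000001
print(pgid_from_picture_id("WBPicture0000000042"))   # 42
```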

Reference "" // This will record which paper this picture comes from -> ontology. This is getting expr_pattern data from table obo_data_pic_exprpattern. TODO: change this when the Expr_pattern OA is live. Make a note on the wiki for the Expr_pattern OA.

Description "" // this will be figure legend -> big text

Source "" // this will be the actual picture name -> small text

Cropped_from "" // this will only be used by cropped images to indicate their mother picture -> ontology of picture objects (single ontology). Not done yet, since there are no real picture objects yet. What do you want to show in term info here? What should it autocomplete on? -- J

Expr_pattern "" // this relates to the Expr_pattern associated with the picture -> multiontology. The file that has the Expr_pattern <-> paper association is ExprWS221.ace, received from Wen on October 22, 2010. In term info we'd like to see Gene, Pattern, Reference, Reporter_gene, Life_stage, Anatomy_term and GO_term. Autocomplete just on the Expr_pattern ID. For Anatomy_term, retrieve the Anatomy_term ID and name from app_anat_term. Created obo_<data|name>_pic_exprpattern tables temporarily until the Expr_pattern OA is live. TODO: get rid of these tables once the Expr_pattern OA is live. Make a note on the wiki for the Expr_pattern OA. Created on mangolassi (in /home/postgres/work/pgpopulation/exp_exprpattern/). TODO: when live on tazendra, use create_obo_pic_exprpattern.pl and populate_obo_exprpattern.pl. Also, since you requested the Expr_pattern field to be an ontology field, you should make sure that it works for you and for Xiaodong's data, then tell her that if she wants it to work like that in her OA, she needs to let me know; we will then need to transfer her data from text to ontology or multiontology.

Remark "" // For other curator notes -> big text

Cellular_component "" // For sub-cellular localization -> multiontology of GO_term, like gop_goid. The file that has the Expr_pattern <-> GO_term association is ExprWS221.ace, received from Wen on October 22, 2010. After I copy Wen's file to tazendra and tell you to parse it: MAKE PAPERS TERM INFO DISPLAY EXPR_PATTERN.

Anatomy_term "" // It will link the picture object directly to an Anatomy object -> multiontology. Should work like app_anat_term. The file that has the Expr_pattern <-> Anatomy_term association is ExprWS221.ace, received from Wen on October 22, 2010. The file that has the Anatomy_term <-> anatomy name association is http://obo.cvs.sourceforge.net/viewvc/obo/obo/ontology/anatomy/gross_anatomy/animal_gross_anatomy/worm/worm_anatomy/WBbt.obo. For example, Wen's file has Anatomy_term "WBbt:0004854" and the OBO file has WBbt:0004854 name: vm1 def: "Vulval muscle 1".
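Looking up an anatomy name from the WBbt.obo file amounts to reading the id: and name: lines of each [Term] stanza. A minimal sketch that handles only those two lines (no synonyms, comments or obsolete-term handling), using a sample stanza built from the vm1 example above; a real stanza in the OBO file has more fields.

```python
def obo_term_names(obo_text: str) -> dict[str, str]:
    """Map term IDs to names from the [Term] stanzas of an OBO file."""
    names: dict[str, str] = {}
    in_term = False
    current_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):           # a new stanza starts
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id and line.startswith("name:"):
            names[current_id] = line[len("name:"):].strip()
    return names


SAMPLE = """\
[Term]
id: WBbt:0004854
name: vm1
def: "Vulval muscle 1"
"""
print(obo_term_names(SAMPLE)["WBbt:0004854"])  # vm1
```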

Need a curator field, where do you want it ? -- J

On mangolassi used /home/postgres/work/pgpopulation/pic_picture/create_tables to create postgres tables, will need to do on tazendra when live. Make sure to delete old pic_ tables that have been moved to pix_ TODO

The OA works on mangolassi, please test and let me know how it is. To make sure it works, we need to read in the existing picture objects from acedb, so scp the source file with the .ace objects you want populated to tazendra, and write somewhere on the wiki, how I should map the .ace data to these OA tables.


Ignore Acknowledgment for the moment!

Acknowledgments ""// the acknowledgment field will have more than one tag

            *  Template Text: I can create a document with the Publisher/standard text association,

e.g. Journal_name -> Publisher -> template text. Juancarlos, what is the preferred file format? Is Excel OK?

            *    Journal_name. Multiontology auto-complete from Journal OBO. It is not there yet, would it be possible to add it?
            *    Publication_year Text autocomplete from Journal OBO. It is not there yet, would it be possible to add it?
            *    Article_URL ?Database ?Database_field ?Accession_number Decide with J URL constructor
            *    Publisher_URL ?Database ?Database_field ?Accession_number Decide with J URL constructor
            *    Person_name Small Text


where

"Template" // is the template sentence, e.g. "WormBase wishes to thank the journal <Journal_name> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright (<Publication_year>) with permission from <Publisher_URL>." The template sentence will change according to what publishers need, but the tags populating it will always be the ones listed below.

"Journal_name" // we can retrieve it from the ?Paper data model (?Paper Reference Journal UNIQUE ?Text)

"Publication_year" we can retrieve it from the ?Paper data model (?Paper Reference Publication_date UNIQUE ?Text)

"Article_URL" // this will contain the URL pointing to the paper citation.

"Publisher_URL" this will contain the URL pointing to the publisher's homepage.

"Person_name" if the picture is given by a person/lab

   * text : text
   * bigtext : like longtext, but makes the text box expand when you click in it so you can see everything you've written
   * dropdown : few values
   * ontology : controlled vocabulary (tell me where they come from)
   * multiontology / multidropdown : (allows multiple values)
   * toggle : on / off, yes/no etc. 

We also need mappings from the existing data's .ace file so that each tag that you want to keep there gets mapped to the fields above -- J

To Do

Parse Expr.ace file into .obo file for term info and for paper term info. On tazendra at /home/acedb/draciti/Expr_pattern/ExprWS221.ace
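For this To Do, a minimal sketch of reading Expr_pattern objects from an .ace file and writing one OBO-style stanza per object. It assumes the usual .ace layout (a `Class : "Name"` header line, one tag per line with a quoted value, blank lines between objects) and keeps only the term-info tags requested for the Expr_pattern field. The `Tag: value` lines are a hypothetical layout, not necessarily what the OA's obo_data tables expect, and escaped quotes inside values are not handled.

```python
import re
from collections import defaultdict

# Object header, e.g.  Expr_pattern : "Expr3097"
HEADER = re.compile(r'^(\w+)\s*:\s*"([^"]+)"')
# Tag line with a quoted value, e.g.  Anatomy_term "WBbt:0003681"
TAG_VALUE = re.compile(r'^(\w+)\s+"([^"]*)"')

TERM_INFO_TAGS = ("Gene", "Pattern", "Reference", "Reporter_gene",
                  "Life_stage", "Anatomy_term", "GO_term")


def parse_ace(text: str, wanted_class: str = "Expr_pattern") -> dict:
    """Collect {object name: {tag: [values]}} for objects of one class in .ace text."""
    objects: dict = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:                  # a blank line ends the current object
            current = None
            continue
        if line.startswith("//"):     # comment line
            continue
        header = HEADER.match(line)
        if header:
            cls, name = header.groups()
            current = objects.setdefault(name, defaultdict(list)) if cls == wanted_class else None
            continue
        if current is not None:
            tag_value = TAG_VALUE.match(line)
            if tag_value:
                current[tag_value.group(1)].append(tag_value.group(2))
    return objects


def to_obo(objects: dict, tags=TERM_INFO_TAGS) -> str:
    """Write one [Term] stanza per object, with one 'Tag: value' line per value."""
    stanzas = []
    for name in sorted(objects):
        lines = ["[Term]", f"id: {name}"]
        for tag in tags:
            lines.extend(f"{tag}: {value}" for value in objects[name].get(tag, []))
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


SAMPLE = '''Expr_pattern : "Expr3097"
Reference "WBPaper00024505"
Anatomy_term "WBbt:0003681"

Expr_pattern : "Expr3457"
Reference "WBPaper00024876"
'''
print(to_obo(parse_ace(SAMPLE)))
```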

Sample curation results for a parental image when parental ≠ cropped

Name "WBPicture0000000001"

Reference "WBPaper00024505"

Description "(A) A portion of the promoter sequence of K07C11.4 from C. elegans (bottom) aligned with its ortholog from C. briggsae (top). Boxed regions show conserved predicted PHA-4 binding sites and Early-1 and Early-2 elements. Site-directed mutations that disrupt Early-1 and Early-2 (“E2 + E1 Mut”) are shown below their respective wild-type (“E2 + E1 WT”) sequence from K07C11.4. (B–E) Confocal images of mid-stage embryos expressing GFP under the control of the wild-type K07C11.4 promoter (B) or promoters with a mutation in Early-1 (C), Early-2 (D), or both Early-1 and Early-2 (E). Percentages are the fraction of transgenic embryos expressing GFP; the remainder of embryos do not express GFP. (F) Expression of the wild-type K07C11.4 reporter in a subset of somatic gonad cells in an L4 animal (arrowheads). (G) Mutation of the Early-1 element eliminates gonadal expression but does not strongly affect expression in other tissues, such as intestinal cells (arrows). Dashed lines indicate the outline of the developing pharynx."

Source "journal.pbio.0020352.g006"

Expr_pattern "Expr3097"

Remark "N/A"

Cellular_component "N/A"

Anatomy_term "N/A"

Acknowledgments "WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, Chen J, Li XJ, Greenwald I. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America."

Sample curation results for a daughter image

Name "WBPicture0000000002"

Description "Confocal images of mid-stage embryos expressing GFP under the control of the wild-type K07C11.4 promoter"

Source "journal.pbio.0020352.g006_B"

Cropped_from "journal.pbio.0020352.g006"

Expr_pattern "Expr3097"

Remark "N/A"

Cellular_component "GO_term"

Anatomy_term "WBbt:0003681 pharynx"

Acknowledgments "WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, Chen J, Li XJ, Greenwald I. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America."

Reference "WBPaper00024505"

Sample curation results for a parental image when parental = cropped

Name "WBPicture0000000003"

Description "Expression Pattern of rom-1::nls::gfp Expression pattern of the zhIs5[rom-1::nls::gfprom-1::] transcriptional reporter during vulval development. Images on the left (A, C, E, G, and I) show the corresponding Nomarski pictures with the arrows pointing at the Pn.p cell nuclei and the arrowhead indicating the position of the AC nucleus. (B) A mid L2 larva before vulval induction with uniform rom-1::nls::gfp expression in all the Pn.p cells. (D) An early L3 larva in which rom-1::nls::gfp expression was decreased in all VPCs except P6.p (see text for a quantification of the expression pattern). Note that the nuclei of hyp7 and the Pn.p cells that had fused to hyp7 displayed strong rom-1::nls::gfp expression (P1.p, P2.p, P3.p and P9.p in the example shown). (F) A mid to late L3 larva in which P6.p had generated four descendants. Expression of rom-1::nls::gfp occurred only in the 3° descendants of P4.p and P8.p after they fused to hyp7. (H) An L4 larva during vulval invagination. No rom-1::nls::gfp was detectable in the 1° and 2° descendants of P5.p, P6.p, and P7.p, but the AC and the surrounding uterine cells displayed strong rom-1::nls::gfp expression. (K) A late L2 to early L3 larva following the ablation of the precursors of the somatic gonad. No up-regulation of rom-1::nls::gfp in P5.p, P6.p, or P7.p was observed. The scale bar in (K) is 10 μm."

Source "journal.pbio.0020334.g003"

Expr_pattern "Expr3457"

Remark "N/A"

Cellular_component "WBbt:0004017 Cell"

Anatomy_term "N/A"

Acknowledgments "WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, Chen J, Li XJ, Greenwald I. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America."

Reference "WBPaper00024876"

Picture conversion

Press Enter after each command.

    cd Desktop/WormBase/PLoS/Gene<tab>/

This lets me choose the directory from which I get the files.

    scp -r <directory_name I have chosen before> acedb@tazendra.caltech.edu:draciti/pictures/

This copies a folder and all its files to tazendra <enter password>.

Open a new terminal.

    ssh acedb@tazendra.caltech.edu

This logs into tazendra <enter password>.

    cd draciti/pictures/

This is the directory on tazendra where I put my files.

    cd <directory name>

You can type ls for a list of the files present in the directory.

    igal2 -bigy 400

The program converts the pictures; -bigy 400 is an arbitrary size for the vertical pixels.

    ls -al

This lists all the files present.

If you want to open a new command window, just press Command+N on the keyboard.

The folder for receiving the converted pictures is called "Incoming from Tazendra". To bring the files back:

    cd Desktop/Incoming\ from\ Tazendra/
    scp -r acedb@tazendra.caltech.edu:draciti/pictures/WBPaper00024399 .

ssh = secure shell
scp = secure copy
scp -r = recursive secure copy, for directories
cd = change directory
pwd = show the current directory