Pictures

From WormBaseWiki



Picture Data Model

////////////////////////////////////////////////////////////////////////////////////

?Picture      Description ?Text
              Name UNIQUE Text
              Crop Crop_picture ?Picture XREF Cropped_from
                   Cropped_from ?Picture XREF Crop_picture
              Pick_me_to_call Text Text
              Remark ?Text #Evidence
              Depict  Expr_pattern ?Expr_pattern XREF Picture
                      Anatomy ?Anatomy_term XREF Picture
                      Cellular_component ?GO_term XREF Picture               
              Acknowledgment Template UNIQUE Text
                             Publication_year UNIQUE Text
                             Article_URL UNIQUE ?Database UNIQUE ?Database_field UNIQUE ?Accession_number
                             Journal_URL UNIQUE ?Database 
                             Publisher_URL UNIQUE ?Database 
                             Person_name Text
              Reference ?Paper XREF Picture
            
///////////////////////////////////////////////////////////////////////////////////

Picture Curation

The immediate goal of picture curation is to be able to obtain images of gene expression data from the literature and individual laboratories and display them in the WormBase gene expression page.

  • We want to display images related to the temporal or spatial (e.g., tissue, subcellular, etc.) localization of any gene in a wild-type background, covering different data types:
    • Reporter gene analysis
    • Antibody staining
    • In situ hybridization
    • RT-PCR
    • Western or Northern blot data

Pipeline

In the early phases of curation, pictures will be taken from open access journals (e.g. PLoS, BMC (BioMed Central Ltd)). While open access images are being curated, other publishers will be contacted to obtain copyright permissions.

The images should be saved and stored according to the following guidelines. The example shown below refers to a PLoS Biology paper but the rules of handling the pictures are universal and not "paper specific".


Overview

This is a mock-up of the expression page for gene K07C11.4. We would like to see panels B and F highlighted, with the figure caption describing the expression of the gene, AND be able to access the original figure by clicking the "See original figure" button.

PictureH.png

Downloading and saving the images

Pictures are downloaded in TIFF format from the original paper.

PictureA.png


Pictures are saved with their original name in order to minimize editing by the curator. In this case the file is called “journal.pbio.0020352.g006”. The files are converted directly into JPEG, because TIFF is not suitable as a web display format. Avoid special characters such as ' * / in the file name.

The file is saved in a directory named after the WB paper ID. E.g.: WBPaper00024505, meaning that picture “journal.pbio.0020352.g006” has been downloaded from WBPaper00024505.


PictureB.png

These two names joined together, WBPaper00024505_journal.pbio.0020352.g006, will be the UNIQUE IDENTIFIER of the object, which we call Picture object 1 (WBPicture0000000001). The ID WBPicture0000000001 will be the NAME of the object (?Picture) in the Picture Data Model.

The path WBPaper00024505_journal.pbio.0020352.g006 will define the SOURCE of the object in the Picture Data Model.
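The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name `picture_source` and the check that a WBPaper ID has exactly eight digits are assumptions made for the example, not part of the curation pipeline.

```python
import re

# Characters the guidelines say to avoid in file names: ' * /
FORBIDDEN = set("'*/")


def picture_source(paper_id: str, file_name: str) -> str:
    """Join the WBPaper ID and the original file name with an underscore."""
    # Assumption: WBPaper IDs look like "WBPaper" followed by 8 digits.
    if not re.fullmatch(r"WBPaper\d{8}", paper_id):
        raise ValueError(f"unexpected paper ID: {paper_id!r}")
    bad = FORBIDDEN.intersection(file_name)
    if bad:
        raise ValueError(f"forbidden characters in file name: {sorted(bad)}")
    return f"{paper_id}_{file_name}"


print(picture_source("WBPaper00024505", "journal.pbio.0020352.g006"))
```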

Now look at the picture above. In our WormBase expression pattern page we don’t want to display the whole picture, because it contains information not pertinent to the expression data. We therefore need to CROP the two panels depicting expression of the gene in the wild type: we want only panels B and F.

Each panel is cropped from the original picture in Photoshop, and the files are saved as “journal.pbio.0020352.g006_B” and “journal.pbio.0020352.g006_F” in the same directory as before: WBPaper00024505.

PictureC.png


These will be, respectively, Picture object 2 (WBPicture0000000002) and Picture object 3 (WBPicture0000000003).


To summarize so far:

Picture object 1: WBPicture0000000001: WBPaper00024505_journal.pbio.0020352.g006

Picture object 2: WBPicture0000000002: WBPaper00024505_journal.pbio.0020352.g006_B

Picture object 3: WBPicture0000000003: WBPaper00024505_journal.pbio.0020352.g006_F

where WBPicture0000000001 corresponds to the NAME of the object in the Picture Data Model and WBPaper00024505_journal.pbio.0020352.g006 corresponds to the SOURCE of the object in the Picture Data Model.

Question to web team: is it OK to keep the file names as proposed? -> Yes (answer from TH, October 6th)


At the same time, the text file with the legend of the entire figure (WBPicture0000000001) is saved under the same name as the figure, journal.pbio.0020352.g006, with a .doc extension. In this way we can tell which figure legend goes with which picture. This .doc file is in itself irrelevant for picture curation, as the figure legend will be inserted in the "Description" tag in the Picture Data Model.


PictureE1.png


Special case: what do I do when a single panel refers to multiple genes? In the example below, panel B displays the expression of 3 different genes. We will simply name the pictures Fig3_B1, Fig3_B2, Fig3_B3.


PictureG1.png
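The panel naming convention (one suffix per cropped panel, plus a running number when one panel shows several genes) could be expressed as follows. A minimal sketch; `panel_names` and its parameters are hypothetical names chosen for the example.

```python
def panel_names(figure_name: str, panel: str, n_genes: int = 1) -> list[str]:
    """Return the file names (without extension) for one cropped panel.

    One gene  -> "<figure>_<panel>"
    N genes   -> "<figure>_<panel>1" ... "<figure>_<panel>N"
    """
    if n_genes < 1:
        raise ValueError("a panel must depict at least one gene")
    if n_genes == 1:
        return [f"{figure_name}_{panel}"]
    return [f"{figure_name}_{panel}{i}" for i in range(1, n_genes + 1)]


print(panel_names("journal.pbio.0020352.g006", "B"))
print(panel_names("Fig3", "B", n_genes=3))
```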

Let's go one step further...

Picture lineage

Picture object 1 is our PARENTAL IMAGE; we will display it only when the user clicks on a “see original figure” link. Picture objects 2 and 3 are our DAUGHTER IMAGES, which will be displayed on the gene expression page. See the mock page below for a visual example:


PictureD1.png


We would like to keep the lineage relationship in order to know how images should be handled. In other words, we would like to know which image should be displayed in the expression pattern page and which should be displayed next to the "See original figure" link. For that purpose, in the Picture Data Model we have the "Image lineage" tag.


PictureK.png


There are cases in which parental image = daughter image. See picture below.


PictureL.png

Question to the web team: in this case, is the proposed Picture Data Model sufficient to determine whether this picture should be displayed as PARENTAL or DAUGHTER? Answer: Yes.


Picture size and format

All the pictures should be in JPEG format, if possible.

The picture size for thumbnails shown in the main gene expression page should be 200x200 pixels.

The picture size for the full view should be 600x600 pixels.

Picture size for the original file will be as big as needed.

NB: the 200x200 and 600x600 pixel sizes will not distort the pictures; they only put a constraint on the maximum size of the thumbnail or the full image.

PictureM1.png
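To make the "maximum size, no distortion" rule concrete, here is a small Python sketch that computes the target dimensions for a given bounding box. It is an illustration only: the function name `fit_within` is hypothetical, and the choice never to enlarge images that are already smaller than the box is an assumption.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) to fit a max_side x max_side box.

    The aspect ratio is preserved, so the picture is not distorted.
    Assumption: images smaller than the box are left at their original size.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1200, 800, 200))  # thumbnail box
print(fit_within(800, 1200, 600))  # full-view box
```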


Generating 200x200px thumbnails

Thumbnails are generated using the freeware "ThumbsUp" (v4.4), a simple drag-and-drop utility that creates thumbnails for a batch of pictures and supports all image formats of Mac OS X and QuickTime, including PDF documents (http://www.macupdate.com/info.php/id/11898/thumbsup).

Trials for automation have been done with Photoshop (automated image processor) and Mac OS X (Automator -> creation of thumbnail images). With the Photoshop image processor it is NOT possible to save the thumbnails in the same folder. With the Mac OS X Automator it is not possible to create thumbnails larger than 128px. ThumbsUp allows generation of 200x200 thumbnails in the same folder where the original files are.

The file name for thumbnails is the same as the original picture with a _thumb suffix.

Generating 600x600px full view

600x600 images are generated with Photoshop (Scripts -> Image Processor) and stored in a separate folder called nameofthejournal_600, for example PLoS_600. The architecture of the sub-folders is the same as the original. OICR should take the 600x600 full view files from here.
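The two file-location rules (thumbnail next to the original with a _thumb suffix, full view in a parallel nameofthejournal_600 tree) might look like this in Python. This is a sketch under assumptions: the top-level folder is taken to be the journal name (e.g. PLoS), and the suffix is placed before the file extension; neither detail is stated explicitly in the guidelines.

```python
from pathlib import PurePosixPath


def thumbnail_path(original: str) -> PurePosixPath:
    """Same folder and name as the original, with a _thumb suffix before the extension."""
    p = PurePosixPath(original)
    return p.with_name(f"{p.stem}_thumb{p.suffix}")


def full_view_path(original: str) -> PurePosixPath:
    """Mirror the sub-folder layout under "<journal>_600".

    Assumption: the first path component is the journal folder.
    """
    p = PurePosixPath(original)
    if len(p.parts) < 2:
        raise ValueError("expected a path of the form <journal>/.../<file>")
    journal, *rest = p.parts
    return PurePosixPath(f"{journal}_600", *rest)


src = "PLoS/WBPaper00024505/journal.pbio.0020352.g006_B.jpg"
print(thumbnail_path(src))
print(full_view_path(src))
```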

Picture Data Model Proposal

////////////////////////////////////////////////////////////////////////////////////

?Picture      Description ?Text
              Name UNIQUE Text
              Crop Crop_picture ?Picture XREF Cropped_from
                   Cropped_from ?Picture XREF Crop_picture
              Pick_me_to_call Text Text
              Remark ?Text #Evidence
              Depict  Expr_pattern ?Expr_pattern XREF Picture
                      Anatomy ?Anatomy_term XREF Picture
                      Cellular_component ?GO_term XREF Picture               
              Acknowledgment UNIQUE Template Text
                             Publication_year UNIQUE Text
                             Article_URL UNIQUE ?Database UNIQUE ?Database_field UNIQUE ?Accession_number
                             Journal_URL UNIQUE ?Database 
                             Publisher_URL UNIQUE ?Database 
                             Person_name Text
              Reference ?Paper XREF Picture
            
///////////////////////////////////////////////////////////////////////////////////

Picture Data Model step by step explanation

Picture Name of the picture object. E.g. WBPicture0000000001

Description Figure legend

Name For actual picture names. This is the name of the path leading to the picture file. The source includes the name of the directory where the picture comes from AND the name of the picture file. e.g. WBPaper00024505_journal.pbio.0020352.g006

Crop This is the picture object lineage. Large figures will be cropped into sections when they represent different data. We want to maintain the picture lineage, so that by clicking on the "see original figure" button we can access the entire image.

Pick_me_to_call Untouched tag from the existing model.

Remark For curator notes

Depict

Expr_pattern For linking to Expr-pattern data. This will be the Expr_pattern object that is associated with the picture.

Anatomy It will link the picture object directly to an Anatomy Object

Cellular_component This links to the GO term e.g. if a picture depicts sub-cellular localization

Acknowledgment

e.g.: WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, <http://www.genetics.org/cgi/content/full/166/1/151>. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America <http://www.genetics-gsa.org/> .

In the sentence there are 4 variables:

"WormBase wishes to thank the journal <Journal_name> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright (<Publication_year>) with permission from <Publisher_URL>."


            * Acknowledgment UNIQUE Template Text
                           *    Publication_year UNIQUE Text
                           *    Journal_URL UNIQUE ?Database 
                           *    Article_URL UNIQUE ?Database UNIQUE ?Database_field UNIQUE ?Accession_number
                           *    Publisher_URL UNIQUE ?Database 
                           *    Person_name Text


where

Template Is the template sentence, e.g. "WormBase wishes to thank the journal <Journal_name> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright (<Publication_year>) with permission from <Publisher_URL>." The template sentence will change according to what publishers need, but the tags populating it will always be the ones listed below.

Publication_year self explanatory

Journal_URL this will contain the URL pointing to the journal home page.

Article_URL this will contain the URL pointing to the paper citation.

Publisher_URL this will contain the URL pointing to the publisher's homepage.

Person_name if the picture is given by a person/lab
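As an illustration of how the template sentence and its four variables fit together, the following Python sketch substitutes values into the placeholders. It is not the dumper script: `fill_template` is a hypothetical helper, and the values are taken from the Genetics example above.

```python
import re

TEMPLATE = (
    "WormBase wishes to thank the journal <Journal_name> for permission to "
    "reproduce figures from this article. Please note that this material may "
    "be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. "
    "Copyright (<Publication_year>) with permission from <Publisher_URL>."
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every <Placeholder> in the template with its value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder <{key}>")
        return values[key]

    return re.sub(r"<(\w+)>", substitute, template)


print(fill_template(TEMPLATE, {
    "Journal_name": "Genetics",
    "Article_URL": "http://www.genetics.org/cgi/content/full/166/1/151",
    "Publication_year": "2004",
    "Publisher_URL": "http://www.genetics-gsa.org/",
}))
```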

Reference For the source of the picture, e.g. WBPaper12345678.

Note: the following tags were removed from the model:

RNAi

Variation

Transgene

because there were no data associated with those tags. The association was last checked on November 4th against WS219.

Example

Picture : "WBPicture0000000001"

Description "Figure Legend: A. ..... B. ..... C. .... D. ....."

Name "WBPaper00024505_journal.pbio.0020352.g006_B"

Cropped_from "journal.pbio.0020352.g006"

Remark "Some remark"

Expr_pattern "Expr1234"

Anatomy "WBbt:0004017"

Cellular_component "GO:0005634"

Template "WormBase thanks the journal <Journal_URL> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_URL>, <Article_URL>. Copyright <Publication_year> with permission from <Publisher_URL>."

Publication_year "2004"

Article_URL WBPaper00024505_URL pbio. 0020352

Journal_URL "PLoSBiology"

Publisher_URL "PLoS"

Reference "WBPaper00024505"

Database : WBPaper00024505_URL

Name "Ding M et al. (2008) PLoS One \"The cell signaling adaptor protein EPS-8 is essential for C. elegans epidermal ....\"" 

URL_constructor "http:\/\/www.plosbiology.org\/article\/info:doi%2F10.1371%2Fjournal."

Database : PLoSBiology

Name "PLoS Biology"  

URL_constructor "http:\/\/www.plosbiology.org\/"

Database : PLoS

Name "PLoS"  

URL_constructor "http:\/\/www.plos.org\/"

Draft OA for picture curation

WBPicture "" // this will be the picture ID -> generated automatically upon entry. We should have a "duplicate" button which generates a new ID. The object ID for the name reflects the postgres ID (pgid).

Actually, the way the code is laid out, duplicate cannot assign a new picture ID, it has to duplicate the existing object ID.

OK, no problem, I will do as Karen does!

I was thinking of the way that date_last_updated changes in the GO config, but even then for duplicates it duplicates the old date, sorry =( But if the picture ID will always be the postgres ID, you can change the number based on the number in the pgid field. When Karen creates a new molecule object (only other config that creates IDs automatically), she still has to change the name too. What should the IDs look like?

It should be WBPicture0000000001 and progressive numbers -- D

Ok, changed from WBPicture:12345678 to WBPicture1234567890 (no : and 10 digits). Daniela has double checked with Gary -> it is OK to remove the colon, which is used mainly for ontologies -- J

J, can you please change the Name into WBPicture in the OA? Done - J

Reference "" // This will record which paper this picture comes from -> ontology

This is getting expr_pattern data from table obo_data_pic_exprpattern. TODO change this when Expr Pattern OA is live. Make note on wiki for Expr Pattern OA -- J

OK J, when I start the Expr_pattern OA wiki I'll make a note. TODO Daniela -- D

Get the JPGs from the picture_source file on tazendra. Daniela, specify path for J. For now I want to be part of the automatic term info update as opposed to manual.

I put the file "picture_source" on tazendra under draciti. J, please add Journal_name and Publication_year. And always show those 2 fields so that it is clear to me when one is missing -- D

TODO on tazendra create obo_ tables for pic_picturesource at /home/postgres/work/pgpopulation/obo_oa_ontologies/create_obo_pic_picturesource.pl -- J

Have moved picture_source file on mangolassi to /home/acedb/draciti/picture_source. TODO on tazendra, incorporate to cronjob /home/acedb/draciti/picture_source/populate_obo_data_pic_picturesource.pl -- J

Reference Term info now always has picture_source .jpg files listed, as well as Journal and Year (or BLANK for journal / year if not available).
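The ID format agreed on in this discussion (prefix WBPicture, no colon, 10 digits, number taken from the pgid) can be illustrated with a short Python function. This is a sketch only; `picture_id` is a hypothetical name and not the OA code.

```python
def picture_id(pgid: int) -> str:
    """Format a postgres ID (pgid) as a WBPicture ID: no colon, 10 zero-padded digits."""
    if not 0 < pgid < 10**10:
        raise ValueError("pgid must be a positive integer with at most 10 digits")
    return f"WBPicture{pgid:010d}"


print(picture_id(1))
print(picture_id(1234567890))
```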

Description "" // this will be figure legend -> big text

Source "" // this will be the actual picture name -> small text

J, when you are in the office, I will show you a file I might use to autocomplete the source. I don't know if it is feasible but we should have a look at it because it can save me lots of copy-pasting ^^ If it is not OK we keep the small text field -- D

Okay, but if you want a set of stuff to autocomplete, you'd have to maintain something for the database to update from it. For example, there's a lot of obo files maintained in cvs in sourceforge, so there's a script that updates based on those. If you wanted to commit your file to some cvs repository like that, it should work -- J

OK, once you see the file in the office you can tell me how easy it is to maintain that file -- D

Ok, sure. It's more an issue of where you're going to keep it for a script to pick it up -- J

Daniela todo -> scp to tazendra the file picture_source and update it every time you are done with a journal.

J, I scp'd to tazendra, under draciti, a file called picture_source -- D

Oh, sorry, when we're live it should be on tazendra, for now since we're working on stuff I'm putting it on mangolassi. So you wanted to run a script manually to populate this (extra step) or did you want a cronjob to pick it up and update every day (potential for 24 hour delay before you can curate, when would you want it to run?). To be clear, what should I do with this file? -- J

I would like to have a cronjob (update every day). If I understand correctly I will modify the file once in a while (whenever I have new data to put in) and the cronjob will automatically update the tables. If that is the case, let's go for it!

Automatic, yes, but not instantaneous, I just want to be clear that if the script runs at 2am every day, and you update it at noon, you won't get to see the term info updates until the next day at 2am -- J

No problem -- D

For Reference field, only display the JPGs when entering a paper.

J, you mean displaying only the JPGs coming from the picture_source file and not displaying the .docx files, right? If this is what you mean the answer is yes. And if I am correct, in the Reference field, I will still continue to see expression pattern data, correct? -- D

Yes, sorry, I meant just the filename of the jpg files, as opposed to the other files, and yes, in addition to other WBPaper data -- J

OK -- D

For source it doesn't matter because it's text, not ontology? Yes -- D

For cropped_from, ignore it because it will autocomplete from this OA Source field, not from the picture_source information we get from this file? Is all that correct? -- J

I think so.. Let's talk about this after the meeting to make sure, then you can confirm on this wiki -- J

OK -- D

Now that I am a bit more free from the modelling I can finally start seriously testing the OA. And regarding this, this morning it is not working, maybe because you are working on it? ^^ The error I get is JSON parse failed -- D

Sorry about that! I had to wipe and repopulate the database on the sandbox for some interaction stuff a few times, and I forgot to recreate the picture tables afterwards (also, sorry, any data you entered is gone) -- J

No problem -- D

Cropped_from "" this will only be used by the cropped images to indicate their mother picture -> ontology of picture objects

Single ontology. Have not done this yet since there are no real picture objects yet. What do you want to show in term info here?

The cropped from will be used only for duplicated objects. Let's say I have a mother picture and I want to duplicate it because I have a cropped panel. I would like to see Cropped_from "journal.pbio.0020352.g006"

You actually want the WBPicture ID here, right? -- J

I want to have the same name as "source" of the mother picture -- D

What should it autocomplete on?

It should autocomplete on the "source" of the mother picture -- D

Do you want anything on the term info, just the name and ID? Autocomplete only on source, not both source and ID?

If you can autocomplete on both source and ID that would be good! -- D

Sure (do reply to the PictureID stored in postgres, I'm pretty sure that's what you want, but do confirm).

OK Juancarlos, maybe I confused myself. To summarize the Cropped_from field: in the Cropped_from field I want to autocomplete with the "source" of the mother picture. In the Term info I would like to see the name and the ID (e.g. WBPicture0000000001, and journal.pbio.0020352.g006). Does that sound right to you? Otherwise I'll show you on Thursday -- D

It kind of makes sense, but I'm not sure it's good. Each picture object has a picture ID, so when referring to it, it's best to refer to the picture ID, because it's potentially possible that the name could change. Otherwise we'd just have picture names instead of picture IDs, right? So I think of the source as a name, and we could have this field autocomplete on the source/name, but then store/save the picture ID. Then when dumping to .ace, output the source of that picture object. This way if you make ID 1 -> source "blah", then ID 2 -> cropped from ID 1, then change ID 1 -> source "different", when you dump picture 2 it would say cropped from picture ID 1, with source "different". If in the same case you put in picture 2 source -> "blah", when you dumped picture 2 it would always say "blah". Does that make sense? -- J

In the Cropped_from we will autocomplete on Source (file name) then WBPicture ID; and store in postgres the picture ID -- D

Right, and we'll do that based on the Source OA field / postgres table, not what was entered in the "picture_source" file above -- J

Yes! -- D

Autocomplete on source then WBPicture name/ID. Show on Term Info name/ID, source, reference. -- J
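The outcome of this exchange (store the parent's picture ID, look up its source only at dump time) can be illustrated with a toy Python example. The in-memory dictionary stands in for the postgres tables and is purely hypothetical, as is the function name.

```python
# Toy stand-in for the OA tables: pgid -> row. Not the real schema.
pictures = {
    1: {"source": "journal.pbio.0020352.g006", "cropped_from": None},
    2: {"source": "journal.pbio.0020352.g006_B", "cropped_from": 1},
}


def cropped_from_source(rows: dict, pgid: int):
    """Resolve the parent's *current* source at dump time from the stored parent ID."""
    parent_id = rows[pgid]["cropped_from"]
    if parent_id is None:
        return None  # a parental image has no Cropped_from value
    return rows[parent_id]["source"]


print(cropped_from_source(pictures, 2))

# Renaming the parent is picked up automatically on the next dump,
# because picture 2 stores the parent's ID rather than its name.
pictures[1]["source"] = "different"
print(cropped_from_source(pictures, 2))
```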

Expr_pattern "" // this relates to the Expr_pattern associated with the picture -> multiontology

File that has Expr_pattern <-> paper association is ExprWS221.ace, received from Wen October 22, 2010. In term info we'd like to see Gene, Pattern, Reference, Reporter_gene, Life_stage, Anatomy_term, GO_term. Autocomplete just on Expr_pattern ID. For Anatomy_term retrieve Anatomy_term ID and name from app_anat_term.

Created obo_<data|name>_pic_exprpattern tables temporarily until Expr_pattern OA is live. TODO get rid of these tables once Expr_pattern OA is live. Make note on wiki for Expr Pattern OA. Created in mangolassi at /home/postgres/work/pgpopulation/exp_exprpattern/ When live on tazendra, TODO use /home/postgres/work/pgpopulation/exp_exprpattern/create_obo_pic_exprpattern.pl and populate_obo_exprpattern.pl. DONE, created on tazendra for grg_generegulation, which needed these tables.

Also, since you requested the Expr_pattern field to be an ontology field, you should make sure that it works for you and Xiaodong's data, then tell her that if she wants it to work like that in her OA, she needs to let me know and then we need to transfer her data from text to ontology or multiontology.

OK, Xiaodong said it is fine with her -- D

Remark "" // For other curator notes -> big text Daniela think about the remark and how you would like to have the .ace file dumped -- D (?)

Cellular_component "" // For sub-cellular localization -> multiontology of GO_Term like gop_goid. File that has Expr_pattern <-> GO_term association is ExprWS221.ace received from Wen October, 22 2010. After I copy wen's file on tazendra and tell you to parse it MAKE PAPERS TERM INFO DISPLAY EXPR_PATTERN

Anatomy_term "" // It will link the picture object directly to an Anatomy Object -> multiontology. Should work like app_anat_term File that has Expr_pattern <-> Anatomy_term association is ExprWS221.ace received from Wen October, 22 2010. File that has Anatomy_term <-> anatomy name association is http://obo.cvs.sourceforge.net/viewvc/obo/obo/ontology/anatomy/gross_anatomy/animal_gross_anatomy/worm/worm_anatomy/WBbt.obo For example, Wen's file has Anatomy_term "WBbt:0004854" and OBO file has WBbt:0004854 name: vm1 def: "Vulval muscle 1"

Article_URL_Accession "" // Text this is the unique id pointing to a paper URL

Person "" person multiontology on people as in Phenotype

Person_text "" free text small. Note on the .ace file for Person and Person_name: join all person objects' standard names with commas, then a comma, then the person_name text. If there's no person_name text, join all person objects' standard names with commas, except for the last one, which is joined by "<comma> and ".
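The joining rule for the .ace dump can be written out as a small Python function. This is one literal reading of the rule, not the dumper code: `acknowledgment_names` is a hypothetical name, and the behaviour for zero or one person (return the name unchanged, or an empty string) is an assumption.

```python
def acknowledgment_names(standard_names: list[str], person_text: str = "") -> str:
    """Build the Person_name string for the .ace dump.

    With free text:    "A, B, <person_text>"
    Without free text: "A, B, and C"  (last name joined by ", and ")
    """
    names = [n for n in standard_names if n]
    if person_text:
        return ", ".join(names + [person_text])
    if len(names) <= 1:
        # Assumption: nothing to join when there is a single name or none.
        return "".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


print(acknowledgment_names(["Person A", "Person B"], "the C lab"))
print(acknowledgment_names(["Person A", "Person B", "Person C"]))
```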

Life_stage "" multiontology, autocomplete on ExprWS221.ace received from Wen October 22, 2010 (same file as others) -- D

I don't understand what you mean about the Expr file, but you can see a life stage field in the phenotype OA, and see if that works like you'd want -- J

Yes, it is perfect to have the life stage field like in the phenotype OA -- D

Ok. We have a few new tables to make, so I'll wait until we're set with those and make them all at once (curation_status, lifestage, anything else?) -- J

Give me a couple of days to test it more and to solve the Acknowledgment issue. I did not hear from the web team yet and I feel bad coming back and forth to you with new requests! :(

That's okay, we can wait for the acknowledgements, but are those the only things you want for non-acknowledgements? Just curation_status and lifestage?

No.

Have created tables urlaccession person persontext lifestage nodump chris 2010 11 15. Please test OA -- J

Tested -- D

Curator "" field, where do you want it ? -- J Where you put it it is totally fine :) -D

No dump "" Toggle J please can you add the no dump field? Thanks D

Chris "" Toggle


Acknowledgments ""// the acknowledgment field will have more than one tag

            *    Template Text Daniela created a table with Publisher/template text association -> Mappings.txt tab delimited file
            *    Publication_year J will get data from paper tables
            *    Article_URL Daniela created a table with Journal name/URL constructor -> Mappings.txt tab delimited file
            *    Journal_URL Daniela created a table with Journal name/Journal_URL -> Mappings.txt tab delimited file
            *    Publisher_URL Daniela created a table with Publisher/Publisher URL -> Mappings.txt tab delimited file
            *    Person_name will have 2 boxes in OA


where

"Template" // is the template sentence, e.g. "WormBase wishes to thank the journal <Journal_name> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_name>, <Article_URL>. Copyright (<Publication_year>) with permission from <Publisher_URL>." The template sentence will change according to what publishers need, but the tags populating it will always be the ones listed below.

Okay, it sounds like you don't want to store the template in the OA, you'll just tell me to hardcode in the dumper script that if it's a given Journal, use this template, if another some specific other template, and so forth; so that if the template ever changes for a given journal we can just change the dumper script, correct? - J

Supercorrect! -D

Also, you should probably look at the full list of Journal objects, because there are _tons_ (1404) due to minor differences in spelling and what-not. You probably can't map all those to a template, but I don't know. Maybe once we have a list of papers (do we already?) we can see what journals exist for those given papers. J

I have a list of 184 journals containing Expr_pattern data but I don't have yet a number for the template sentences because I am still working on getting copyright and permissions. It can be there will be 5 or 50, I just don't know yet - J

Right on, got it -- J

You also need to work out the issue of pubmed_final with Kimberly (see Journal_name section). Here's the postgres query for the 1404 journals: SELECT DISTINCT(pap_journal) FROM pap_journal ; which you can see on the referenceform.cgi linked in the sitemap -- J

OK, I wrote to Kimberly. Her answer is that if a paper has a PMID and is missing Journal or Year, let her know. If it doesn't have a PMID and is missing that information, I should feel free to fill it in using the paper editor.

"Journal_name" // we can retrieve it from the ?Paper data model (?Paper Reference Journal UNIQUE ?Text)

Yes, but you need to tell me what to do if there's no Journal info for a given paper. I need to write something in the code of the dumper script to account for cases where there's no Journal.

I see. If you're entering a paper you can see that there is or isn't a journal, but if the pubmed_final field is not set to final, then seeing a journal doesn't always mean that there will be a journal later. You should talk to Kimberly about this. - J

I wrote to Kimberly and asked her how we should proceed on that. I'll let you know as soon as she gets back to me -- D

Hopefully we can talk to her after the conference call tomorrow -- J

See Kimberly's answer above -- D

J, please get data from Paper tables and if the journal name is empty write BLANK -- D

"Publication_year" we can retrieve it from the ?Paper data model (?Paper Reference Publication_date UNIQUE ?Text) Same as above -- J Same as above -- D J please get data from Paper tables and If the Publication_year is empty write BLANK D

"Article_URL" // this will contain the URL pointing to the paper citation. Do you store that in postgres, or can we generate this from a paper ID pointing to WormBase ? -- Daniela will create tables journal name -> Article_URL

"Journal_URL" // this will contain the URL pointing to the journal. Do you store that in postgres, or can we generate this from a paper ID pointing to WormBase ? -- Daniela will create tables journal name -> Journal_URL

"Publisher_URL" this will contain the URL pointing to the publisher's homepage. Are the mappings always the same that we can get based on the journal name ? -- J Daniela will generate tables Publisher -> publisher URL


On mangolassi used /home/postgres/work/pgpopulation/pic_picture/create_tables to create postgres tables, will need to do on tazendra when live. Make sure to delete old pic_ tables that have been moved to pix_ TODO


Reading in existing picture objects

J: To make sure it works, we need to read in the existing picture objects from acedb, so scp the source file with the .ace objects you want populated to tazendra, and write somewhere on the wiki, how I should map the .ace data to these OA tables.

D: The file with existing picture objects is in tazendra and is called citace220picture.ace

In the file there are only two pieces of information for each object:

Picture : "29055F14H3.11_1.jpg" -> this should map to the "source" field in the OA

Expr_pattern "Expr7505" -> this should map to the "Expr_Pattern" field in the OA

we need to add

Picture ID for each of them WBPicture0000000001, WBPicture0000000002 and so on

Reference "WBPaper12345678"

Publication_year


I will add Description and Anatomy once I get to annotate the pictures. At the moment they should also be flagged as No dump, since I have to curate them manually.
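The mapping described here (Picture name -> Source, Expr_pattern -> Expr_pattern, plus a sequential picture ID and a No dump flag) can be sketched as below. This is a minimal illustration, not the import script: the parser only understands the two line types quoted above, and the row keys (`pgid`, `id`, `source`, `exprpattern`, `nodump`) are hypothetical names.

```python
import re


def parse_picture_ace(text: str) -> list[dict]:
    """Read 'Picture : "..."' objects and their 'Expr_pattern "..."' lines."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r'Picture\s*:\s*"(.+)"$', line)
        if m:
            current = {"source": m.group(1), "exprpattern": []}
            records.append(current)
            continue
        m = re.match(r'Expr_pattern\s+"(.+)"$', line)
        if m and current is not None:
            current["exprpattern"].append(m.group(1))
    return records


def to_oa_rows(records: list[dict], first_pgid: int = 1) -> list[dict]:
    """Assign progressive IDs and flag every imported object as No dump."""
    rows = []
    for offset, rec in enumerate(records):
        pgid = first_pgid + offset
        rows.append({
            "pgid": pgid,
            "id": f"WBPicture{pgid:010d}",
            "source": rec["source"],
            "exprpattern": rec["exprpattern"],
            "nodump": True,  # not yet manually curated
        })
    return rows


sample = 'Picture : "29055F14H3.11_1.jpg"\nExpr_pattern "Expr7505"\n'
print(to_oa_rows(parse_picture_ace(sample)))
```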


Ok, but everything needs an ID and a curator, right ? What should it be for those ? Anything else that should be assigned, like curation status or anything else ? -- J

I think you can get the Curator from the time stamp. I will probably become the curator once I add the anatomy terms associated with the pictures.


   * text : text
   * bigtext : like longtext, but makes the text box expand when you click in it so you can see everything you've written
   * dropdown : few values
   * ontology : controlled vocabulary (tell me where they come from)
   * multiontology / multidropdown : (allows multiple values)
   * toggle : on / off, yes/no etc. 

We also need mappings from the existing data's .ace file so that each tag that you want to keep there gets mapped to the fields above -- J


To Do

Parse Expr.ace file into .obo file for term info and for paper term info. On tazendra at /home/acedb/draciti/Expr_pattern/ExprWS221.ace

We already did this for generegulation OA -- J

Final .ace file should be dumped as

Picture : WBPicture0000000001

Description "Figure Legend: A. ..... B. ..... C. .... D. ....."

Name "WBPaper12345678_journal.pbio.0020352.g006_B" -- I need to have in this field the "Reference_source"

Cropped_from "journal.pbio.0020352.g006"

Remark "Some remark"

Expr_pattern "Expr1234"

Anatomy "WBbt:0005175"

Anatomy "WBbt:0003681"

Cellular_component "GO:123456"

Template "WormBase thanks the journal <Journal_URL> for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from <Journal_URL>, <Article_URL>. Copyright <Publication_year> with permission from <Publisher_URL>." -- take this from Template Text from Mappings.txt

Publication_year "2004"

Article_URL WBPaper00024505_URL id 0020352 -- Reference_URL id accession number.

Journal_URL "PLoS Biology" -- Take Full Journal Name from the Mappings.txt file on tazendra

Publisher_URL "PLoS" -- Take it from Publisher_name from Mappings.txt file

Reference "WBPaper00024505"
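As an illustration of the layout above, a Picture object could be written out along these lines. This is a sketch, not the actual dump_picture_ace.pl logic: the dictionary keys are invented for the example, and it assumes the tag lines of one object are written consecutively with a blank line closing the object.

```python
def quote(value):
    """Wrap a value in double quotes, backslash-escaping any embedded quotes."""
    return '"%s"' % str(value).replace('"', '\\"')


def format_picture_ace(pic):
    """Return the .ace text for one Picture object.

    `pic` is a plain dict; its key names are illustrative, not OA column names.
    """
    paper = pic["reference"]
    lines = ["Picture : %s" % pic["id"]]
    if pic.get("description"):
        lines.append("Description %s" % quote(pic["description"]))
    # Name is "<Reference>_<source>", as requested in the notes above.
    lines.append("Name %s" % quote("%s_%s" % (paper, pic["source"])))
    if pic.get("cropped_from"):
        lines.append("Cropped_from %s" % quote(pic["cropped_from"]))
    if pic.get("remark"):
        lines.append("Remark %s" % quote(pic["remark"]))
    # Multi-valued tags get one line per value.
    for tag, key in (("Expr_pattern", "expr_patterns"),
                     ("Anatomy", "anatomy"),
                     ("Cellular_component", "cellular_components")):
        for value in pic.get(key, []):
            lines.append("%s %s" % (tag, quote(value)))
    lines.append("Template %s" % quote(pic["template"]))
    lines.append("Publication_year %s" % quote(pic["year"]))
    # Article_URL points at the <Reference>_URL Database object, field "id".
    lines.append("Article_URL %s_URL id %s" % (paper, pic["accession"]))
    lines.append("Journal_URL %s" % quote(pic["journal"]))
    lines.append("Publisher_URL %s" % quote(pic["publisher"]))
    lines.append("Reference %s" % quote(paper))
    # A blank line ends the object.
    return "\n".join(lines) + "\n\n"
```

The Template text is passed through unchanged, so the <Journal_URL>, <Article_URL>, <Publication_year> and <Publisher_URL> placeholders stay in the dumped string exactly as they appear in Mappings.txt.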


Database : WBPaper00024505_URL -- Reference_URL

Name "Ding M et al. (2008) PLoS One \"The cell signaling adaptor protein EPS-8 is essential for C. elegans epidermal ....\"" -- take this from the Brief_citation in the Paper model. NB: the inner "" have to be escaped with a backslash \, otherwise the .ace file does not read in correctly. The Brief_citation comes from the new module at /home/postgres/work/citace_upload/papers/get_brief_citation.pm -- J

URL_constructor "http:\/\/www.plosbiology.org\/article\/info:doi%2F10.1371%2Fjournal.pbio.%S" -- take this from Article_URL from Mappings.txt


Database : "PLoS Biology" -- Take it from Full Journal Name from Mappings.txt

Name "PLoS Biology" -- Take it from Full Journal Name from Mappings.txt

URL_constructor "http:\/\/www.plosbiology.org\/" -- take this from Journal_URL from Mappings.txt


Database : PLoS -- Publisher_name in Mappings.txt

Name "PLoS" -- Publisher_name in Mappings.txt

URL_constructor "http:\/\/www.plos.org\/" -- take this from Publisher_URL from Mappings.txt
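The escaping used in the Database objects above (backslash before inner double quotes and before each forward slash) could be applied with a small helper like the one below. It is a sketch under assumptions: the function names are invented here, the object name is always quoted, and only the two characters shown in the examples are escaped.

```python
def escape_ace(text):
    """Backslash-escape double quotes and forward slashes, as in the examples above."""
    return text.replace('"', '\\"').replace("/", "\\/")


def format_database_ace(object_name, display_name, url_constructor):
    """Return the .ace text for one Database object (journal, publisher or article URL)."""
    lines = [
        'Database : "%s"' % escape_ace(object_name),
        'Name "%s"' % escape_ace(display_name),
        # Any %S placeholder in the constructor is passed through untouched.
        'URL_constructor "%s"' % escape_ace(url_constructor),
    ]
    # A blank line ends the object.
    return "\n".join(lines) + "\n\n"
```

For example, `format_database_ace("PLoS", "PLoS", "http://www.plos.org/")` yields the publisher object, with the URL written as http:\/\/www.plos.org\/ in the output.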



The .ace dumper is on mangolassi at /home/acedb/draciti/oa_picture_ace_dumper/ (actually at /home/postgres/work/citace_upload/picture/ and symlinked here).

It is called dump_picture_ace.pl

It generates pictures.ace and pictures.err (the error file; always look at this even if it's usually empty) -- J
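Since the error file should be checked after every dump, a tiny helper such as this one could make that step harder to forget. It is only a sketch: the function name is invented, and it assumes pictures.err is a plain-text file with one message per line.

```python
def read_dump_errors(err_path):
    """Return the non-blank lines of the dumper's error file (empty list if clean)."""
    with open(err_path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```

An empty list from `read_dump_errors("pictures.err")` means the dump was clean; anything else should be looked at before pictures.ace is used.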

Sample curation results for a parental image when parental ≠ cropped

Name "WBPicture0000000001"

Reference "WBPaper00024505"

Description "(A) A portion of the promoter sequence of K07C11.4 from C. elegans (bottom) aligned with its ortholog from C. briggsae (top). Boxed regions show conserved predicted PHA-4 binding sites and Early-1 and Early-2 elements. Site-directed mutations that disrupt Early-1 and Early-2 (“E2 + E1 Mut”) are shown below their respective wild-type (“E2 + E1 WT”) sequence from K07C11.4. (B–E) Confocal images of mid-stage embryos expressing GFP under the control of the wild-type K07C11.4 promoter (B) or promoters with a mutation in Early-1 (C), Early-2 (D), or both Early-1 and Early-2 (E). Percentages are the fraction of transgenic embryos expressing GFP; the remainder of embryos do not express GFP. (F) Expression of the wild-type K07C11.4 reporter in a subset of somatic gonad cells in an L4 animal (arrowheads). (G) Mutation of the Early-1 element eliminates gonadal expression but does not strongly affect expression in other tissues, such as intestinal cells (arrows). Dashed lines indicate the outline of the developing pharynx."

Source "journal.pbio.0020352.g006"

Expr_pattern "Expr3097"

Remark "N/A"

Cellular_component "N/A"

Anatomy_term "N/A"

Acknowledgments "WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, Chen J, Li XJ, Greenwald I. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America."

Sample curation results for a daughter image

Name "WBPicture0000000002"

Reference "WBPaper00024505"

Description "Confocal images of mid-stage embryos expressing GFP under the control of the wild-type K07C11.4 promoter"

Source "journal.pbio.0020352.g006_B"

Cropped_from "journal.pbio.0020352.g006"

Expr_pattern "Expr3097"

Remark "N/A"

Cellular_component "GO_term"

Anatomy_term "WBbt:0003681 pharynx"

Acknowledgments "WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, Chen J, Li XJ, Greenwald I. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America."
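In the samples above the daughter image's Source is the parental Source plus an underscore and the panel letter, so the Cropped_from value could be derived from the name. The sketch below assumes that naming rule holds for every cropped image; it would not be appropriate for the legacy file names (such as 29055F14H3.11_1.jpg), which also contain underscores. The function name is made up for the example.

```python
def split_panel(source):
    """Split a daughter Source into (parental Source, panel label).

    Returns (source, None) when there is no "_<panel>" suffix,
    i.e. when the image is itself the parental figure.
    """
    parent, separator, panel = source.rpartition("_")
    if separator and parent and panel:
        return parent, panel
    return source, None
```

With this rule `split_panel("journal.pbio.0020352.g006_B")` gives the parental name `journal.pbio.0020352.g006` and the panel `B`, while a parental Source comes back unchanged with no panel.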

Sample curation results for a parental image when parental = cropped

Name "WBPicture0000000003"

Reference "WBPaper00024876"

Description "Expression Pattern of rom-1::nls::gfp Expression pattern of the zhIs5[rom-1::nls::gfprom-1::] transcriptional reporter during vulval development. Images on the left (A, C, E, G, and I) show the corresponding Nomarski pictures with the arrows pointing at the Pn.p cell nuclei and the arrowhead indicating the position of the AC nucleus. (B) A mid L2 larva before vulval induction with uniform rom-1::nls::gfp expression in all the Pn.p cells. (D) An early L3 larva in which rom-1::nls::gfp expression was decreased in all VPCs except P6.p (see text for a quantification of the expression pattern). Note that the nuclei of hyp7 and the Pn.p cells that had fused to hyp7 displayed strong rom-1::nls::gfp expression (P1.p, P2.p, P3.p and P9.p in the example shown). (F) A mid to late L3 larva in which P6.p had generated four descendants. Expression of rom-1::nls::gfp occurred only in the 3° descendants of P.4.p and P8.p after they fused to hyp7. (H) An L4 larva during vulval invagination. No rom-1::nls::gfp was detectable in the 1° and 2° descendants of P5.p, P6.p, and P7.p, but the AC and the surrounding uterine cells displayed strong rom-1::nls::gfp expression. (K) A late L2 to early L3 larva following the ablation of the precursors of the somatic gonad. No up-regulation of rom-1::nls::gfp in P5.p, P6.p, or P7.p was observed. The scale bar in (K) is 10 μm."

Source "journal.pbio.0020334.g003"

Expr_pattern "Expr3457"

Remark "N/A"

Cellular_component "WBbt:0004017 Cell"

Anatomy_term "N/A"

Acknowledgments "WormBase wishes to thank the journal Genetics for permission to reproduce figures from this article. Please note that this material may be protected by copyright. Reprinted from Genetics, 166:151-60, Chen J, Li XJ, Greenwald I. sel-7, a positive regulator of lin-12 activity, encodes a novel nuclear protein in Caenorhabditis elegans. Copyright (2004) with permission from the Genetics Society of America."

Model testing

On the terminal

Go to the acedb_good folder

then

./xace /Users/danielaraciti/Desktop/Wormbase/ACEDB/ts

You have now opened the empty database, ready for testing.

Then choose EDIT -> Read models -> yes -> continue


NOTES

When testing the model, the models.wrm file that gets read is the one in ts -> wspec -> models.wrm

Note that all the files that go to acedb should be plain text, so if you want to modify the models.wrm file you should convert it into plain text (in TextEdit: Format -> Make Plain Text)

When you test a new model, first go into ts/wspec, replace the old model with the new one, save, and launch ts again

If you get an error such as "error line No 123", use Edit -> Find -> Select line to jump to that line

At the moment the file I am playing around with is models.wrm. The original file was named modelsoriginal.wrm

If you need to add Variation back to the ?Picture data model, add the following tag to the ?Variation model: Picture ?Picture XREF Variation #Evidence

and add the following to the ?Picture model: Variation ?Variation XREF Picture

The same is true for RNAi and Transgene


CHANGES TO THE MODELS.WRM FILE

?Picture Description Text // not modified

Name UNIQUE Text // Added in ?Picture

Crop Crop_picture ?Picture XREF Cropped_from // added in ?Picture

Cropped_from ?Picture XREF Crop_picture //added in ?Picture

Pick_me_to_call Text Text // not modified

Expr_pattern ?Expr_pattern XREF Picture // not modified

RNAi ?RNAi XREF Picture // deleted from ?Picture and deleted the XREF to Picture in RNAi class

Variation ?Variation XREF Picture // deleted from ?Picture and deleted the XREF to Picture in Variation class

Transgene ?Transgene XREF Picture // deleted from ?Picture and deleted the XREF to Picture in Transgene class

Remark ?Text #Evidence // not modified

Cellular_component ?GO_term XREF Picture //added in ?Picture and added "Picture ?Picture XREF Cellular_component" in ?GO_term

Anatomy_term ?Anatomy_term XREF Picture //added in ?Picture and added Picture ?Picture XREF Anatomy_term in ?Anatomy_term

Acknowledgments Template Text // added in ?Picture

Journal_name Text // added in ?Picture

Publication_year Text // added in ?Picture

Article_URL ?Database ?Database_field ?Accession_number // added in ?Picture

Publisher_URL ?Database ?Database_field ?Accession_number // added in ?Picture

Person_name Text // added in ?Picture

Reference ?Paper XREF Picture // added in ?Picture and added Picture ?Picture XREF Reference in ?Paper

Model testing modifications

Picture conversion

cd Desktop/WormBase/PLoS/Gene<tab>/ enter this lets me choose the directory from which I get the files

scp -r <directory_name I have chosen before> acedb@tazendra.caltech.edu:draciti/pictures/ enter this copies the folder and all its files to tazendra <enter password>

Open a new terminal

ssh acedb@tazendra.caltech.edu enter log in to tazendra <enter password>

cd draciti/pictures/ enter this is the directory on tazendra where I will put my files

cd <directory name> enter you can type ls for a list of the files present in the directory

igal2 -bigy 400 enter the program converts the pictures; bigy 400 is an arbitrary size for vertical pixels

ls -al it will list all the files present

If you want to open a new command window just press command+N on keyboard

The folder for receiving the converted pictures is called Incoming from Tazendra. We want to bring the files back:

cd Desktop/Incoming\ from\ Tazendra/ enter

scp -r acedb@tazendra.caltech.edu:draciti/pictures/WBPaper00024399 . enter


(ssh = secure shell) (scp = secure copy)

(scp -r = recursive secure copy for directories)

(cd = change directory) (pwd = show current directory)
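The round trip described above (copy a folder to tazendra, convert it there with igal2, copy it back) could be collected into one place with a helper like this. It is a sketch only: it builds the three commands as argument lists using exactly the host, directory and igal2 options from the steps above, and leaves running them (for example with `subprocess.run`, which would still prompt for the password) to the caller. The function and constant names are invented for the example.

```python
import os
import shlex

REMOTE = "acedb@tazendra.caltech.edu"
REMOTE_DIR = "draciti/pictures"


def conversion_commands(local_dir, incoming_dir):
    """Return the scp / ssh / scp commands for one paper folder as argument lists."""
    # The folder name (e.g. WBPaper00024399) is reused as the remote directory name.
    paper_dir = os.path.basename(os.path.normpath(local_dir))
    remote_paper = "%s/%s" % (REMOTE_DIR, paper_dir)
    return [
        # 1. copy the folder and all its files to tazendra
        ["scp", "-r", local_dir, "%s:%s/" % (REMOTE, REMOTE_DIR)],
        # 2. convert the pictures in place; 400 is the arbitrary vertical size
        ["ssh", REMOTE, "cd %s && igal2 -bigy 400" % shlex.quote(remote_paper)],
        # 3. bring the converted folder back
        ["scp", "-r", "%s:%s" % (REMOTE, remote_paper), incoming_dir],
    ]
```

Printing each list joined with spaces gives the same commands that would otherwise be typed by hand, which makes it easy to check them before anything is run.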