Difference between revisions of "Transgene curation pipeline"

Revision as of 00:22, 20 February 2014

Importing transgenes from textpresso

Every day at 4am, the wrapper script

/home/postgres/work/pgpopulation/textpresso/wrapper.sh

calls:

/home/postgres/work/pgpopulation/textpresso/transgene/update_textpresso_transgene.pl
/home/postgres/work/pgpopulation/textpresso/antibody/update_textpresso_antibody.pl
/home/postgres/work/pgpopulation/afp_papers/find_passwd_@.pl
/home/postgres/public_html/cgi-bin/data/ccc_gocuration/get_newset.pl

The script relevant to transgenes is now:

/home/postgres/work/pgpopulation/textpresso/transgene/update_textpresso_transgene.pl

It gets its data from:

http://textpresso-dev.caltech.edu/transgene/transgenes_in_regular_papers.out

--kjy 19:35, 24 May 2012 (UTC)

Curating transgenes

All transgenes entered through the Textpresso pipeline are annotated with Arun as curator; these can be pulled out by searching for Arun in the curator field. Xiaodong and Daniela also enter transgenes that need curation; these can be retrieved by searching for their names in the curator field.

[defunct?] If you want to see all the current 'new' transgenes picked up by Textpresso, go to Tab 3 and press the "Search New Transgene" retrieve button. This action will retrieve all transgene objects that have data in the Summary or Remark fields. Usually there will be paper object info already, since it was entered from the Textpresso search.

Curators should look for information on new transgenes in the paper or supplementary files.

If authors do not use proper nomenclature for their transgene:

look in referenced papers for the name (be sure to check whether the transgene already exists in postgres under that name; if so, attach the paper to it).
assign a WBPaper<##>Is# or WBPaper<##>Ex# name.
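
As an illustration of the placeholder naming pattern above, here is a minimal Python sketch. It is not part of the curation tooling: the function name, the argument names, and the example paper ID are hypothetical, and it assumes that "Is" marks an integrated transgene, "Ex" an extrachromosomal one, and that the trailing number is a positive integer chosen by the curator.

```python
import re


def placeholder_name(paper_id, integrated, number):
    """Build a WBPaper<##>Is# or WBPaper<##>Ex# style name (hypothetical helper)."""
    # Assumption: the paper ID is the literal prefix "WBPaper" followed by digits.
    if not re.fullmatch(r"WBPaper\d+", paper_id):
        raise ValueError("expected an ID such as WBPaper00012345, got %r" % paper_id)
    # Assumption: the trailing number is a positive integer.
    if number < 1:
        raise ValueError("number must be a positive integer")
    kind = "Is" if integrated else "Ex"
    return "%s%s%d" % (paper_id, kind, number)


# Example with a made-up paper ID.
print(placeholder_name("WBPaper00012345", True, 1))
print(placeholder_name("WBPaper00012345", False, 2))
```

Before using such a name, the check against existing names in postgres described above still applies.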

Here is some controlled vocabulary for the transgene summary field:

Remark "Conflicting mapping info: ..."
Remark "Conflicting genotype: ..."
Remark "No transgene info in original publication."
Remark "Other integration method: ..."
Remark "Clone = "
Remark "Mapping info: "
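
One way to check Remark entries against the phrases listed above is sketched below in Python. This is an assumption-laden illustration, not an existing script: the function name is hypothetical, the example remark is made up, and it assumes a Remark value is stored as pipe-separated text, as the OA field description for trp_remark states.

```python
# Controlled-vocabulary prefixes taken from the list above.
CONTROLLED_PREFIXES = (
    "Conflicting mapping info:",
    "Conflicting genotype:",
    "No transgene info in original publication.",
    "Other integration method:",
    "Clone =",
    "Mapping info:",
)


def split_remark(remark):
    """Split a pipe-separated Remark into (controlled, free_text) lists."""
    controlled, free_text = [], []
    for part in remark.split("|"):
        part = part.strip()
        if not part:
            continue  # skip empty segments such as a trailing pipe
        # str.startswith accepts a tuple and matches any of its members.
        if part.startswith(CONTROLLED_PREFIXES):
            controlled.append(part)
        else:
            free_text.append(part)
    return controlled, free_text


# Example with a made-up remark.
controlled, free_text = split_remark(
    "Clone = pXYZ1 | injected at 50 ng/ul | Mapping info: LG IV"
)
print(controlled)
print(free_text)
```

The match is case-sensitive, so a remark written as "clone = ..." would be reported as free text rather than silently accepted.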

Assigning WBTransgene IDs

All transgenes should have a unique WBTransgeneID, except those that have been annotated as FALSE objects. IDs are assigned automatically when a NEW line is created in the OA. If an object is duplicated in the OA, make sure the WBTransgeneID is not copied along with the other information; if it was, delete the copied WBTransgeneID. In these cases, the WBTransgeneID is assigned by a cron job that runs every morning at 4am. The WBTransgeneID is assigned based on the pgid.

0 4 * * * /home/acedb/karen/cronjobs/assign_transgene_IDs.pl
/home/postgres/work/pgpopulation/transgene/20121004_assign_transgene_IDs/assign_transgene_IDs.pl

The script looks at data from trp_name, trp_objpap_falsepos, and trp_curator. Anything that exists in trp_curator and has neither a trp_name nor a trp_objpap_falsepos gets an ID assigned by padding the joinkey to 8 digits, adding WBTransgene in front, and adding the result to trp_name and trp_name_hst.
NOTE: if we ever change any of those table names, this script will not work properly.
NOTE: interaction and protein call the "False Positive" tables 'falsepositive' instead of objpap_falsepos.
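
The assignment rule described above can be sketched in Python as follows. This is only an illustration of the logic, not the Perl cron script itself: the function name is hypothetical, and the joinkeys are passed in as in-memory sets instead of being read from the postgres tables.

```python
def assign_transgene_ids(curator_keys, name_keys, falsepos_keys):
    """Return {joinkey: WBTransgeneID} for joinkeys that still need an ID.

    curator_keys  -- joinkeys present in trp_curator
    name_keys     -- joinkeys that already have a trp_name
    falsepos_keys -- joinkeys flagged in trp_objpap_falsepos
    """
    name_keys = {str(k) for k in name_keys}
    falsepos_keys = {str(k) for k in falsepos_keys}
    assigned = {}
    for joinkey in sorted((str(k) for k in curator_keys), key=int):
        # Skip rows that already have a name or are marked as false positives.
        if joinkey in name_keys or joinkey in falsepos_keys:
            continue
        # Pad the joinkey to 8 digits and put WBTransgene in front.
        assigned[joinkey] = "WBTransgene" + joinkey.zfill(8)
    return assigned


# Example with made-up joinkeys: 1 already has a name, 7 is a false positive.
print(assign_transgene_ids({"1", "7", "42"}, {"1"}, {"7"}))
```

In the real script the resulting IDs would then be written to both trp_name and trp_name_hst; that database step is left out of the sketch.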

Dumping Transgenes

Transgenes are dumped every Wed at 4 am and are picked up by spica every Thurs at 8am.

0 4 * * wed cd /home/acedb/karen/transgene; ./use_package.pl

transgene use_package.pl writes to /home/postgres/work/citace_upload/transgene/transgene.ace and transgene.ace<date>

transgene.ace in the directory /home/acedb/public_html/karen/ is symlinked:

transgene.ace -> /home/acedb/karen/transgene/transgene.ace

Transgene OA

Autocomplete
Multi-ontology
Big text = free text, editable expanded box
Selection list
Selection list multiple list field
multi-drop down
Free text, pipe separated values

Tab 1 (Transgene)

Pgdbid - postgres database ID, autogenerated
Name -- trp_name -- assigned WBTransgeneID
Public_name -- trp_public_name -- text
Synonym -- trp_synonym -- text, pipe separated values
Summary -- trp_summary -- bigtext
Driven by Gene -- trp_driven_by_gene -- multiontology WBGene, enter gene by public/sequence name, postgres stores WBGeneID, make sure entries are unique
Reporter Product -- trp_reporter_product -- multidropdown list -> need to know where the file is so I can edit it.
Other Reporter -- trp_other_reporter -- text, pipe separated values
Gene change to -- trp_gene -- multiontology WBGene
3'UTR -- trp_threeutr -- multiontology WBGene, added by Chris 5/24/12
deleted -Rescues -- trp_rescues -- multiontology WBvariation ->Changed model and deleted tag.
Coinjection marker -- trp_coinjection -- text, not dumped
Reporter type -- trp_reporter_type -- dropdown list values of transcriptional or translational reporter
Remark -- trp_remark -- bigtext, pipe separated values

Tab 2 (Isolation)

Clone/plasmid -- trp_clone -- multiontology using the Clone list from acedb: select a, a->general_remark, a->positive_gene from a in class clone where a->type = "Plasmid" (remove all "sjj_" clones). Also see e-mail. (For now this will be a no dump field.)
Integration method -- trp_integration_method -- multidropdown list -> used to be called Integrated by
Map -- trp_map -- multidropdown list
Map Paper -- trp_map_paper -- multiontology paper
Map Person -- trp_map_person -- multiontology person
Laboratory -- trp_laboratory -- multiontology laboratory, used to be called Location
Strain -- trp_strain -- text, pipe separated values

Tab 3 (Expression)

Curator -- trp_curator -- dropdown
Paper -- trp_paper -- multiontology paper
Person -- trp_person -- multiontology
Marker for -- trp_marker_for -- text
Marker Paper -- trp_maker_for_paper -- multiontology paper
Species -- trp_species -- text
Driven by Construct -- trp_driven_by_construct -- text
Search New Transgene -- trp_searchnew -- queryonly
Fail -- trp_objpap_falsepos -- toggle -- for marking transgenes that are falsely attributed to a given paper.

--kjy 17:27, 25 June 2012 (UTC)

Original Phenote transgene.cfg

Invoke the phenote transgene configuration interface and access postgres

go to the directory with phenote
 $ ./phenote -c worm-transgene.cfg

T= free text; M= multiple values, separate values with a pipe(|); S= selection list

Tab 1

Pgdbid- postgres database ID, entered automatically when curator enters a new transgene

Name(T)- approved name following Lab-prefix (or WBPaperID), Is or Ex, number

Summary(T)- genotype, including co-injection marker and relevant information about making the construct. If papers report conflicting genotypes, use the Remark field and controlled vocabulary "Conflicting genotype: ...". If there is no information, enter "No transgene info in original publication." in the Remark field.

Driven by Gene(T,M)- enter WBGeneID used for promoters in every promoter driven construct of the transgene

Reporter Product(S,M)- list has common reporter genes, GFP, RFP, LacZ, etc.

Other Reporter(T,M)- enter other products encoded as reporters that do not appear in the drop down list

Gene(T,M)- enter WBGeneID for protein output of construct, which isn't considered a reporter product

Integrated by(S)- choose integration method if known, use 'not integrated' for Ex transgenes, if integration method is not listed, use Remark field and controlled vocabulary: "Other integration method: ..."

Strain(T,M)- not used consistently, enter approved strain names for those strains that contain the transgene

Map(S,M)- choose LG(s) of integrated array if known, if papers report differing map positions use Remark field and controlled vocabulary "Conflicting mapping info: ..."

Tab 2

Map Paper(T,M?)- WBPaperID for paper that reports mapping info
Map Person(T,M)- WBPersonID? or Name, person evidence
Marker for(T,M)- not used, Wen's expression data
Marker Paper(T,M)- WBPaperID, not used, Wen's expression data
Reference(T,M)- WBPaperID, generally autofilled by Textpresso cron job script and bulk upload of Ex search script, add new paper if necessary
Remark(T,M)- catch all used for clarifying info from other fields, and for entering construct specifics, in some cases use controlled vocabulary
- "Conflicting mapping info: ..."
- "Conflicting genotype: ..."
- "No transgene info in original publication."
- "Other integration method: ..."
- "Clone = "
- "Mapping info: "
Species(T,M?)- not used?
Synonym(T,M)- other names for the transgene or construct
Driven by Construct(T,M)- not sure what this is
Location(T,M)- Lab designations for people who have the transgene, not sure about this.

Tab 3

Movie(T)- used?
Picture(T)- used?
Search New Transgene(T)- use to retrieve all transgenes that do not have any summary or remark data
SQL-use?

Way in the future

Genomic Expression->New field - for developing a standardized transgene expression nomenclature; eventually we will want to fill this by script, composing it from other fields (promoters, reporters, clones)
