WBConfCall 2014.07.03-Agenda and Minutes

From WormBaseWiki

Agenda

Apologies from Mary Ann

WS245 Schedule

Proposed Hinxton release Friday 29th August

Proposed upload date Friday 1st August (- Sunday 3rd)

Models 3 weeks prior Friday 11th July

Person evidence vs Publication evidence

(Karen, Kimberly)
Questions have come up about what to do with user submitted evidence that can also be extracted from a publication. In particular, when users submit data through an online form, that information is captured and entered with the submitter of the data as person evidence.

Q1. If those data are then published by the same person, should the person evidence be deleted and replaced by the publication as evidence, or should those data retain the person as evidence as well as having the publication attached?

[Mary Ann - here are my thoughts. I support the latter, though looking at the Evidence attached to CGC_name, it does not look like the website is displaying it anyway (a separate issue). In any case, I think there is no harm in leaving Person_evidence there. It adds weight to our curation.]

Q2. If and when person and publication are both used as evidence, are there ways to highlight publication evidence over person evidence in our various data output modes (website, GAF files, WormMine queries, scripted concise descriptions, etc.)?

[Mary Ann - I think this is a good idea, though it must be obvious to users that publications take precedence over people. But if you're talking about ensuring that Papers are listed at the top of a list of Evidence values then yes, I support that]
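
The ordering question in Q2 can be pictured with a minimal sketch. Everything in it is illustrative: the `Evidence` record, its fields, the placeholder values and the helper names `papers_first` and `by_date` are hypothetical and are not taken from the WormBase website code or the schema. Only the tag names Paper_evidence, Person_evidence and Curator_confirmed come from the model.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical, simplified record for one evidence value on an annotation.
@dataclass(frozen=True)
class Evidence:
    tag: str     # e.g. "Paper_evidence" or "Person_evidence"
    value: str   # placeholder name, not a real WormBase identifier
    added: date  # date the evidence was attached (assumed to be available)


def papers_first(items):
    """Return evidence with Paper_evidence listed first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(items, key=lambda e: e.tag != "Paper_evidence")


def by_date(items):
    """Alternative ordering: purely by the date the evidence was added."""
    return sorted(items, key=lambda e: e.added)


evidence = [
    Evidence("Person_evidence", "person-A", date(2013, 5, 1)),
    Evidence("Paper_evidence", "paper-B", date(2014, 6, 1)),
    Evidence("Curator_confirmed", "person-C", date(2014, 1, 15)),
]

for item in papers_first(evidence):
    print(item.tag, item.value, item.added.isoformat())
```

Both orderings are one-line sort keys, so the choice between them is a display policy decision rather than a technical constraint.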

Resurrection of the second all-hands conference call?

A few months ago, we changed the second conference call of the month to a more focussed smaller-group meeting on a specific topic. This seems to have failed (we've only had one of these). The results of a poll at Hinxton suggested a feeling that communication has become worse since we went to a once-a-month schedule. Proposal then: reinstate the second all-hands conference call. Topic-based calls for smaller groups can be arranged outside the main calls when the need arises.

Quick Models Update

3 Models for WS245 addition (Will be tagged next Friday 11th as this is 3 weeks prior to upload for WS245 if this is the agreed schedule)

1) The long awaited ?Construct class - Karen

Is it much different to the working version we had for WS244? http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/wormbase/wspec/models.wrm?revision=1.404&root=ensembl&view=markup

[Karen - not much different at all.]

2) Clone Class additional ?Sequence connections to remove ambiguity - Paul

 Sequence ?Sequence
 End_sequence ?Sequence

3) Interaction class - Chris

Addition of "DNase_I_footprinting" to detection methods.

Minutes

WS245 release schedule looks good.

CVS Tagged next Friday July 11


Person evidence vs Publication evidence

Karen : Keep person evidence + subsequent publication evidence ?

Michael P : Yes, may as well keep all evidence.

Karen : Can we prioritize one evidence over another ?

Kevin : Not in the schema.

Todd : Sure on the website, can sort by date.

Karen : Prefer publications taking precedence.

Todd : We can do that as well.

Kevin : Evidence hash is a tree of tags, shouldn't be a problem.

Karen : Can we prioritize display of data based on the type of evidence it has ?

Kimberly : How do we communicate to users which data has only personal communication as evidence ? How should users cite WormBase ? How should we make it clear to users that some data is published and some isn't ?

Todd : Tooltip for evidence, show its provenance.

Michael P : Take care if the Person with evidence is not the author of the Publication.

Paul S : How many cases of someone communicating data and not publishing for years ? (we don't know)

Kimberly : Author just as Paper or also Person evidence ?

Paul D : Use both and also accession evidence.

Karen : User submission forms are personal communication ?

Kimberly : Yes, if there is no paper.

Karen : Sort by timestamp clearer ?

Raymond : Doesn't seem right for the paper not to get priority. We don't need to worry about ordering, keep it simple.

Todd : Agree with Raymond, rely on users to look at data + evidence and make their own decisions.

Kevin : We'll need to show them the date it was added then.

Todd : Yes

Kimberly : And also tell them what Person_evidence means.

Raymond : To display add prefix to say Personal Communication.

Kimberly : Caltech has used the tag that way, but maybe not everyone has ?

Kevin : Can add a new Personal_communication tag to #Evidence. Pop-up / tooltip seems good.

Todd : Pop-up / tooltip would be good for all #Evidence tags.

Karen : To summarize, new tag Personal_communication, Caltech will dump to that. Web display will be clearer and not prioritized.

Kimberly : Also explain what Curator_confirmed means. Do we create different Phenotype objects when publication is different from prior personal communication ?

Paul D : If they publish something different, the personal communication was probably not correct, ask them for clarification.

Karen : Need to keep both because another user might have used the original data based on personal communication.

Paul D : If different could treat as conflicting data from different people.

Gary S : In Phenotype might mean that the data is variable and need to keep both.

Kevin : How do we deal with retractions when the data is no longer correct ? We have timestamps but don't show them to users.

Raymond : People often give us new data, but don't retract the previous data.

Paul D : What should we do when they do retract it ?

Ranjana : Add a 'retracted' label.

Paul S : How often does this happen ? It might disincentivize user submissions. We're not changing meeting abstract information.

Karen : For Phenotypes it will work itself out, but for Transgenes they have to be corrected and have remarks.

Raymond : Keep all previous WS website releases so users can see why they used that.

Paul S : Not optimal to keep so many releases running.

Raymond : Good issue to consider for database migration. There's a difference between tracking typos and real data changes, would be good to have a way to tag.

Kimberly : Date_last_updated tag to track significance.

Daniela : Change OA to track it.

Juancarlos : We could, but need the future database to support it.

Raymond : Talk more at Caltech meeting.

Kimberly : Add to wiki definitions of evidence tags that people can edit.

List of Existing WB Evidences and Their Use

Paper_evidence ?Paper // Data from a Paper

  • Used for citing the paper from which an annotation is made.
    • Concise descriptions (Caltech)
    • GO (Caltech and Automated Pipelines)
    • Gene (Mary Ann)
    • Variation (molecular lesion) (Mary Ann)
    • Sequence Features (Mary Ann)
    • Used in objects: Feature, CDS, Transcript, Pseudogene to denote the paper which asserted this structure or location. (GaryW)
    • Variation phenotype (Karen)
    • Transgenes (Karen)
    • Topics (Karen)
    • Construct (Karen)

Published_as ?Text // .. track other names for the same data

  • Used in ?Paper objects to indicate the name of a gene as it appears in the paper.
    • Note that this is typically only used when the name is not in WormBase and is unlikely to be included in WormBase (e.g., a one-off name in a Review article).
  • Not currently used for Transgene but could be useful

Person_evidence ?Person // Data communicated to wormbase from a Person (pre-publication/unpublished)

  • Used to cite the Person who communicated information for an annotation.
    • Caltech has typically used this to indicate a Personal Communication.
    • CGC_name. Usually this is the CGC_representative of the Designating_laboratory for the associated Gene_class. Will be added regardless of communication with author. (Mary Ann)
    • Variation. To indicate Personal Communication. (Mary Ann)
    • Should we change this tag name to be 'Personal_communication'?
    • Used in objects Feature, CDS, Transcript, Pseudogene to identify the person who gave personal communication evidence for this structure or location (GaryW).
    • RNAi. To indicate Personal Communication. (GaryS)
    • Phenotype (variation, transgene, rearrangement, strains) (Karen)
    • Transgene (Karen)
    • Construct (Karen)


Author_evidence ?Author UNIQUE Text // Data from an Author

  • Used when no WBPersonID exists - rare (Mary Ann) (Cecilia has a quick turnaround for making new Person objects, there are many that exist just connected to a single Paper, so it should be fine to request one, if a 1-day maximum wait is enough -- Juancarlos)

Accession_evidence ?Database ?Text // Data from a database (NDB/UNIPROT etc)

  • Used in objects Feature, CDS, Transcript, Pseudogene, Analysis, Condition to identify the database, field and ID of evidence to support this object (GaryW)

Protein_id_evidence ?Text // Reference a protein_ID

  • Used in objects CDS, Transcript to identify the homologous/similar proteins used as evidence for the structure (GaryW)

GO_term_evidence ?GO_term // Reference a GO_term

Expr_pattern_evidence ?Expr_pattern // Reference a Expression pattern

  • Has been used in concise descriptions to link statements to a specific WB object (as for RNAi_evidence). Not widely used at all.

Microarray_results_evidence ?Microarray_results // Reference a Microarray result

  • Has been used in concise descriptions to link statements to a specific WB object (as for RNAi_evidence). Not widely used at all.

RNAi_evidence ?RNAi // Reference a RNAi knockdown

  • Has been used in concise descriptions when we were experimenting with linking descriptions to specific WB objects, in addition to the paper that published the data.

CGC_data_submission // bless the data as coming from CGC

  • Legacy (Mary Ann)

Curator_confirmed ?Person // bless the data manually

  • For concise descriptions, this indicates the curator who wrote the description. (I'm not sure if it's concise or GO, but doesn't one of them track all the curators that have ever added anything to the concise description [as opposed to the most recent one] -- Juancarlos)
  • For GO annotations, this indicates the curator who made the annotation.
  • Human Disease curation also uses this -- Juancarlos
  • Used in objects CDS, Transcript, Pseudogene to identify the curator who asserted the structure based on their experience (GaryW)
  • Phenotype (variation, transgene, rearrangement, strains) (Karen)

Inferred_automatically Text // bless the data via a script

  • Information is populated based on the results of a script.
    • Used in ?Paper objects when genes are associated with papers by a pattern-matching script.
    • Used in GO annotations for Phenotype2GO-based annotations and InterPro2GO-based annotations. These annotations may be based on manually curated mappings, but each individual annotation is not reviewed prior to being entered into WB.
    • I used it for automatically transferred CGC_names in Genes (Michael)

Date_last_updated UNIQUE DateType // Stores last update timestamp

  • Used for concise descriptions to indicate when the description was written or edited.
    • For concise descriptions, the date is usually only changed when a major edit that changes the information content is made, i.e. fixing a typo doesn't warrant a change of the date.
  • Some other curation groups use a Date_last_reviewed evidence instead. This is then updated if/when a curator reviews the annotation regardless of whether or not they've made any changes. The two seem to have somewhat different meanings. What about using both?

Feature_evidence ?Feature // Reference a Feature - eg for creation of isoform based on TEC-RED SL2

  • Used in objects CDS, Transcript, Pseudogene to identify the supporting evidence for a structure (GaryW)

Laboratory_evidence ?Laboratory // Reference a Lab

From_analysis ?Analysis // Reference an analysis

  • For Gene object curation, when orthologs are added via analysis scripts (Mary Ann)
  • Used in objects Feature, CDS, Transcript, Pseudogene to identify the project and conditions to support the site or structure (GaryW)
  • There was also discussion about using ?Analysis objects to credit manual annotation from other projects, e.g. GO annotations from other groups like UniProt. Would like to confirm this is still how we plan to handle this.

Variation_evidence ?Variation // Explicitly record variation from which IMP manual GO annotations are made

  • Used in GO curation for Phenotype2GO annotations based on variation phenotypes.
  • Will probably get phased out with new GO model.

Mass_spec_evidence ?Mass_spec_peptide

  • Used in objects CDS, Transcript to identify the mass-spec peptides giving supporting evidence for the structure (GaryW)

Sequence_evidence ?Sequence // for sequence data that hasn't been submitted to a public resource

Remark ?Text
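
The pop-up / tooltip idea from the discussion could be backed by a simple lookup from evidence tag to a short description. The sketch below is only an assumption about how that might look: the `EVIDENCE_TOOLTIPS` table and the `tooltip_for` helper are hypothetical names, the descriptions paraphrase the schema comments in the list above, and only a few tags are shown.

```python
# Hypothetical lookup table; descriptions paraphrase the schema comments above.
EVIDENCE_TOOLTIPS = {
    "Paper_evidence": "Data from a Paper",
    "Person_evidence": (
        "Data communicated to WormBase by a Person "
        "(pre-publication/unpublished)"
    ),
    "Curator_confirmed": "Data confirmed manually by a curator",
    "Inferred_automatically": "Data populated by a script",
    "Date_last_updated": "Timestamp of the last update",
}


def tooltip_for(tag: str) -> str:
    """Return the tooltip text for an evidence tag, or a generic fallback."""
    return EVIDENCE_TOOLTIPS.get(tag, f"{tag} (no description available)")


for tag in ("Paper_evidence", "Person_evidence", "Remark"):
    print(f"{tag}: {tooltip_for(tag)}")
```

Keeping the descriptions in one table would let curators edit the wording in a single place, in the same spirit as the editable wiki definitions suggested above.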

Resurrection of the second all-hands conference call

Kevin : Hinxton thinks we could use an extra call.

Paul S : Yes, let's do it.

Paul K : Why hasn't the database call been happening ?

Kevin : People have been working on different things, need someone dedicated to work on the future database.

Paul K : Hinxton is stuck with the duty.

Todd: note added after the minutes were posted: I don't recall Paul K saying that Hinxton is stuck with the duty. If he did in fact say that, it is 100% untrue. In fact, OICR and Caltech have dedicated substantive resources to the project. Neither team currently has the luxury of devoting an FTE to the project as they already have other duties to attend to.

Kevin: note added after Todd's note: Paul K did not say that Hinxton is stuck with the duty. He said that Hinxton is about to be in a position to dedicate a whole FTE to the project, and in an ideal world, it would be great if OICR and/or Caltech were able to do the same, for a period of time at least.

Paul S : New person starting at Hinxton in August. Have 2nd call on the third week, and discuss database afterward.

Models Update

Construct

Paul D : Construct class, how is it going ?

Karen : It's great. Small tweak compared to cvs. Mary Ann okayed it.

Paul D : A lot of tags removed.

Karen : Need more rigorous testing, change a tag name. Can get it by the 11th.

Paul D : Will incorporate changes, submit tweaks next week.


Clone

Paul D : Component sequences are end sequence, want an end sequence tag.

Chris : Good idea.


Interaction

Paul D : Adding a new Detection Method, looks fine.



Cross-References

Michael P : More cross-references to Genes + CDS + similar data, to link to Ensembl Genomes. Is there a wiki for external databases ?

Paul S : External resources at the bottom ?

Raymond : External links widget.

Chris : Page with more information somewhere.

Todd : Friends link on footer. External page would get lost.

Raymond : Database class links to external database front page.

Todd : Good idea to have overview of what we do and what resources we trust.

Raymond : Better if maintenance-free.

Kevin : UCSC supports genome hub in their genome browser, gives us control of nematode sequence releases.