WormBase-Caltech Weekly Calls
2011 Meetings
March 3, 2011
SUMMARY:
Every other month patch:
- We will try to generate an ACEDB patch every other month, starting with the Paper class data
- Need to coordinate with Juancarlos and Todd
Need to check .ACE patch files before upload:
- Will need to be vigilant about checking for errors and inconsistencies before sending to Todd to put up on website
- Curators need to check their own data for problems
- Wen will check for consistency between the different data types
Think about connecting website to Postgres:
- Think about showing Postgres data "immediately" on site
DETAILS:
Delayed release cycle:
- Will require more work to prepare for more frequent release of certain data types
- Aside from Kimberly's data, most data types are not urgent (e.g. Expression pattern)
- What are the users feeling?
- Having data faster will help users; they don't ask, because they don't see it
- On-the-fly updating of website? Like Postgres?
- Since we use ACEDB, we have to patch WS with .ACE file, or rebuild whole thing
- Flat file Postgres database, replaced every night?
- Website calls Postgres directly for certain data types?
- Performing build without sequence is easy? Do everything without sequence?
- How to integrate sequence data with other data once they're decoupled through the patching process?
- We need .ACE patch files
- Concise description separate from most else (but connected to papers)
- Do papers first?
- Website can show anything
- If we have a lot of patches, we will not be able to check each one for data inconsistencies/conflicts
- Trial patch .ace files for papers first
- Juancarlos: Scripts that check differences between data dumps; scripts are data type specific
- Curators need to talk to Juancarlos about the importance of different data tags
- Paper .ACE file: Would include bibliographic info, journals, authors, genes associated from abstract or added manually
- One reason for more frequent releases: because we have first pass author forms; show them we add it quickly
- what will be added through the forms: expression patterns? RNAi (difficult?)?
- We should check patch before we send to Todd!!! Don't want to crash database
- How frequently to patch? Weekly? Daily? Check with Todd, how often he can load them?
- Cron job to create patch ACE files, send to curators to check for problems, then send to Todd
- Interdependency of data types; curators rely on other curators?
- Postgres directly to website? Todd would have to work it out
- New information flag on website? Toggle visibility?
- How do we know that the data do not conflict with each other?
- What are common problems? Dumper script goes bad, makes broken lines, empty fields
- Error catching mechanisms? More checks on Postgres? Dump files?
- Data merging problems? What are the cases that are conflicts? Prevent them? Know beforehand?
- If we don't know, as long as it doesn't crash the database or fail to load, then OK
- Don't do -D stuff, maybe? No deletions? Skip typos?
- Always have to check ACE files anyway, but have to do every week (2 weeks?)
- We can try a patch every other month
- What can we do without the patch?
- Did SAB talk about changing to relational databases?
- Get website going as is first, and see if it matters?
- If people don't want to change data models, we can switch over to relational
- Separate panel on website directly from Postgres?
- Wen can check the data integration every other month for patch
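The pre-upload checks discussed above (broken lines, empty fields, deletions) could be sketched roughly as follows. This is only a minimal illustration, not an existing WormBase script. It assumes a simplified .ACE layout: objects separated by blank lines, a header of the form Class : "Name", one tag per line, "//" for comments, and "-D" for deletions. The function name check_ace_text is hypothetical.

```python
import re

# Assumed object header form: Class : "Name"
HEADER_RE = re.compile(r'^(\w+)\s*:\s*"([^"]*)"\s*$')
# A double quote that is not escaped with a backslash.
QUOTE_RE = re.compile(r'(?<!\\)"')


def check_ace_text(text):
    """Return (line_number, message) pairs for lines that look suspicious."""
    problems = []
    in_object = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            in_object = False  # a blank line ends the current object
            continue
        if line.startswith("//"):
            continue  # comment line
        if not in_object:
            in_object = True
            match = HEADER_RE.match(line)
            if match is None:
                problems.append((number, "expected an object header"))
            elif not match.group(2):
                problems.append((number, "empty object name"))
            continue
        if len(QUOTE_RE.findall(line)) % 2 == 1:
            problems.append((number, "unbalanced quotes (broken line?)"))
        if re.search(r'(?<!\\)""', line):
            problems.append((number, "empty quoted field"))
        if line.startswith("-D "):
            problems.append((number, "deletion line, review by hand"))
    return problems
```

A curator could run this over a dump before sending it on and only look at the reported line numbers. It does not catch conflicts between data types; that would still need the cross-checks described above.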
March 10, 2011
Release schedule and patches
- What is the appropriate frequency?
- Scheme: do what we're already doing, Wen merges into citace
- Excluding sequence related data
- Need to include Mary Ann's data (strains etc.)
- Daily update too frequent; maybe once per month/week
- Submit .ACE file to Todd with simple syntax; easily parsable; old description removed and new information added
- Make updates only in contrast to last WS, not previous patch/temporary upload
- ACEDB diff step only relative to WS
- Wen: Postgres can dump diff ace file; already have diff ace files for every data type at Caltech; integrate into citace
- Raymond: integration is important; we need to talk about how much work needs to be done by each approach
- Wen: consistency checks, backups, store each version?
- Raymond: citace 224 to 225 (for example), display done on class level
- Example: Gene page; only update information relevant to gene class to be displayed
- Do once per month: faster than currently because it doesn't have to go through the dev site
- If we update Citace to Citace, missing a lot of cross-references?
- Mock citace with Mary Ann's data? Becomes diff base; Mary Ann submits (non-sequence related) data directly to WBCIT
- Build low-connectivity ace at WBCIT? Add Mary Ann's data, remove RNAi
- Todd: important consideration: things added won't be available for search until formal release; weird things about diff; new reference associations with genes; a lot of duplications?
- Raymond: will look at it
- Todd: WBGene00000846, example, see how fast it loads, go from there; would like a single ACE file (concatenation of all individual ace files); would not happen on development; would have to happen on production releases; would take production database off line, clone it, and upload it to all production nodes
- Individual curators need to check their individual ace files for errors
- Frequency: monthly
- Todd: we should just run some tests first, to check feasibility
- Raymond: example with WBGene1: the concise description has a typo in WS224 that is fixed in WS225, so the diff file shows:
- -D old_description
- new_description
- Load diff ace into original database?
- parallel display; unrelated to resident WS?
- Todd: producing two web pages for each object?
- Raymond: No, only changing relevant tags, etc.
- Todd: Two databases running at same time inefficient; include timestamps?
- Raymond: No, cannot include timestamps
- Wen: Send patch ace files to Todd
- Raymond: In conflict with versioning; how to show new data
- Wen: Call it "WS225.1"
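The diff relative to the last WS release, as in Raymond's WBGene1 example, could be sketched as below. This is a rough illustration under stated assumptions, not the actual Postgres dump scripts: each release is represented as a dictionary mapping (class, name) to a set of (tag, value) pairs, the function name ace_diff and the tag name Concise_description are used for illustration only, and quotes inside values are not escaped.

```python
def ace_diff(baseline, current):
    """Build .ACE-style patch text that turns `baseline` into `current`.

    Both arguments map (class_name, object_name) to a set of (tag, value)
    pairs. Removed pairs become "-D" lines, added pairs become plain lines.
    """
    blocks = []
    for key in sorted(set(baseline) | set(current)):
        old = baseline.get(key, set())
        new = current.get(key, set())
        removed = sorted(old - new)
        added = sorted(new - old)
        if not removed and not added:
            continue  # object unchanged, leave it out of the patch
        class_name, object_name = key
        lines = [f'{class_name} : "{object_name}"']
        lines += [f'-D {tag} "{value}"' for tag, value in removed]
        lines += [f'{tag} "{value}"' for tag, value in added]
        blocks.append("\n".join(lines))
    # Objects are separated by a blank line.
    return "\n\n".join(blocks) + ("\n" if blocks else "")


# Hypothetical data: a typo in WS224 that is fixed in WS225.
ws224 = {("Gene", "WBGene1"): {("Concise_description", "a protien kinase")}}
ws225 = {("Gene", "WBGene1"): {("Concise_description", "a protein kinase")}}
print(ace_diff(ws224, ws225))
```

Because the baseline is always the last WS release rather than the previous patch, each patch is self-contained. An object that disappears entirely only gets "-D" lines for its tags in this sketch; deleting whole objects would need separate handling.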
Human Disease Relevance tag in Concise Description
- Ranjana: Sent out e-mail; human disease tag "Human Disease Relevance"; to clean up concise description form (old tags in form outdated); could be putting more information into concise description; OMIM human disease
- Raymond: make not just text field, but make entity field pointing to object; meant to be human readable, this may break up the concise description into OMIM-related and OMIM-non-related info; why parse the data into a tag?
- Paul S: OMIM descriptions as a separate tag
- Raymond: Rewrite concise description?
- Ranjana: No
- Paul S: Would you mention human disease relevance in concise description? yes, but if it's just a link out to OMIM, then separate out; OMIM may have changed since Erich wrote original script; check OMIM for new information and tags that may be able to get pulled out
- Ranjana: Michael Paulini can consolidate orthology information?
Karen: Transgene model
- A lot of changes to propose
- Deletions of tags; more coming
- Other things in database connected to transgenes
- Many things in transgene objects that may be able to be parsed into different tags (new job for someone?)
- Strict nomenclature for transgene descriptor
- Clones present need to be parsed into clone class?
- Todd made Clone page
- Start with vectors/backbones and then work on specific plasmids
Gene class-phenotype connections and descriptions?
GSA markup at FlyBase
- FlyBase is not willing to fully QC all papers
- Do we push FlyBase and/or provide a better tool to QC?
- Are we worried about the GSA markup for flies not looking professional?
- People need to be willing to pay for the QC/curation; depends on database priorities
- CIT will spend more time on in-house development to make Fly GSA markup easier/more efficient
Putting SPELL on Amazon cloud?