Contact Information and File Specifications

Contact Anne Marie Mahoney at the Genetics Society of America. The GSA has worked with us in the past to get International Meetings into WormBase.

They will send us the abstracts in a parseable format, with each abstract in a separate file named according to its program number.

Also ask for a participants list, containing names, addresses, email addresses, and institutions, for Cecilia's records.

Juancarlos can parse the abstracts into mangolassi, where we can proofread them before entering them into the editor on tazendra.

File Format

The International Meeting Abstracts are sent to us in a specific file format, with tag:value attributes that are readily parsed into the paper tables (pap_).

Each abstract is sent as an individual file, named according to the abstract number, e.g. 826C.txt

Below is the list of tag names. Authors and Institutions are listed sequentially, with increasing numerical values, as needed.
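As an illustration of how files in this tag:value format could be read, here is a minimal Python sketch. It is not the parse.pl script. The tag names in the sample (Title, Author1, Institution1, Abstract) are placeholders rather than the actual tag names, and the sketch assumes one tag per line, with numbered tags such as Author1, Author2 collected into a list in numerical order.

```python
import re
from collections import defaultdict
from pathlib import Path


def abstract_number(filename):
    """Return the abstract number encoded in a file name, e.g. '826C.txt' -> '826C'."""
    return Path(filename).stem


def parse_abstract(text):
    """Parse 'tag:value' lines; numbered tags (Author1, Author2, ...) become lists."""
    single = {}
    numbered = defaultdict(dict)
    for line in text.splitlines():
        if ":" not in line:
            continue  # assumption: every field sits on one line of its own
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        match = re.fullmatch(r"(\D+)(\d+)", tag)
        if match:
            # Numbered tag: store the value under its base name and position
            numbered[match.group(1)][int(match.group(2))] = value
        else:
            single[tag] = value
    record = dict(single)
    for base, items in numbered.items():
        record[base] = [items[i] for i in sorted(items)]
    return record


# Hypothetical file content; the real tag names may differ.
sample = """Title: A hypothetical abstract
Author1: A. Researcher
Author2: B. Scientist
Institution1: Example University
Abstract: Text of the abstract."""

record = parse_abstract(sample)
print(abstract_number("826C.txt"), record["Author"])
```

Splitting on the first colon only keeps any later colons in the value intact; a real abstract body that spans several lines would need extra handling.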

First place meeting abstracts in a folder on mangolassi
- For 2015 abstracts: /home/acedb/kimberly/meeting_abstracts/2015_iwm_meeting_abstracts/AbsFiles_2015/AbsFiles
In the postgres account, meeting abstracts are located in directories named according to the meeting, here: /home/postgres/work/pgpopulation/pap_papers/abstracts
There are two scripts that we run for processing the abstracts:
- unaccentHtml.pl - this script converts HTML character codes and names to text wherever possible, and converts accented characters to unaccented characters
  - For example, æ is represented in HTML as & # 230 ; (in the actual files these symbols are strung together, without spaces)
- parse.pl - this script parses the abstract information into the corresponding paper tables
  - The parse.pl script has some values that need to be set for each specific meeting:
    - year
    - identifier prefix (e.g., wm2015)
The previous version of the scripts is here: /home/postgres/work/pgpopulation/pap_papers/abstracts/iwm2013
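To show what converting HTML characters to text looks like in practice, here is a small Python sketch using the standard library. It only illustrates the idea and is not the unaccentHtml.pl script; the function name and the sample strings are made up for the example.

```python
import html


def decode_html_characters(text):
    """Replace HTML character references (numeric or named) with the characters they stand for."""
    return html.unescape(text)


# Numeric reference for æ (code 230) and a named reference for é
print(decode_html_characters("C&#230;sar"))
print(decode_html_characters("Andr&eacute;"))
```

Running the files through a step like this before proofreading makes leftover character codes in author names easier to spot.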

Things to check for in parsed files:

Does the abstract number in the identifier match the abstract number in the program?
Are author names correct, i.e. are there any foreign characters not translated correctly?
Does the text of the abstract start and stop at the right places?
Are symbols represented properly, e.g. 3'UTR?
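Some of these checks can be partly automated before proofreading by eye. The Python sketch below is one possible approach, not an existing script. It assumes the identifier ends with the abstract number (the identifier wm2015ab826C used here is a made-up example), and the function and argument names are hypothetical.

```python
import re
from pathlib import Path

# Matches leftover HTML character references such as &#252; or &eacute;
ENTITY_RE = re.compile(r"&#?\w+;")


def check_record(filename, identifier, authors, abstract):
    """Return a list of problems found in one parsed abstract."""
    problems = []
    number = Path(filename).stem  # e.g. '826C' from '826C.txt'
    if not identifier.endswith(number):
        problems.append(f"identifier {identifier} does not match abstract number {number}")
    for name in authors:
        if ENTITY_RE.search(name):
            problems.append(f"author name has an unconverted HTML character: {name}")
    if not abstract.strip():
        problems.append("abstract text is empty")
    elif ENTITY_RE.search(abstract):
        problems.append("abstract text has an unconverted HTML character")
    return problems


for problem in check_record("826C.txt", "wm2015ab826C", ["J. M&#252;ller"], "The 3'UTR is conserved."):
    print(problem)
```

A check like this only flags obvious problems; whether the abstract text starts and stops in the right place still needs to be confirmed by reading it.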