Revision as of 18:18, 9 September 2019

Contact Information and File Specifications

Contact Anne Marie Mahoney at the Genetics Society of America. The GSA has worked with us in the past to get International Meetings into WormBase.

They will send us the abstracts in a parseable format, with each abstract in a separate file named according to its program number.

Also ask for a participants list, containing names, addresses, email addresses, and institutions, for Cecilia's records.

Juancarlos can parse the abstracts into mangolassi where we can proofread them before entering them into the editor on tazendra.

The International Meeting Abstracts are sent to us in a specific file format, with tag:value attributes that are readily parsed into the paper tables (pap_).

Each abstract is sent as an individual file, named according to the abstract number, e.g. 826C.txt

Below is the list of tag names. Authors and Institutions are listed sequentially, with increasing numbers appended as needed.
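As a rough illustration of how a tag:value file could be read, here is a minimal Python sketch. It is not the parse.pl script. The tag names used in the example (Title, Author1, Institution1) are hypothetical stand-ins, and the sketch assumes each field sits on a single line with the tag before the first colon.

```python
import re

# Matches a tag that ends in a number, e.g. a hypothetical "Author2".
NUMBERED = re.compile(r"^(.*?)(\d+)$")


def parse_abstract(text):
    """Split 'tag:value' lines into single fields and ordered repeated fields.

    Assumption: one field per line, tag before the first colon.
    """
    single = {}
    repeated = {}
    for line in text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line, so no tag to read
        tag, value = tag.strip(), value.strip()
        match = NUMBERED.match(tag)
        if match:
            # Group numbered tags under their base name, keyed by number.
            name, index = match.group(1), int(match.group(2))
            repeated.setdefault(name, {})[index] = value
        else:
            single[tag] = value
    # Turn each numbered group into a list ordered by its number.
    lists = {name: [vals[i] for i in sorted(vals)] for name, vals in repeated.items()}
    return single, lists
```

For example, `parse_abstract(open("826C.txt").read())` would return the single-valued fields as one dictionary and the numbered authors and institutions as ordered lists in a second dictionary.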

First, place the meeting abstracts in a folder on mangolassi:
- For 2019 abstracts: /home/acedb/kimberly/meeting_abstracts/2019_iwm_meeting_abstracts/AbsFiles_2019/AbsFilesWormBase-20190719174941
In the postgres account, meeting abstracts are located in directories, named according to the meeting, here: /home/postgres/work/pgpopulation/pap_papers/abstracts
There are two scripts that we run for processing the abstracts. (To do: update the documentation here with respect to species and additional parsing scripts.)
- unaccentHtml.pl - this script converts HTML characters and names to text wherever possible, and converts accented characters to unaccented characters, for example:
  - æ represented in HTML characters as & # 230 ; (note that these symbols are strung together in the actual code)
  - ω represented in HTML names as & omega ; (note that these symbols are strung together in the actual code)
  - This script searches the text for each of the above types of HTML coding, and we then manually check that we have a mapping for each one
  - This script also strips out HTML for things like paragraph, font, etc.
- parse.pl - this script parses the abstract information into the corresponding paper tables
  - The parse.pl script has some values that need to be set for each specific meeting:
    - year
    - identifier prefix (e.g., wm2019)
The most recent versions of the scripts are here: /home/postgres/work/pgpopulation/pap_papers/abstracts/iwm2017. We still need to look at which scripts to port over to the iwm2019 directory.
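The conversion that unaccentHtml.pl is described as doing can be sketched in Python as follows. This is an illustration under assumptions, not the script itself. The manual mapping table, the list of stripped tags, and the choice to spell ω as "omega" are all hypothetical. Characters with no mapping are returned separately so that a mapping can be added by hand.

```python
import html
import re
import unicodedata

# Hypothetical manual mappings for characters that do not decompose
# into a base letter plus an accent. The real script keeps its own table.
MANUAL_MAP = {"æ": "ae", "Æ": "AE", "ø": "o", "ß": "ss", "ω": "omega"}

# Formatting tags to strip (an assumed, incomplete list).
FORMAT_TAG = re.compile(r"</?(?:p|font|br|span)\b[^>]*>", re.IGNORECASE)


def unaccent_html(text):
    """Return (converted_text, unmapped_characters)."""
    text = FORMAT_TAG.sub("", text)
    # Decode both numeric (&#230;) and named (&omega;) references.
    text = html.unescape(text)
    out = []
    unmapped = set()
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in MANUAL_MAP:
            out.append(MANUAL_MAP[ch])
        else:
            # Split accented letters into base letter + combining marks,
            # then drop the marks.
            decomposed = unicodedata.normalize("NFKD", ch)
            stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
            if stripped and all(ord(c) < 128 for c in stripped):
                out.append(stripped)
            else:
                out.append(ch)      # leave as-is for manual review
                unmapped.add(ch)
    return "".join(out), unmapped
```

Any character that comes back in the second return value is one that needs a new entry in the mapping table before the abstracts are parsed.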

Things to check for in parsed files:

- Does the abstract number in the identifier match the abstract number in the program?
- Are author names correct, i.e. are there any foreign characters not translated correctly?
- Does the text of the abstract start and stop at the right places?
- Are symbols represented properly, e.g. 3'UTR?
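Some of these checks can be partly automated before manual proofreading. The Python sketch below is only an illustration. It assumes that the identifier ends with the program number (the real identifier layout may differ), and it flags unconverted HTML references, leftover tags, and non-ASCII characters for a person to review. Whether the abstract text starts and stops in the right place still needs to be checked by eye.

```python
import re

LEFTOVER_ENTITY = re.compile(r"&#?\w+;")       # e.g. &uuml; or &#230;
LEFTOVER_TAG = re.compile(r"</?[A-Za-z][^>]*>")  # e.g. <i> or </font>


def proofreading_flags(identifier, program_number, text):
    """Return a list of human-readable warnings for one parsed abstract."""
    flags = []
    # Assumption: the identifier ends with the program number.
    if not identifier.endswith(program_number):
        flags.append(f"identifier {identifier!r} does not end with {program_number!r}")
    for match in LEFTOVER_ENTITY.findall(text):
        flags.append(f"unconverted HTML entity: {match}")
    for match in LEFTOVER_TAG.findall(text):
        flags.append(f"leftover HTML tag: {match}")
    for ch in sorted({c for c in text if ord(c) > 127}):
        flags.append(f"non-ASCII character: {ch!r}")
    return flags
```

An empty list means that none of these automated checks fired. It does not mean that the abstract is correct.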
