Revision as of 18:18, 9 September 2019

Contact Information and File Specifications

Contact Anne Marie Mahoney at the Genetics Society of America (GSA). The GSA has worked with us in the past to get International Meeting abstracts into WormBase.

They will send us a set of parseable abstract files, one file per abstract, each named according to its program number.

Also ask for a participants list, containing names, addresses, email addresses, and institutions, for Cecilia's records.

Juancarlos can parse the abstracts into mangolassi where we can proofread them before entering them into the editor on tazendra.

File Format

The International Meeting Abstracts are sent to us in a specific file format, with tag:value attributes that are readily parsed into the paper tables (pap_).

Each abstract is sent as an individual file, named according to the abstract number, e.g. 826C.txt

Below is the list of tag names. Authors and Institutions are listed sequentially, with the number in the tag name increasing for each additional author or institution, as needed.

- AbstractNo :
- Title :
- Author 1 :
- Presenting Author :
- Study Group :
- Author 1 Affiliation :
- Institution 1 :
- Body of Abstract :
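
The tag:value layout above can be read with a small parser. The following is a minimal Python sketch, not the parse.pl script itself. It assumes each tag starts a line and is followed by a colon, and that untagged lines continue the previous tag's value; the real files may differ. The function name `parse_abstract` and the sample text are made up for illustration.

```python
import re

# Tag names from the list above; the numbered forms (Author 2, Institution 2, ...)
# and the one-tag-per-line layout are assumptions about the real files.
TAG_RE = re.compile(
    r"^(AbstractNo|Title|Author \d+ Affiliation|Author \d+|Presenting Author|"
    r"Study Group|Institution \d+|Body of Abstract)\s*:\s*(.*)$"
)

def parse_abstract(text):
    """Return a dict of tag -> value; untagged lines continue the previous tag."""
    fields = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = TAG_RE.match(stripped)
        if match:
            current, value = match.group(1), match.group(2)
            fields[current] = value
        elif current is not None and stripped:
            # Continuation line, e.g. a body that spans several lines.
            fields[current] = (fields[current] + " " + stripped).strip()
    return fields

# Made-up sample content, for illustration only.
sample = """AbstractNo : 826C
Title : An example title
Author 1 : A. Author
Presenting Author : A. Author
Author 1 Affiliation : 1
Institution 1 : Example University
Body of Abstract : First line of the body
continues on a second line."""

record = parse_abstract(sample)
print(record["AbstractNo"], "|", record["Title"])
```

A dict keyed by tag name keeps the sketch close to the file layout; mapping those keys onto the pap_ tables is a separate step that this sketch does not attempt.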

Parsing Scripts

First place meeting abstracts in a folder on mangolassi
- For 2019 abstracts: /home/acedb/kimberly/meeting_abstracts/2019_iwm_meeting_abstracts/AbsFiles_2019/AbsFilesWormBase-20190719174941
In the postgres account, meeting abstracts are located in directories, named according to the meeting, here: /home/postgres/work/pgpopulation/pap_papers/abstracts
There are two scripts that we run for processing the abstracts (note: this documentation still needs to be updated with respect to species and additional parsing scripts):
- unaccentHtml.pl - this script converts HTML numeric character references and named entities to text wherever possible, and converts accented characters to unaccented characters, for example:
  - æ, represented as the numeric character reference & # 230 ; (note that these symbols are strung together in the actual code)
  - ω, represented as the named entity & omega ; (note that these symbols are strung together in the actual code)
  - The script searches the text for each of these types of HTML coding, and we then manually check that we have a mapping for each one
  - The script also strips out HTML markup for things like paragraphs, fonts, etc.
- parse.pl - this script parses the abstract information into the corresponding paper tables
  - The parse.pl script has some values that need to be set for each specific meeting:
    - year
    - identifier prefix (e.g., wm2019)
The most recent versions of the scripts are here: /home/postgres/work/pgpopulation/pap_papers/abstracts/iwm2017 (note: we still need to look at which scripts to port over to the iwm2019 directory)
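
To illustrate the kind of conversion described for unaccentHtml.pl, here is a minimal Python sketch. It is not the Perl script. The names `MANUAL_MAP`, `unaccent_html` and `unmapped_characters`, the replacement strings, and the short list of stripped tags are all assumptions made for illustration. It decodes HTML entities with the standard library, applies a manual mapping, strips accents, and reports characters that still need a mapping.

```python
import html
import re
import unicodedata

# Hypothetical manual mapping for characters that have no accent to strip;
# the replacement strings are illustrative choices, not the script's own.
MANUAL_MAP = {"æ": "ae", "ω": "omega"}

# Only a few layout tags are stripped here; the real list is an assumption.
LAYOUT_TAG_RE = re.compile(r"</?(?:p|font|br)\b[^>]*>", re.IGNORECASE)

def unaccent_html(text):
    """Decode HTML entities, drop layout tags, and remove accents."""
    text = LAYOUT_TAG_RE.sub(" ", text)
    text = html.unescape(text)  # numeric references and named entities
    for char, plain in MANUAL_MAP.items():
        text = text.replace(char, plain)
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", text).strip()

def unmapped_characters(text):
    """List non-ASCII characters that survive conversion and still need a mapping."""
    return sorted({c for c in unaccent_html(text) if ord(c) > 127})

print(unaccent_html("<p>Caf&eacute; &#230; &omega;</p>"))
```

Running `unmapped_characters` over all abstract files before parsing gives a list to review by hand, which mirrors the manual mapping check described above.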

Proofreading

Things to check for in parsed files:

Does the abstract number in the identifier match the abstract number in the program?
Are author names correct, i.e. are there any foreign characters not translated correctly?
Does the text of the abstract start and stop at the right places?
Are symbols represented properly, e.g. 3'UTR?
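
Some of these checks can be partly automated before proofreading by eye. The following Python sketch flags likely problems in one parsed abstract. It assumes the abstract is held as a dict keyed by the tag names from the File Format section, and it compares the abstract number to the file name (files are named by program number) rather than to the printed program. The function name `proofreading_flags` and the warning messages are hypothetical.

```python
import re

# Matches leftover HTML entities such as numeric references or named entities.
ENTITY_RE = re.compile(r"&(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

def proofreading_flags(record, file_name):
    """Return a list of warnings for one parsed abstract (a dict of tag -> value)."""
    flags = []
    number = record.get("AbstractNo", "").strip()
    # Files are named according to the abstract number, e.g. 826C.txt.
    if file_name != number + ".txt":
        flags.append("abstract number %r does not match file name %r" % (number, file_name))
    for tag, value in record.items():
        if ENTITY_RE.search(value):
            flags.append("untranslated HTML entity in %s" % tag)
        if tag.startswith("Author") and any(ord(c) > 127 for c in value):
            flags.append("non-ASCII character in %s" % tag)
    if not record.get("Body of Abstract", "").strip():
        flags.append("empty abstract body")
    return flags

# Made-up records, for illustration only.
good = {"AbstractNo": "826C", "Author 1": "A. Author",
        "Body of Abstract": "Text about the 3'UTR."}
bad = {"AbstractNo": "826B", "Author 1": "J. M&uuml;ller", "Body of Abstract": ""}

print(proofreading_flags(good, "826C.txt"))
for warning in proofreading_flags(bad, "826C.txt"):
    print(warning)
```

These flags only narrow down where to look; whether the abstract text starts and stops at the right places still needs to be checked manually.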

Back to Paper pipeline


Difference between revisions of "International C. elegans Meetings - UCLA"
