Missing PMIDs

From WormBaseWiki

Revision as of 20:00, 25 October 2011

Email thread subject: XML of PubMed-indexed journals to filter search

Objectives:

1) Add missing PMIDs to WBPaper objects wherever possible

2) When papers have not been indexed by PubMed, add that information to postgres along with details of indexing; display on paper editor

3) When no PMID is found, check and correct bibliographic errors in WBPaper objects

4) Where possible, add DOIs for papers not indexed by PubMed

5) If time permits, replace journal titles with standard NLM journal abbreviations


1) Add missing PMIDs to WBPaper objects wherever possible

As of 9/1/11, there were 997 WBPaper objects that did not have a PMID. We know that some paper objects in WB will likely never have a PMID (corrections and errata, for example), but for papers that *do* have a PMID that is missing from our records, we want to add it wherever possible.

Strategy: Using the XML file of journals currently indexed for MEDLINE, compile a list of WBPaper objects that lack a PMID but whose journal is currently indexed for MEDLINE.
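A minimal Python sketch of this filter, assuming a simplified XML layout (one hypothetical `<Journal>` element per journal, holding a `<Title>` plus the `<IndexingSourceList>` structure quoted under objective 2) and hypothetical `papers` records exported from postgres. Titles are matched by exact, case-insensitive comparison, which is an assumption: journal names in WBPaper objects will probably need normalizing before they line up with the NLM list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the NLM journal XML file: one <Journal>
# per title, carrying the same <IndexingSourceList> structure as the real file.
SAMPLE_XML = """<JournalList>
  <Journal>
    <Title>Genetics</Title>
    <IndexingSourceList>
      <IndexingSource>
        <IndexingSourceName IndexingTreatment="Full"
            IndexingStatus="Currently-indexed">MEDLINE</IndexingSourceName>
      </IndexingSource>
    </IndexingSourceList>
  </Journal>
  <Journal>
    <Title>Worm Breeder's Gazette</Title>
    <IndexingSourceList/>
  </Journal>
</JournalList>"""


def currently_indexed_titles(xml_text, source="MEDLINE"):
    """Lower-cased titles of journals whose `source` indexing is current."""
    titles = set()
    for journal in ET.fromstring(xml_text).iter("Journal"):
        title = journal.findtext("Title", default="").strip().lower()
        for name in journal.iter("IndexingSourceName"):
            if ((name.text or "").strip() == source
                    and name.get("IndexingStatus") == "Currently-indexed"):
                titles.add(title)
    return titles


def papers_to_recheck(papers, indexed_titles):
    """IDs of papers with no PMID whose journal is currently indexed."""
    return sorted(
        paper_id
        for paper_id, info in papers.items()
        if not info.get("pmid")
        and (info.get("journal") or "").strip().lower() in indexed_titles
    )


# Hypothetical example records (not real WBPaper data).
papers = {
    "example_1": {"journal": "Genetics", "pmid": None},
    "example_2": {"journal": "Genetics", "pmid": "12345"},
    "example_3": {"journal": "Worm Breeder's Gazette", "pmid": None},
}
print(papers_to_recheck(papers, currently_indexed_titles(SAMPLE_XML)))
# ['example_1']
```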

Results:

189 WBPapers lacking a PMID were published in a journal indexed for MEDLINE.

 Daniel looked over this list and found that 82 papers (~43%) had a PMID, 79 papers (~42%) could still not be found in PubMed, and 26 papers (~14%) correspond to published errata.
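The percentages can be re-derived from the counts. A short Python check using the counts reported here also shows that the three categories add up to 187, so two of the 189 reviewed papers are not covered by this breakdown.

```python
# Counts reported for the 189 WBPapers that Daniel reviewed.
reviewed = 189
outcomes = {"PMID found": 82, "still not in PubMed": 79, "errata": 26}

for label, count in outcomes.items():
    # Share of the reviewed papers, rounded to a whole percent.
    print(f"{label}: {count} ({count / reviewed:.0%})")

accounted = sum(outcomes.values())
print(f"accounted for: {accounted} of {reviewed}")
```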

645 WBPapers lacking a PMID also lack a Journal entry.

 To address these WBPapers, we need to break this list of 645 papers down by Paper Type (e.g. Review) so that we can focus our efforts on tracking down journal titles only where appropriate. Book chapters, for example, will not have a journal title.
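One way to get that breakdown, sketched in Python under the assumption that each paper record carries a `type` field holding its Paper Type. The record layout and the `JOURNAL_TYPES` set are hypothetical and would need to be checked against the actual Paper Type values in postgres.

```python
from collections import Counter

# Hypothetical: Paper Types for which a missing journal title is worth chasing.
JOURNAL_TYPES = {"Journal_article", "Review"}


def journal_less(papers):
    """Papers that lack both a PMID and a journal title."""
    return {pid: info for pid, info in papers.items()
            if not info.get("pmid") and not info.get("journal")}


def type_breakdown(papers):
    """(Paper Type, count) pairs for journal-less papers, largest first."""
    return Counter(info.get("type", "unknown")
                   for info in journal_less(papers).values()).most_common()


def journal_title_worklist(papers):
    """Journal-less papers whose type suggests a journal title should exist."""
    return sorted(pid for pid, info in journal_less(papers).items()
                  if info.get("type") in JOURNAL_TYPES)


papers = {  # hypothetical example records
    "example_1": {"type": "Book_chapter", "journal": None, "pmid": None},
    "example_2": {"type": "Book_chapter", "journal": None, "pmid": None},
    "example_3": {"type": "Review", "journal": None, "pmid": None},
    "example_4": {"type": "Review", "journal": "Genetics", "pmid": None},
}
print(type_breakdown(papers))          # [('Book_chapter', 2), ('Review', 1)]
print(journal_title_worklist(papers))  # ['example_3']
```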

163 (?) WBPapers are published in a journal not indexed for MEDLINE.

 The question mark reflects that this number is the difference between 997 and the sum of the other two categories (997 - 189 - 645 = 163); I haven't verified it any other way.
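Counting the third category directly, rather than by subtraction, would confirm the 163. A Python sketch, assuming hypothetical `papers` records keyed by WBPaper ID and a set of lower-cased journal titles currently indexed for MEDLINE: every PMID-less paper lands in exactly one bucket, so the bucket sizes must add up to the total.

```python
def partition_missing(papers, indexed_titles):
    """Sort PMID-less papers into the three categories used on this page."""
    buckets = {"indexed journal": [], "no journal": [],
               "non-indexed journal": []}
    for paper_id, info in papers.items():
        if info.get("pmid"):
            continue  # already has a PMID, so outside the PMID-less set
        journal = (info.get("journal") or "").strip().lower()
        if not journal:
            buckets["no journal"].append(paper_id)
        elif journal in indexed_titles:
            buckets["indexed journal"].append(paper_id)
        else:
            buckets["non-indexed journal"].append(paper_id)
    return buckets


papers = {  # hypothetical example records
    "example_1": {"journal": "Genetics", "pmid": None},
    "example_2": {"journal": None, "pmid": None},
    "example_3": {"journal": "Worm Breeder's Gazette", "pmid": None},
    "example_4": {"journal": "Genetics", "pmid": "12345"},
}
buckets = partition_missing(papers, {"genetics"})
missing = sum(1 for info in papers.values() if not info.get("pmid"))
# The categories are disjoint and exhaustive, so the sizes add up to the total.
assert sum(len(ids) for ids in buckets.values()) == missing
print({name: len(ids) for name, ids in buckets.items()})
```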

Next Steps:

For the 189 papers that Daniel examined:

82 WBPapers with a PMID: add the identifier to the pap_identifier table.

26 WBPapers are errata: don't add the PMID of the original article. PubMed doesn't index errata or corrections, so we don't want to give an erratum a PMID. We should, however, enter the DOI of each erratum wherever we can (see objective 4).
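These rules can be written as a small decision function. A minimal Python sketch; the `finding` fields (`is_erratum`, `pmid`, `doi`) and the action names are hypothetical. The erratum test comes first so that a PMID found for the original article is never attached to its erratum, and papers still missing from PubMed fall through to a 'not found' record as described under objective 2.

```python
def next_step(finding):
    """Map one reviewed paper to an (action, value) pair."""
    if finding.get("is_erratum"):
        # Never reuse the original article's PMID; record a DOI instead, if any.
        return ("add_doi", finding.get("doi"))
    if finding.get("pmid"):
        # A PMID was located: it goes into pap_identifier.
        return ("add_pmid", finding["pmid"])
    # Still not in PubMed: candidate for a 'not found' entry (objective 2).
    return ("record_not_found", None)


# Placeholder PMIDs and DOI, for illustration only.
print(next_step({"is_erratum": True, "pmid": "111", "doi": "10.1000/example"}))
print(next_step({"is_erratum": False, "pmid": "222"}))
print(next_step({"is_erratum": False, "pmid": None}))
```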

2) When papers have not been indexed by PubMed, add that information to postgres along with details of indexing; display on paper editor:

I'd like to be able to record in postgres, and display in the paper editor, the fact that someone has checked a WBPaper for a PMID and couldn't find it. That way, we'll know we looked and won't keep wondering why a paper doesn't have a PMID. MEDLINE may index a paper long after it has come out, so we could decide how often it makes sense to re-check PubMed and whether, in any of these cases, it makes sense to contact PubMed and ask them to index a paper.
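A re-check could be scheduled and run against NCBI's E-utilities `esearch` service. The Python sketch below only builds the query URL (it does not fetch it); the one-year interval, the choice of search fields, and the function names are assumptions. `[ti]`, `[ta]` and `[dp]` are the PubMed title, journal and publication-date field tags.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def recheck_due(last_checked, today, interval_days=365):
    """True once a 'not found' paper is due for another PubMed search."""
    return today - last_checked >= timedelta(days=interval_days)


def esearch_url(title, journal, year):
    """PubMed esearch URL looking for one article by title, journal and year."""
    term = f'"{title}"[ti] AND "{journal}"[ta] AND {year}[dp]'
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term,
                                      "retmode": "json"})


# Hypothetical paper details, for illustration only.
if recheck_due(date(2010, 9, 1), date(2011, 10, 25)):
    print(esearch_url("An example title", "Genetics", 2005))
```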

Here's an idea for recording the outcome of 'not found' in postgres:

Create a new paper table, pap_pmid_check.

Populate that table with the WBPaper ID, the result of the search ('not found'), and the Coverage information from the corresponding journal's XML file that pertains specifically to PubMed indexing:

 <IndexingSourceList>
     <IndexingSource>
         <IndexingSourceName IndexingTreatment="Full" IndexingStatus="Currently-indexed">PubMed</IndexingSourceName>
         <Coverage>v1n1, July 2005-</Coverage>
     </IndexingSource>
 </IndexingSourceList>

So in this example, the table would hold the WBPaper ID, the text string 'not found', and 'Coverage v1n1, July 2005-'.
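Extracting that coverage string and assembling the row could look like the Python sketch below, which parses the `<IndexingSourceList>` fragment shown above with the standard-library XML parser. The three-element tuple is a hypothetical row layout for pap_pmid_check, and `example_1` stands in for a real WBPaper ID.

```python
import xml.etree.ElementTree as ET

# The <IndexingSourceList> fragment quoted above.
SNIPPET = """<IndexingSourceList>
  <IndexingSource>
    <IndexingSourceName IndexingTreatment="Full"
        IndexingStatus="Currently-indexed">PubMed</IndexingSourceName>
    <Coverage>v1n1, July 2005-</Coverage>
  </IndexingSource>
</IndexingSourceList>"""


def pubmed_coverage(xml_text, source="PubMed"):
    """Coverage text for the named indexing source, or None if absent."""
    for src in ET.fromstring(xml_text).iter("IndexingSource"):
        if (src.findtext("IndexingSourceName") or "").strip() == source:
            return (src.findtext("Coverage") or "").strip() or None
    return None


def pmid_check_row(paper_id, xml_text):
    """Hypothetical pap_pmid_check row: (WBPaper ID, result, coverage)."""
    coverage = pubmed_coverage(xml_text)
    return (paper_id, "not found",
            f"Coverage {coverage}" if coverage else None)


print(pmid_check_row("example_1", SNIPPET))
# ('example_1', 'not found', 'Coverage v1n1, July 2005-')
```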

The reason I'd like to store the coverage information is that it helps to indicate why a particular article from a journal may not have a PMID, e.g. if PubMed indexing for that journal began after the article was published.
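That comparison can be automated when the coverage string follows the pattern in the example. A Python sketch, assuming coverage strings of the form `<volume/issue>, <Month> <Year>-`; anything else (a closed date range, for instance) returns `None` so the paper is left for a curator to judge.

```python
from datetime import date, datetime


def coverage_start(coverage):
    """Start date from a string like 'v1n1, July 2005-', else None.

    %B expects English month names, i.e. the default C locale.
    """
    try:
        _, when = coverage.split(",", 1)
        return datetime.strptime(when.strip().rstrip("-").strip(),
                                 "%B %Y").date()
    except ValueError:
        return None


def predates_coverage(pub_date, coverage):
    """True if the paper appeared before PubMed coverage of its journal began."""
    start = coverage_start(coverage)
    return start is not None and pub_date < start


print(coverage_start("v1n1, July 2005-"))                      # 2005-07-01
print(predates_coverage(date(2004, 3, 1), "v1n1, July 2005-"))  # True
```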

We could display this information (not found, Coverage v1n1, July 2005-) in the paper editor in a field named pmid_check below the identifier field. We wouldn't have to dump this in the .ace file. The field could be editable by a curator, though.
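As a rough illustration of the storage and display, the Python sketch below uses an in-memory SQLite database as a stand-in for postgres. The column names (`joinkey`, `result`, `coverage`) are hypothetical, and in postgres the overwrite would be written with `INSERT ... ON CONFLICT` rather than SQLite's `INSERT OR REPLACE`.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the proposed pap_pmid_check table."""
    db = sqlite3.connect(":memory:")
    # Hypothetical columns: WBPaper ID, search result, PubMed coverage string.
    db.execute("CREATE TABLE pap_pmid_check ("
               "joinkey TEXT PRIMARY KEY, result TEXT NOT NULL, coverage TEXT)")
    return db


def record_check(db, paper_id, result, coverage=None):
    """Store, or overwrite after a curator edit, one check result."""
    db.execute("INSERT OR REPLACE INTO pap_pmid_check VALUES (?, ?, ?)",
               (paper_id, result, coverage))


def pmid_check_field(db, paper_id):
    """Text for the paper editor's pmid_check field ('' if never checked)."""
    row = db.execute("SELECT result, coverage FROM pap_pmid_check "
                     "WHERE joinkey = ?", (paper_id,)).fetchone()
    return ", ".join(part for part in row if part) if row else ""


db = make_db()
record_check(db, "example_1", "not found", "Coverage v1n1, July 2005-")
print(pmid_check_field(db, "example_1"))  # not found, Coverage v1n1, July 2005-
```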

3) When no PMID is found, check and correct bibliographic errors in WBPaper objects:

If a paper is published in a journal that *is* currently indexed for PubMed, then there are several possibilities I can think of:

a) as described above, the paper was published before the journal started being indexed for PubMed, and the older articles haven't been indexed retroactively

b) the journal is normally indexed, but the particular article we have is of a type not typically indexed (see WBPaper00013369 and its accompanying PMID for an example)

c) there's something incorrect about our WBPaper entry; see WBPaper00000816 for an example. WBPaper00000816 is attributed to the journal Science, but the PDF suggests that the article actually comes from a periodical like Discover and was written by a science journalist (whose name we don't have correct!)
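A first-pass triage of cases a) to c) could be scripted so that only the leftovers need a close look at the PDF. A Python sketch; `RARELY_INDEXED_TYPES` is a hypothetical placeholder set, and `coverage_start` is assumed to be the date PubMed coverage of the journal began (or `None` if unknown). Case c) is deliberately the fallback, since a script cannot tell that a record is wrong, only that nothing else explains the missing PMID.

```python
from datetime import date

# Hypothetical placeholder: article types PubMed does not usually index.
RARELY_INDEXED_TYPES = {"erratum", "news", "book review"}


def likely_reason(pub_date, coverage_start, article_type):
    """Suggest which of cases a) to c) best explains a missing PMID."""
    if coverage_start is not None and pub_date < coverage_start:
        return "a) published before PubMed coverage began"
    if article_type.lower() in RARELY_INDEXED_TYPES:
        return "b) article type not typically indexed"
    return "c) check the WBPaper entry for bibliographic errors"


print(likely_reason(date(2004, 5, 1), date(2005, 7, 1), "article"))
print(likely_reason(date(2008, 5, 1), date(2005, 7, 1), "News"))
print(likely_reason(date(2008, 5, 1), None, "article"))
```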



