Removing Dead Genes from Papers
Removing Dead Genes: Transferring Data from Old to New Postgres Tables
When populating the new paper tables in postgres, we realized that if a gene-paper connection had several pieces of evidence, one marked valid and another marked invalid, the connection was transferred as valid even when the invalid status was the correct one.
As a result, dead gene-paper connections resurfaced in the WS213 build.
To fix this, we:
1) Created a new postgres table that contains information from WB on dead genes, i.e. the status of the gene and the action performed on it (e.g. merged, or just dead).
2) For simple merges, we replaced the dead gene-paper connection with a corresponding live gene-paper connection, using the WBGene ID of the gene that acquired the merge.
3) For dead genes with no merge, we simply removed the gene-paper connection.
4) One more complicated gene model change that involved mergers and splits was handled manually before transferring data to the new tables.
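Steps 2 and 3 above can be sketched roughly as follows. This is only an illustration of the logic, not the script that was actually run: the function name, the in-memory dictionary standing in for the dead-gene table, and the IDs in the usage example are all hypothetical. It also assumes every merge target is itself a live gene, so it does not follow chains of merges.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the dead-gene postgres table:
# dead WBGene ID -> WBGene ID of the gene that acquired the merge,
# or None when the gene is dead with no merge.
DeadGenes = Dict[str, Optional[str]]
Connection = Tuple[str, str]  # (paper ID, gene ID)


def remap_connections(
    connections: List[Connection], dead_genes: DeadGenes
) -> List[Connection]:
    """Return gene-paper connections with dead genes replaced or removed."""
    kept = set()
    for paper_id, gene_id in connections:
        if gene_id not in dead_genes:
            # Live gene: keep the connection unchanged.
            kept.add((paper_id, gene_id))
            continue
        merged_into = dead_genes[gene_id]
        if merged_into is not None:
            # Simple merge: point the paper at the acquiring gene.
            kept.add((paper_id, merged_into))
        # Dead with no merge: drop the connection entirely.
    # A set removes duplicates created when a paper already lists the
    # acquiring gene; sorting gives a stable order.
    return sorted(kept)
```

For example, with made-up IDs:

```python
dead = {"WBGene00000002": "WBGene00000001", "WBGene00000003": None}
rows = [
    ("WBPaper00000010", "WBGene00000001"),
    ("WBPaper00000010", "WBGene00000002"),  # merged into ...0001
    ("WBPaper00000011", "WBGene00000003"),  # dead, no merge
]
remap_connections(rows, dead)
```

Here the two connections for WBPaper00000010 collapse into one, and the WBPaper00000011 connection is removed.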
Removing Dead Genes Going Forward: Using the New Paper Editor and the Dumping Script
Going forward, removing dead gene-paper connections will be handled manually.
The new paper editor has a 'Find Dead Genes' button that lets curators see the list of dead genes connected to papers and fix these associations accordingly in postgres.
The dumping script will also check for any dead gene-paper connections and comment out the dead gene IDs.
The check in the dumping script is mainly a redundant mechanism to remove dead gene-paper connections from WB if, for some reason, they haven't yet been dealt with in postgres.
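The dump-time check could look something like the sketch below. It is not the actual dumping script: the function name, the 'Gene' line layout, and the use of '// ' as the comment prefix (the convention in .ace files) are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def dump_gene_lines(
    gene_ids: Iterable[str],
    dead_gene_ids: Set[str],
    comment_prefix: str = "// ",  # assumed comment marker for the dump format
) -> List[str]:
    """Build the gene lines for one paper, commenting out dead gene IDs."""
    lines = []
    for gene_id in gene_ids:
        line = f'Gene\t"{gene_id}"'  # hypothetical line layout
        if gene_id in dead_gene_ids:
            # Keep the dead ID visible in the dump, but commented out so it
            # is not loaded.
            line = comment_prefix + line
        lines.append(line)
    return lines
```

Commenting the line out rather than dropping it leaves a trace in the dump file, which makes it easier to spot connections that still need fixing in postgres.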