Tutorial: How do I identify an enzyme from the literature?

The vast majority of orphan enzymes are identified by tracing the enzyme through the literature and associated databases.

Once you’ve evaluated your orphan enzyme and collected its associated literature (as described in our How to evaluate an orphan enzyme tutorial), you may find clues in that literature that lead to a protein sequence.

The most common case is that a paper does not include a sequence or an accession number, but does include some form of gene or protein name that you can link to a sequence.

L-rhamnose-1-dehydrogenase is a good example of an enzyme that can be identified from a paper.

PubMed and database searches for this enzyme’s EC number yielded no results. However, searching PubMed for the enzyme’s name (L-rhamnose-1-dehydrogenase) yielded this paper:




In this publication, the researchers identified the enzyme as matching fragments of the protein RHA1 from Pichia stipitis.
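The PubMed name search described above can also be scripted. The following is a minimal sketch, assuming you want to query NCBI's E-utilities `esearch` endpoint rather than the PubMed web page; the function names `build_pubmed_search_url` and `fetch_pubmed_ids` are our own illustrative helpers, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(enzyme_name, retmax=20):
    """Build an esearch URL that looks up an enzyme name in PubMed."""
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": enzyme_name,   # the enzyme name, used as the search term
        "retmode": "json",     # ask for a JSON response
        "retmax": retmax,      # cap the number of returned IDs
    }
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_pubmed_ids(url):
    """Run the search (needs network access) and return the PubMed IDs."""
    with urlopen(url) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]


url = build_pubmed_search_url("L-rhamnose-1-dehydrogenase")
print(url)
```

Calling `fetch_pubmed_ids(url)` returns a list of PubMed IDs that you can then review by hand. The script only narrows down candidates; you still need to read the papers to confirm they describe your enzyme.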

The next step was to determine whether the genome of P. stipitis had been sequenced. We learned that P. stipitis had been renamed Scheffersomyces stipitis and that the S. stipitis genome was fully sequenced, which made it easy to check for RHA1. A search for RHA1 revealed this NCBI Protein entry:




You can often find enzyme sequences in this way, by finding gene names, or placement in operons, or other labels that can be used to match your enzyme to a sequence in NCBI Protein or another database.
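Because an organism may appear under more than one name (as with P. stipitis and S. stipitis), it can help to search for a label under every known name at once. The sketch below builds such a query for NCBI Protein. It assumes the E-utilities `esearch` endpoint and the Entrez `[Organism]` and `[All Fields]` field tags; the helper names are hypothetical, and a narrower field tag may suit your case better.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_protein_term(label, organism_names, field="All Fields"):
    """Combine a gene/protein label with all known names of the organism."""
    # OR together the current and former organism names
    organisms = " OR ".join(f'"{name}"[Organism]' for name in organism_names)
    return f"{label}[{field}] AND ({organisms})"


def build_protein_search_url(term, retmax=20):
    """Build an esearch URL for the NCBI Protein database."""
    params = {"db": "protein", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


term = build_protein_term("RHA1", ["Pichia stipitis", "Scheffersomyces stipitis"])
print(term)
print(build_protein_search_url(term))
```

The same term can be pasted directly into the NCBI Protein search box if you prefer to work in the browser.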

Be careful when collecting ORF, gene, or protein names like this, because the same name is often reused across organisms and enzyme functions. However, if you can uniquely match a protein or ORF name to your source organism, you can avoid most misidentification errors.
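The uniqueness check described above can be sketched in a few lines. This is an illustration only: the `Candidate` records, their accession strings, and the "Example organism B" entry are made-up placeholders, not real database entries, and `unique_match` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    accession: str  # placeholder identifier, not a real accession
    name: str       # gene, ORF, or protein label
    organism: str   # source organism as annotated in the record


def unique_match(candidates, name, organism_names):
    """Return the single candidate matching the name in the source organism.

    Returns None when there is no match or more than one match, since
    either case needs manual review.
    """
    wanted = {organism.casefold() for organism in organism_names}
    hits = [
        c for c in candidates
        if c.name.casefold() == name.casefold()
        and c.organism.casefold() in wanted
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical search results: the same label appears in two organisms
candidates = [
    Candidate("HYPOTHETICAL_1", "RHA1", "Scheffersomyces stipitis"),
    Candidate("HYPOTHETICAL_2", "RHA1", "Example organism B"),
    Candidate("HYPOTHETICAL_3", "XYZ1", "Scheffersomyces stipitis"),
]

match = unique_match(
    candidates, "RHA1", ["Pichia stipitis", "Scheffersomyces stipitis"]
)
print(match)
```

Passing both the old and new organism names keeps a renamed species from being missed, while the name-plus-organism requirement filters out the unrelated record that happens to share the label.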

We will soon put up a separate tutorial on how to rapidly review papers.