Tutorial: Is this an orphan enzyme?

You have an enzyme name but no sequence data. Do you truly have an orphan enzyme?

We define an orphan enzyme as an enzyme that has been experimentally characterized but has no sequence data in any major sequence resource. To see whether an enzyme really is an orphan, we need to check those sequence resources.

Here is a stepwise guide to validating whether you truly have an orphan enzyme.

(1) Collect as many names for the enzyme as you can

Compile a list of names for the enzyme. You will probably find these in the papers or patent literature you have read about the enzyme. An enzyme typically has at least a common name and a systematic name. For example:

    • malate oxidase
    • (S)-malate:oxygen oxidoreductase

You may encounter other names for the enzyme as you check it against databases, so you may need to return to a database you’ve already checked and query it again with the newly found names.
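Keeping track of which names have been tried in which databases can get fiddly once new synonyms start turning up. The following is a minimal bookkeeping sketch, not part of any database's tooling: the `NameTracker` class and its method names are hypothetical, and the database labels are just strings you choose.

```python
from itertools import product


class NameTracker:
    """Track which (name, database) searches are still pending."""

    def __init__(self, databases):
        self.databases = list(databases)
        self.names = []    # enzyme names, in the order they were first seen
        self.done = set()  # (normalized name, database) pairs already searched

    @staticmethod
    def _norm(name):
        # Compare names case-insensitively and ignore extra whitespace.
        return " ".join(name.lower().split())

    def add_name(self, name):
        # Record a name unless an equivalent one is already on the list.
        key = self._norm(name)
        if key not in (self._norm(n) for n in self.names):
            self.names.append(name.strip())

    def mark_searched(self, name, database):
        self.done.add((self._norm(name), database))

    def pending(self):
        # Every name/database combination that has not been searched yet.
        return [
            (name, db)
            for name, db in product(self.names, self.databases)
            if (self._norm(name), db) not in self.done
        ]
```

Adding a newly discovered synonym with `add_name` makes it reappear in `pending()` for every database, including those you had already finished with the earlier names.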

(2) Use these names to search major sequence databases

Now that you have a collection of names, you can use them to search the major sequence databases. The searches in this tutorial cover UniProt, BRENDA, and NCBI, with ExplorEnz used as a source of names.

Because NCBI’s GenBank exchanges data with DDBJ and ENA, checking NCBI comprehensively checks DDBJ and ENA as well.

We recommend looking first at the ExplorEnz and UniProt databases.

ExplorEnz only contains those enzymes that have already been assigned EC numbers, but tends to be the best source for all of an enzyme’s names and synonyms. ExplorEnz does not contain sequence data itself, but links out to entries in other databases (that’s why it’s not on our “major sequence databases” list).

Tip: If you are having trouble finding your enzyme during database searches, try searching with just part of the name. This can help you find the enzyme when it has been entered under a slightly different name, which is more common than you might think.
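As a rough illustration of this tip, the sketch below derives partial-name queries from a full enzyme name. The function name `partial_queries` and the four-character minimum length are assumptions made for this example; adjust both to suit the names you are working with.

```python
import re


def partial_queries(name, min_length=4):
    """Return the full name followed by its distinct word-level fragments."""
    # Split on whitespace and on punctuation common in systematic names,
    # then trim stray hyphens left behind by prefixes such as "(S)-".
    pieces = re.split(r"[\s:()\[\],]+", name)
    tokens = [p.strip("-") for p in pieces]

    queries = [name]
    for token in tokens:
        # Skip very short pieces (e.g. stereo descriptors) and duplicates.
        if len(token) < min_length:
            continue
        if token.lower() not in (q.lower() for q in queries):
            queries.append(token)
    return queries
```

For “malate oxidase” this gives the full name plus “malate” and “oxidase”, which you can then try one at a time in each database.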

Searching ExplorEnz

Just type the enzyme name you know into the “Search by name” interface at ExplorEnz:

Tutorial-Validate-01
 

If we search for an enzyme that is in the EC system, ExplorEnz will show us its full EC information page:

Tutorial-Validate-02
 

For your later searches, you will want to collect all of the names in the “Accepted name,” “Other name(s),” and “Systematic name” fields.

Searching UniProt

Type the enzyme name into the “Query” window at the top of the UniProt front page, with “Search in Protein Knowledgebase (UniProtKB)” selected.

Tutorial-Validate-03
 

Since UniProt collects data on enzymes across many organisms, a simple search for an enzyme such as “malate oxidase” can yield many results:

Tutorial-Validate-04
 

As this example shows, UniProt may not show our enzyme by the same name we used to search for it.

UniProt also gives you many options to narrow your results. The best way to find entries that list other names for the enzyme is to select “Show only reviewed” so that only reviewed entries are displayed. That cuts the list down to just ten entries:

Tutorial-Validate-05
 

It can also be helpful to restrict your search to entries that include a key term (such as “malate”) in the protein name. Doing this with our example reduces the list to just two entries:

Tutorial-Validate-06
 

That’s a lot less overwhelming.
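The same narrowing can be scripted against UniProt's REST search endpoint. The sketch below only builds the request URL; it assumes the `rest.uniprot.org/uniprotkb/search` endpoint and the `reviewed:true` and `protein_name:` query fields, so check the current UniProt API documentation before relying on it. The function name `uniprot_search_url` and its parameters are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for UniProtKB searches; verify against UniProt's API docs.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(name, reviewed_only=False, name_term=None, size=25):
    """Build a UniProtKB search URL that mirrors the filters described above."""
    clauses = [name]
    if reviewed_only:
        # Equivalent in spirit to "Show only reviewed" in the web interface.
        clauses.append("reviewed:true")
    if name_term:
        # Require a key term (e.g. "malate") in the protein name.
        clauses.append(f"protein_name:{name_term}")

    params = {
        "query": " AND ".join(f"({clause})" for clause in clauses),
        "format": "json",
        "size": size,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"
```

Calling `uniprot_search_url("malate oxidase", reviewed_only=True, name_term="malate")` produces a URL you can open in a browser or fetch with any HTTP client. The number of results returned may differ from the counts shown in this tutorial, since the database changes over time.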

After this, click through to the individual enzyme pages, which look like this:

Tutorial-Validate-07
 

Conveniently, UniProt lists all of the protein’s names in the “Protein names” field at the top of the protein’s page.

UniProt collects sequence data for its entries in the “Sequences” field, which is farther down the page:

Tutorial-Validate-08
 

Searching BRENDA

BRENDA, like ExplorEnz, only collects data on enzymes that have been assigned EC numbers.

You can type your enzyme name into the “Enzyme Name” search field at the top of the BRENDA home page.

BRENDA’s search results will show you all matching enzymes. Conveniently, they also tell you right away if your enzyme has sequence information:

Tutorial-Validate-09
 

There is an “AA Sequence” link on the left side of each enzyme’s page that will generate a pop-up window with sequence accession IDs and links to the sequence in UniProt, when available.

Searching NCBI

NCBI’s Protein database is the most comprehensive place to hunt for enzyme sequences.

Just type your enzyme’s name into the search box at the top of the page:

Tutorial-Validate-10
 

If you get results, you probably don’t have an orphan enzyme. However, keep in mind that some of the entries in NCBI Protein are sequence fragments. For example, one of the older glucose oxidase entries looks like this:

Tutorial-Validate-11
 

If that were the only sequence data we could find, we would still consider the enzyme an orphan.
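If you prefer to script the NCBI check, the sketch below builds a search URL for NCBI's E-utilities `esearch` service against the `protein` database and adds a crude test for fragment records. The helper names are hypothetical, and the fragment test is only a heuristic that assumes partial sequences are flagged with the word “partial” or “fragment” in the record title; always inspect the record itself before deciding.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def ncbi_protein_search_url(name, retmax=20):
    """Build an esearch URL that looks for an enzyme name in NCBI Protein."""
    params = {
        "db": "protein",
        "term": f'"{name}"',  # quote the name so it is searched as a phrase
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH}?{urlencode(params)}"


def looks_like_fragment(title):
    """Heuristic only: flag titles that mention 'partial' or 'fragment'."""
    return re.search(r"\b(partial|fragment)\b", title, flags=re.IGNORECASE) is not None
```

A title that passes `looks_like_fragment` is a prompt to open the entry and check how much of the sequence is actually present, not a final verdict.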

(3) Orphan or not?

If you found a sequence at any point in this search process, congratulations! Your enzyme is not an orphan.

If you did not find a sequence, then you have a putative orphan enzyme. There may still be sequence data in the literature, patents, or less typical databases. Take a look at our other tutorials to learn how to carry out a sequence search for your putative orphan enzyme.
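The decision rule in this step can be summarized in a few lines of code. This is a sketch under the assumptions stated in this tutorial: a hit counts only if it is not a sequence fragment, and anything else leaves you with a putative orphan. The `Hit` record and the `classify` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One sequence record found during the database searches."""
    database: str
    accession: str
    is_fragment: bool = False


def classify(hits):
    """Apply the tutorial's rule to the collected search hits."""
    # Any non-fragment sequence is enough to rule out orphan status.
    if any(not hit.is_fragment for hit in hits):
        return "not an orphan"
    # No hits at all, or fragments only: still a putative orphan.
    return "putative orphan"
```

Passing an empty list, or a list containing only fragment hits, returns “putative orphan”, which is the cue to move on to the literature and patent searches.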