14 May

New paper – Finding Sequences for over 270 Orphan Enzymes

We’re pleased to announce the publication of our new paper Finding Sequences for over 270 Orphan Enzymes in PLOS ONE.

In this paper, we describe how we used a combined literature, patent, and database evaluation method to find sequences for 275 orphan enzymes. During this research, we learned quite a bit about how orphan enzymes happen. We think we’ve developed some pretty good ways of preventing future orphan enzymes, too.

Once you’ve had a chance to look at the paper, click here to visit our Tutorials section, where you can learn more about how to resolve orphan enzymes and deposit enzyme sequences.

We also encourage anyone who is interested in how enzyme and sequence data are handled to take a quick survey about sequence deposition and EC numbers.

You can access the database referred to in the paper here.

05 Feb

The orphans we find have a big impact

We celebrated the end of 2013 with the release of our new paper, Rapid identification of sequences for orphan enzymes to power accurate protein annotation in PLOS ONE.

So what’s the big deal? What are orphan enzymes and why do we need to identify them?

Sequences are card catalog numbers for everything

In modern biology, protein and nucleotide sequence data are the glue that holds everything together. When we sequence a new genome, for example, we make a “best guess” for what each gene does by comparing its sequence to a vast collection of sequences we already have. Essentially, that lets us go from this amino acid sequence:

1 MSLPLKTIVH LVKPFACTAR FSARYPIHVI VVAVLLSAAA YLSVTQSYLN
51 EWKLDSNQYS TYLSIKPDEL FEKCTHYYRS PVSDTWKLLS SKEAADIYTP
101 FHYYLSTISF QSKDNSTTLP SLDDVIYSVD HTRYLLSEEP KIPTELVSEN
151 GTKWRLRNNS NFILDLHNIY RNMVKQFSNK TSEFDQFDLF IILAAYLTLF
201 YTLCCLFNDM RKIGSKFWLS FSALSNSACA LYLSLYTTHS LLKKPASLLS
251 LVIGLPFIVV IIGFKHKVRL AAFSLQKFHR ISIDKKITVS NIIYEAMFQE
301 GAYLIRDYLF YISSFIGCAI YARHLPGLVN FCILSTFMLV FDLLLSATFY
351 SAILSMKLEI NIIHRSTVIR QTLEEDGVVP TTADIIYKDE TASEPHFLRS
401 NVAIILGKAS VIGLLLLINL YVFTDKLNAT ILNTVYFDST IYSLPNFINY
451 KDIGNLSNQV IISVLPKQYY TPLKKYHQIE DSVLLIIDSV SNAIRDQFIS
501 KLLFFAFAVS ISINVYLLNA AKIHTGYMNF QPQSNKIDDL VVQQKSATIE
551 FSETRSMPAS SGLETPVTAK DIIISEEIQN NECVYALSSQ DEPIRPLSNL
601 VELMEKEQLK NMNNTEVSNL VVNGKLPLYS LEKKLEDTTR AVLVRRKALS
651 TLAESPILVS EKLPFRNYDY DRVFGACCEN VIGYMPIPVG VIGPLIIDGT
701 SYHIPMATTE GCLVASAMRG CKAINAGGGA TTVLTKDGMT RGPVVRFPTL
751 IRSGACKIWL DSEEGQNSIK KAFNSTSRFA RLQHIQTCLA GDLLFMRFRT
801 TTGDAMGMNM ISKGVEYSLK QMVEEYGWED MEVVSVSGNY CTDKKPAAIN
851 WIEGRGKSVV AEATIPGDVV KSVLKSDVSA LVELNISKNL VGSAMAGSVG
901 GFNAHAANLV TALFLALGQD PAQNVESSNC ITLMKEVDGD LRISVSMPSI
951 EVGTIGGGTV LEPQGAMLDL LGVRGPHPTE PGANARQLAR IIACAVLAGE
1001 LSLCSALAAG HLVQSHMTHN RKTNKANELP QPSNKGPPCK TSALL*

…to predicting that this protein is probably an “HMG-CoA Reductase,” an enzyme that carries out a key step in cholesterol synthesis.
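
To make the idea of a sequence-based “best guess” concrete, here is a minimal Python sketch. It is a toy stand-in and far simpler than real annotation tools such as BLAST, which rely on sequence alignment: it just scores two sequences by how many three-letter “words” (k-mers) they share. The function names, the second reference sequence, and the query are all made up for illustration; only the first reference fragment comes from the sequence above (residues 801 to 820).

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two k-mer sets: shared words / all words."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_guess(query, references, k=3):
    """Return (name, score) for the reference most similar to the query."""
    return max(
        ((name, similarity(query, seq, k)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Toy reference collection. The first entry is residues 801-820 of the
# sequence shown in this post; the second is a made-up placeholder.
REFERENCES = {
    "HMG-CoA reductase (fragment)": "TTGDAMGMNMISKGVEYSLK",
    "hypothetical other enzyme": "MKKLLPTAAAGLLLLAAQPAMA",
}

# A hypothetical "new" protein: the fragment above with two substitutions.
query = "TTGDAMGMNMVSKGIEYSLK"

name, score = best_guess(query, REFERENCES)
print(f"Best guess: {name} (similarity {score:.2f})")
```

The point of the sketch is the shape of the process: a guess is only as good as the reference collection. If the right enzyme has no sequence in the collection, the best available match is something else.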

We can also get more specific, tying part of this sequence information to the specific activity of the protein. In the case of my example enzyme, the “business end” is the second half of the protein.

This kind of sequence data powers so much of what we do in modern biology, from guessing what individual proteins do all the way to generating entire metabolic models and then predicting literally every food source a microbe can grow on.

We’re missing a lot of sequences

Hundreds upon hundreds, in fact. For a lot of critical enzymes.

As part of our Orphan Enzymes Project, we’ve tried to figure out how we can find sequences for these hundreds of enzymes.

After all, each enzyme represents hundreds of thousands of dollars in lost research…and each enzyme sequence we don’t have undercuts the value of all of our fantastic sequence-based tools.

We can rapidly identify a lot of orphan enzymes

Our new paper describes a few case studies on how we can identify orphan enzymes in the lab and just how big an impact identifying a sequence for each orphan enzyme has.

We found several cases where we were actually able to buy samples of enzymes that had never been sequenced. We were also able to collaborate with Charles Waechter and Jeffrey Rush of the University of Kentucky to find sequence data for an enzyme they’d been working hard to characterize.

The key point of this part of our work is that many enzymes that are “tricky” for one set of researchers to sequence may be entirely doable for another group that specializes in sequencing. The more we collaborate, the more value we get out of all of our work.

Identifying orphan enzymes has a big impact

The second part of our work asks the simple question, “Does it matter?”

For each enzyme for which we found sequence data, we asked “How many enzymes should we now re-annotate?”

In other words, of all the guesses that have been made about what proteins do, for how many is our enzyme now the best guess, based on how close the protein’s sequence is to the one we found?
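
That counting question can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in the paper: the protein names and similarity scores are hypothetical, and `count_reannotations` is a name invented for this example.

```python
def count_reannotations(current_best, new_scores):
    """Count proteins whose similarity to the newly sequenced enzyme
    beats the similarity to their current best-matching reference."""
    return sum(
        1
        for protein, new_score in new_scores.items()
        if new_score > current_best.get(protein, 0.0)
    )


# Hypothetical similarity scores between 0 and 1, for illustration only.
# current_best: each protein's score against its existing best match.
current_best = {"protein_A": 0.42, "protein_B": 0.80, "protein_C": 0.35}
# new_scores: each protein's score against the newly found enzyme sequence.
new_scores = {"protein_A": 0.67, "protein_B": 0.55, "protein_C": 0.36}

reannotated = count_reannotations(current_best, new_scores)
print(f"{reannotated} of {len(new_scores)} proteins get a new best guess")
```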

It turns out that each enzyme sequence we identified led to anywhere from 130 to 430 proteins getting new, better guesses about their functions.

That’s hundreds of potentially incorrect predictions or misled researchers averted by just “finishing the job” of sequencing a handful of enzymes.

Given the tremendous amount of work that has gone into characterizing each of these enzymes, it’s essential that we take every opportunity to apply modern sequencing expertise to existing samples.

Comments on the paper are welcome, whether here or on the paper itself at PLOS ONE.

05 Oct

Contributors info added

You can now read more about individual team members and their contributions to OEP at The Team.

The Current OEP Team:

Former OEP Team Members:

Special thanks to Kristian B. Axelsen from UniProt for his valuable review of our orphan enzyme data. We look forward to publishing the paper soon!

03 Oct

OEP is getting populated

We’ve been diligently updating the website with new content and now it’s looking like a proper website!

Please click around the various pages in the menu bar to see the changes (because we added content to everything).

19 Aug

Welcome to the new OEP website!

We’re excited to announce the birth of the new OEP website!

As research becomes more publicly available, expect updates such as a searchable database for the status of your orphan enzyme.

Meanwhile, tutorials on how to resolve orphan enzymes and find homes for them will become available as they are completed.

Thank you!

The Orphan Enzymes Project is led by Clover Collective
Supported by NIH grant GM086755