Junk no more

R.I.P., junk DNA: not the DNA as such, but the misleading moniker that has described it for years. Scientists have long known that vast swaths of the human genome don’t produce proteins. They have also known that these sections are nonetheless active. How much of the genome produces proteins was not known until the first draft of the Human Genome Project, released in 2000, tallied the coding regions of the genome. Only about 1 percent, roughly 21,000 genes, codes for proteins. And the other 99 percent?

The National Human Genome Research Institute (NHGRI) began a follow-up to the Human Genome Project in 2003. With a budget of $288 million, the Encyclopedia of DNA Elements (ENCODE) would map that 99 percent and catalogue its functional elements for a better understanding of the genome and its role in human biology and disease. ENCODE enlisted 440 researchers at 32 institutions in the United States, the United Kingdom, Spain, Singapore, and Japan, who communicated via wikis, Google Docs, and two face-to-face meetings each year. The researchers began with a pilot project that would study just 1 percent of the genome while gauging research methods and technologies. Their findings, published in Nature and Genome Research in 2007, showed that the project could identify and characterize functional elements in the genome. In the next phase, the consortium went beyond the initial 1 percent and covered the whole genome by studying 147 cell types and performing more than 1,600 experiments.

In September 2012 the findings from those experiments were published in such journals as Nature, Genome Research, and Genome Biology. This research, ENCODE announced, “gives the first holistic view of how the human genome actually does its job.”

The consortium found biological activity in 80 percent of the genome and identified about 4 million sites that play a role in regulating genes. As had long been known, some noncoding sections regulate genes: some bind regulatory proteins, while others code for strands of RNA that regulate gene expression. Yale scientists, who played a key role in this project, also found “fossils,” genes that date to our nonhuman ancestors and may still have a function. Mark B. Gerstein, Ph.D., the Albert L. Williams Professor of Biomedical Informatics and professor of molecular biophysics and biochemistry and of computer science, led a team that unraveled the network of connections between coding and noncoding sections of the genome.

Arguably the project’s greatest achievement is the repository of new information that will give scientists a stronger grasp of human biology and disease, and pave the way for novel medical treatments. Once verified for accuracy, the data sets generated by the project are posted on the Internet, available to anyone. Even before the project’s September announcement, more than 150 scientists not connected to ENCODE had used its data in their research.

“We’ve come a long way,” said Ewan Birney, Ph.D., of the European Bioinformatics Institute (EBI) in the United Kingdom, lead analysis coordinator for ENCODE. “By carefully piecing together a simply staggering variety of data, we’ve shown that the human genome is simply alive with switches, turning our genes on and off and controlling when and where proteins are produced. ENCODE has taken our knowledge of the genome to the next level, and all of that knowledge is being shared openly.”

Big data, big questions

The day in September that the news embargo on the ENCODE project’s findings was lifted, Gerstein saw an article about the project in The New York Times on his smartphone. There was a problem. A graphic hadn’t been reproduced accurately. “I was just so panicked,” he recalled. “I was literally walking around Sterling Hall of Medicine between meetings talking with The Times on the phone.” He finally reached a graphics editor who fixed it.

An academic whose scholarly and personal interests focus on information and how we make sense of it, Gerstein had run up against the juggernaut of the 24-hour news cycle. But in the end, he helped The New York Times get it right, just as he’d played a role in helping the international consortium of ENCODE scientists interpret the vast expanse of data that they uncovered. The concept of “big data,” an amount of information so large that it challenges efforts to store and use it, is key to ENCODE and to Gerstein’s work generally. “It’s really a very transformative idea in terms of how people approach experiments and how people think about analyzing things,” he said. He likened a rich data resource to a great piece of literature, “something that’s kind of transcendent and speaks to many different people.” It can inspire and answer many questions. “I do think that particularly for genomic data sets,” he said, “the value of the data set goes beyond the initial question.”

Given the new availability of incredibly large data sets, a scientific supergroup with high levels of collegiality and collaboration was essential to the success of ENCODE. Having one group carry out the project allowed for a uniformity of method and reporting that was critical, said Michael Pazin, Ph.D., program director in functional genomics at NHGRI. Imagine the confusion caused by a map where thick blue lines sometimes represent interstate highways and other times rivers. But there is ample room for small projects to emerge based on the availability of the new resource, added Elise Feingold, Ph.D. ’86, program director in genome analysis at NHGRI. “I don’t think (consortia are) ever going to substitute for the individual researcher and these small collaborations,” she said.

Gerstein took to the collaborative process, according to Birney. “Mark likes to find a scenario where everyone gets along without compromising the science. This is not always as easy as it sounds, and takes some effort talking to people. Like all of us, Mark has some characteristic phrases, and I would always know that Mark didn’t quite agree on something when he would start, ‘Wouldn’t you say, Ewan, that ...’, and then he’d be into some point,” Birney said.

New technology paves the way

ENCODE would have been unthinkable without the technology and methodology to gather, store, and analyze enormous data sets. When Gerstein began his career, things were different. He’d majored in physics and wanted to pursue a science that was driven by advances in computer technology. But there was no clear pathway to do that. He completed his doctorate at Cambridge, which is now home to the EBI. “There was no EBI,” recalled Gerstein. “There was no program in bioinformatics. I did a program in chemistry.” He wondered whether he’d stay in academia because most universities did not have even a single bioinformatics position.

In 1996, however, Donald Engelman, Ph.D., the Eugene Higgins Professor of Molecular Biophysics and Biochemistry, saw the need for computational expertise at Yale and recruited Gerstein. “I and others in the department were concerned about computation and its role in research,” remembered Engelman, who was not involved in ENCODE. “There would be an enormous explosion of information to deal with as genetic information became available and more structural information became available. Someone who can use those enormous databases is key.”

In those days, though, the tools for uncovering those data were still being developed. When Valerie Reinke, Ph.D., associate professor of genetics, was in college, she’d often draw diagrams of cells on cocktail napkins to illustrate points to her friends who were not science majors. “It always amazes me that there are people who don’t want to know how their bodies work,” she said. Reinke was part of the modENCODE project, which focused on functional element identification similar to ENCODE, only in such model organisms as the fruit fly Drosophila melanogaster and the roundworm Caenorhabditis elegans. Reinke specializes in the roundworm, which shares many genes with humans. About a fifth of the worm’s genome codes for proteins, making it easy to identify noncoding functional elements. The tools for discovering the fine details of what was happening in those sketches she drew in college were still evolving when she was a student. By 2000, when she joined the Yale faculty, microarray technology, which allows scientists to analyze expression of multiple genes in a single experiment, was brand-new. As with personal computing, DNA sequencing technology has rapidly grown more powerful, faster, and cheaper.

In 2007, as the ENCODE pilot project was ending and the next phase was getting started, next-generation sequencing technology became available. “That was really a remarkable confluence of events that we were able to take advantage of and was really a game changer for the project,” remembered Feingold.

The evolution of the technology is making it practical to look at genetics on an individual level, said Reinke, so that information could be used to formulate treatments tailored to a particular patient. “We haven’t even begun to scratch the surface,” she said. “There are so many questions.”

One thing is clear. ENCODE will have profound implications for personal genomics. Each of us inherits two sets of genes, with one copy, or allele, of each gene coming from each parent. Being able to determine allele-specific expression “brought home the idea of what you might call a personal annotation,” Gerstein said. “We think that this personal annotation is the next phase for genomics.”

This personal annotation, Gerstein notes, can raise ethical issues. Would you want an employer or health insurer to know about your susceptibility to a degenerative illness? Such questions extend well beyond the molecular level in an age when Foursquare lets the world know in which Starbucks you’re enjoying a coffee and a friend can share on Facebook a picture from the office holiday party that you’d rather had never seen the light of day. “I do think a big aspect of information technology, both big data and computing, is this erosion of privacy,” said Gerstein.

The myth of junk DNA

Some early press coverage credited ENCODE with discovering that so-called junk DNA has a function, but that was old news. The term had been floating around since the early 1970s and suggested that the bulk of noncoding DNA serves no purpose; however, articles in scholarly journals had reported for decades that DNA in these “junk” regions does play a regulatory role. In a 2007 issue of Genome Research, Gerstein had suggested that the ENCODE project might prompt a new definition of what a gene is, based on “the discrepancy between our previous protein-centric view of the gene and one that is revealed by the extensive transcriptional activity of the genome.” Researchers had known for some time that the noncoding regions are alive with activity. ENCODE demonstrated just how much action there is and documented what is happening in 80 percent of the genome. That is not to say that 80 percent was found to have a regulatory function, only that some biochemical activity is going on there. The space between genes was also found to contain sites where transcription of DNA into RNA begins and areas that encode RNA transcripts that might have regulatory roles even though they are not translated into proteins.

But helping people grasp the massive import of ENCODE proved a challenge. “People don’t think that creating a resource is a sexy endeavor,” said Feingold.

“It’s so easy to either overpromise or undersell,” agreed Pazin. On the one hand, he did not want to make claims that ENCODE would quickly lead to cures for diseases like cancer. On the other, he didn’t want the public to ignore the discovery because it was too technical to understand. That’s why the “useful shorthand” of junk DNA so often came up in coverage, said Feingold.

Hopefully, ENCODE will help put an end to the notion of junk DNA. The project not only assigned general classes of functions to areas of the genome but also showed the complexity of how those areas interact. The project revealed the genome’s organizational hierarchy, with top-level regulators wielding vast influence while “middle managers” often have to collaborate to regulate genes.

There was no “Eureka!” moment, said Gerstein, who called the findings “the opposite of a discovery.” Instead, there were years of gathering and interrogating data to create a map of the vast majority of the genome. As many researchers have found, these noncoding regions are alive with regulatory activity that plays a critical role in human disease, though some of the activity documented has no such obvious application.

His team, Gerstein said, took a different path from those of others involved in the project.

“Most of the project is more oriented on annotating elements,” he said. “Our unique perspective was to make it a network.”

If it were simply a genetic encyclopedia, ENCODE would catalogue its entries in isolation from one another. The Abaco Islands reside next to abacus in a conventional encyclopedia because that’s how the words fall alphabetically—not because the topics identified by the words have any intrinsic relationship. Knowing how different parts of the genome work together is far more powerful than simply compiling a parts list.

Through computational analysis, Gerstein’s lab broke apart the “hairball” of the regulatory networks to find working relationships. He developed statistical models that identified regulators located far away from the genes they influence. He found that the way the human genome is organized is not so different from the way humans organize themselves. Gerstein likens transcription factors that have considerable regulatory influence to top-level managers. As might be the case with their human analogues, these elite transcription factors tend to be conservative.

“What does conservative mean? Conservative means they’re more preserved. There’s less variation,” said Gerstein. “It’s sort of natural that in that kind of context, you don’t want them to change as much.”

The less influential transcription factors, which he terms “middle managers,” are less conservative and more likely to work cooperatively than their peers. Often these middle managers will co-regulate a gene, easing the flow of information in what would otherwise be “a bottleneck.”

There is less interaction between the top-level transcription factors and the bottom-level, least influential transcription factors than one would expect to happen by chance. The human genome is not egalitarian.

Gerstein and colleagues at the Sanger Center, the University of California at Santa Cruz, and Cold Spring Harbor Laboratory on Long Island also found about 12,000 pseudogenes—fossil genes dating back to our nonhuman ancestors—which at first glance appear to be dead. But it turns out that some pseudogenes, while they no longer code for proteins, are quite animated. “They’re very much on the edge between living and dead,” said Gerstein.

In some people, these pseudogenes are turned into actual genes. “What’s going on here? Is this a gene that’s being born?” he asked. Pseudogenes open a window on the history of our species. Some of these fossil genes may still be players in the regulatory network.

The impact on medicine

What will ENCODE mean for human health, and how soon will this genomic encyclopedia inform treatment? There is no easy answer to that critical question, according to Sherman Weissman, M.D., Sterling Professor of Genetics.

“I grew up with the field,” said Weissman, whose research interests include genome-wide mapping of gene activity and chromosome structure in humans. Weissman contributed to ENCODE through collaborations with former Yale professor Michael Snyder, Ph.D., a leader in the field of functional genomics who is now at Stanford.

Knowing the molecular basis of a disease carries no guarantee that a cure is imminent. Linus Pauling, Ph.D., linked sickle cell disease to an abnormal protein in 1949, making it the first genetic disease for which the molecular basis was known. But, Weissman noted, there is still no cure for it. On the other hand, survival rates for chronic myelogenous leukemia are improving thanks to Gleevec, a drug developed from research on oncogenes that received FDA approval in 2001. Weissman is optimistic that genetic information could lead to effective treatments for Alzheimer disease, which he terms “simpler than cancer.”

“We have so much data, and a very large part of it hasn’t been fully exploited,” he said. “We’re really bumping up against the ceiling in some practical ways.”

One of the project’s findings is that many genetic changes linked to disease occur between genes, in places where ENCODE has identified regulatory sites. It’s still not clear how variations in these areas contribute to disease. “Some people were surprised,” said Pazin, “that disease-linked genetic variants are not usually in protein-coding regions. We expect to find that many genetic changes causing a disorder are within regulatory regions, or switches, that affect how much protein is produced or when the protein is produced, rather than affecting the structure of the protein itself. The medical condition will occur because the gene is aberrantly turned on or turned off or abnormal amounts of the protein are made. Far from being junk DNA, this regulatory DNA clearly makes important contributions to human health and disease.”

“It’s important to realize that these findings won’t be taken forward by people like Mark or myself—rather we have to empower clinical researchers to use this data,” said Birney of the EBI. “I think ENCODE will have a big impact on medical research—in particular, genome-wide association studies have a really remarkable overlap with ENCODE data outside of protein-coding genes, and this is leading to all sorts of new hypotheses of how these diseases operate.” YM

Colleen Shaddox is a freelance writer in Hamden, Conn.
