Genetic Research

What was considered ‘junk DNA’ turns out to be huge genetic control panel

An international team of researchers has recently revealed that much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes. Without these switches, genes would not work – and mutations in these regions might lead to human disease. Discovered by hundreds of scientists working on the ENCODE Project, the new information is so comprehensive and complex that it has given rise to a new publishing model in which electronic documents and datasets are interconnected.

Just as the Human Genome Project revolutionised biomedical research, ENCODE will drive new understanding and open new avenues for biomedical science. Led by the National Human Genome Research Institute (NHGRI) in the US and the EMBL-European Bioinformatics Institute (EMBL-EBI) in the UK, ENCODE now presents a detailed map of genome function that identifies 4 million gene ‘switches’. This essential reference will help researchers pinpoint very specific areas of research for human disease. The findings are published in 30 connected, open-access papers appearing in three science journals: Nature, Genome Biology and Genome Research.

“Our genome is simply alive with switches: millions of places that determine whether a gene is switched on or off,” says Ewan Birney of EMBL-EBI, lead analysis coordinator for ENCODE. “The Human Genome Project showed that only 2% of the genome contains genes, the instructions to make proteins. With ENCODE, we can see that around 80% of the genome is actively doing something. We found that a much bigger part of the genome – a surprising amount, in fact – is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks.”

“ENCODE data can be used by any disease researcher, whatever pathology they may be interested in,” said Ian Dunham of EMBL-EBI, who played a key role in coordinating the analysis. “In many cases you may have a good idea of which genes are involved in your disease, but you might not know which switches are involved. Sometimes these switches are very surprising, because their location might seem more logically connected to a completely different disease. ENCODE gives us a set of very valuable leads to follow to discover key mechanisms at play in health and disease. Those can be exploited to create entirely new medicines, or to repurpose existing treatments.”

“ENCODE gives us the knowledge we need to look beyond the linear structure of the genome to how the whole network is connected,” commented Dr Michael Snyder, professor and chair at Stanford University and a principal investigator on ENCODE. “We are beginning to understand the information generated in genome-wide association studies – not just where certain genes are located, but which sequences control them. Because of the complex, three-dimensional shape of our genome, those controls are sometimes far from the gene they regulate and looping around to make contact. Were it not for ENCODE, we might never have looked in those regions. This is a major step toward understanding the wiring diagram of a human being. ENCODE helps us look deeply into the regulatory circuit that tells us how all of the parts come together to make a complex being.”

Until recently, generating and storing large volumes of data has been a challenge in biomedical research. Now, with the falling cost and rising productivity of genome sequencing, the focus has shifted to analysis – making sense of the data produced in genome-wide association studies. ENCODE partners have been working systematically through the human genome, using the same computational and wet-lab methods and reagents in laboratories distributed throughout the world.

To give some sense of the scale of the project: ENCODE combined the efforts of 442 scientists in 32 labs in the UK, US, Spain, Switzerland, Singapore and Japan. They generated and analysed over 15 terabytes of raw data – all of which is now publicly available. The study used around 300 years’ worth of computer time studying 147 tissue types to determine what turns specific genes on and off, and how that ‘switch’ differs between cell types.

The articles published 5 September 2012 represent hundreds of pages of research. But the digital publishing group at Nature recognises that ‘pages’ are a thing of the past. All of the published ENCODE content, in all three journals, is connected digitally through topical ‘threads’, so that readers can follow their area of interest between papers and all the way down to the original data.

“We have now an interactive encyclopaedia that everyone can refer to, and that will make a huge difference,” said Roderic Guigo of the Centre de Regulació Genómica (CRG) in Barcelona, Spain.

The Nature ENCODE explorer

Gene transcription influenced by circadian rhythm

It’s not just a few key genes and proteins that cycle on and off in humans in a 24-hour circadian pattern as the sun rises and sets. Thousands of genes in organs throughout the body show predictable daily fluctuations, and their cycles of activity are controlled in a complex variety of ways, United States-based Howard Hughes Medical Institute researchers have discovered.

At the core of the new discovery is the finding that the function of the enzyme that transcribes genes so that they can be made into proteins – RNA polymerase – varies according to the circadian cycle. The study was published online August 30, 2012, by the journal Science.

Understanding how genes are cycled on and off throughout the day is key to understanding a number of biological functions, including human sleep and metabolism, says HHMI investigator Joseph S. Takahashi of the University of Texas Southwestern Medical Center. “If you look at the targets of these circadian genes, the top category is metabolic pathways. The clock is intimately involved in controlling metabolism on a daily basis.”

“This finding gives us a new picture of the temporal dynamics of transcription,” he says. “It gives us a new and interesting way to look at circadian cycles as well as polymerases and transcription in general.”

Takahashi has been studying the circadian gene Clock and its protein product since he discovered it in the 1990s. He and others have established that CLOCK and two other proteins, BMAL1 and NPAS2, bind to genes during the day to activate them, whereas four other circadian regulators, the proteins PER1, PER2, CRY1, and CRY2, repress genes during the night.

Takahashi and his colleagues wanted a global view of how these activators and repressors work together to maintain the body’s 24-hour rhythms. So they undertook an in-depth study of where in the genome these regulatory proteins bind to target genes in the liver cells of mice. When they conducted their search, the team was surprised to find more than 20,000 sites bound by one or more of the proteins. At more than 1,000 of those sites, all seven proteins could bind, but many of the sites were targets for either circadian activators or repressors, not both. That was a surprise too, Takahashi says. “We naively had thought that they would all just bind to the same locations.”
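
The sorting of binding sites described here can be pictured as a simple classification. The sketch below is illustrative only: the protein names come from the article, but the function, the category labels and the example sites are assumptions made for this illustration and are not the researchers’ method or data.

```python
# Protein groups as named in the article.
ACTIVATORS = {"CLOCK", "BMAL1", "NPAS2"}
REPRESSORS = {"PER1", "PER2", "CRY1", "CRY2"}


def classify_site(bound_proteins):
    """Label a binding site by which circadian regulators occupy it.

    The category names are hypothetical, chosen for this illustration.
    """
    bound = set(bound_proteins)
    unknown = bound - ACTIVATORS - REPRESSORS
    if unknown:
        raise ValueError(f"unrecognised proteins: {sorted(unknown)}")
    if not bound:
        return "unbound"
    if bound == ACTIVATORS | REPRESSORS:
        return "all seven"
    has_activator = bool(bound & ACTIVATORS)
    has_repressor = bool(bound & REPRESSORS)
    if has_activator and has_repressor:
        return "mixed"
    return "activators only" if has_activator else "repressors only"


# Hypothetical example sites, invented for illustration.
example_sites = {
    "site_1": ["CLOCK", "BMAL1"],
    "site_2": ["PER1", "CRY1", "CRY2"],
    "site_3": ["CLOCK", "BMAL1", "NPAS2", "PER1", "PER2", "CRY1", "CRY2"],
}
for name, proteins in example_sites.items():
    print(name, "->", classify_site(proteins))
```

In this picture, the team’s surprise was that many sites fall into the “activators only” or “repressors only” groups rather than being shared by all seven proteins.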

To determine how binding of the circadian proteins affected gene activity, the scientists went on to test the daily patterns of expression for all genes that are active in the liver.

To begin producing a protein from an active gene, cells first transcribe the information in that gene into RNA – so the amount of RNA corresponding to a particular gene can be used to measure gene activity. Before an RNA molecule is used to produce a protein, however, the RNA molecule must undergo some processing, which can influence how much protein will be produced. As part of this RNA processing, cells must excise interrupting portions of the code, known as introns. The remaining segments, known as exons, contain the essential information for building the protein specified by the gene.

To learn more about how circadian genes are regulated, Takahashi’s team measured the presence of exon RNA and intron RNA in their cells separately.

If the gene expression cycles were being controlled entirely at the level of transcription, exon and intron RNA would always increase and decrease at the same time. But the researchers found something different. More than 2,000 genes showed daily cycles of expression at the exon level, but fewer than 1,400 genes showed circadian patterns at the intron level. Moreover, the intron RNA transcripts that cycled all peaked at the same time, whereas the exon RNA transcript peaks were scattered throughout different times of day.

“When we compared the intron- and exon-cycling gene sets, we found very little overlap,” Takahashi says. “Only about 22% of the exon-cycling genes are being regulated at the level of transcription.” For the other 78% of exon-cycling genes, the increases and decreases must be happening at a later level of regulation, rather than the initial transcription of DNA to RNA, since the intron and exon RNA transcripts don’t match up.
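
The comparison Takahashi describes amounts to asking how much two sets of genes overlap. A minimal sketch of that calculation follows, assuming made-up gene labels: the function name and the toy data are hypothetical and are not taken from the study.

```python
def transcription_driven_fraction(exon_cycling, intron_cycling):
    """Fraction of exon-cycling genes that also cycle at the intron level.

    Genes in both sets are the ones whose rhythm could be explained by
    transcription; the rest must be regulated at a later step.
    """
    exon_cycling = set(exon_cycling)
    if not exon_cycling:
        raise ValueError("no exon-cycling genes supplied")
    shared = exon_cycling & set(intron_cycling)
    return len(shared) / len(exon_cycling)


# Hypothetical toy data: placeholder labels, not results from the study.
exon_genes = {"geneA", "geneB", "geneC", "geneD", "geneE"}
intron_genes = {"geneA", "geneX", "geneY"}

fraction = transcription_driven_fraction(exon_genes, intron_genes)
print(f"{fraction:.0%} of exon-cycling genes also cycle at the intron level")
```

With the real gene lists in place of the toy ones, the same ratio would correspond to the figure of about 22% quoted above, with the remainder pointing to regulation after transcription.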

To delve further into how regulation is occurring in the genes that do have cycling at the transcription level, and figure out why they are all peaking at the same time, Takahashi and his colleagues tested the timing of the first step in transcription, the binding of RNA polymerase II to the genes. That binding, he discovered, was occurring much earlier in the day than the gene transcription. CLOCK and BMAL1, the activators of transcription, recruit RNA polymerase II at the beginning of the cycle, but are repressed by the presence of the inhibitor, CRY1. As a consequence, RNA polymerase is poised or paused for a few hours before it can begin transcription. Thus, the circadian-rhythm-dependent steps involve both RNA polymerase recruitment and the release from the poised state.

“What we ended up discovering was that RNA polymerase II initiation is circadian on a genome-wide level,” says Takahashi. “Along with the global regulation of RNA polymerase II and transcription, we also found a global regulation of chromatin state by the circadian clock. Histone proteins that are critical for maintaining the integrity of DNA were also modified extensively on a circadian basis across the genome.” This suggests that virtually every gene has the potential to be modulated along with the circadian cycle, he says. The next step, he adds, is to figure out how RNA polymerase is controlled on a daily basis and what makes the polymerase pause on some genes at certain times of the day. And, of course, the question of how other RNA molecules are being regulated after transcription still remains.

 Date of upload: 20th Nov 2012

