Discovering the roots of disease

The completed HapMap Phase I promises to open many new doors in medical research. Debbie Forrester reports.

Scientists are several steps closer to rapidly identifying genetic predisposition to a range of diseases, and possible pharmaceutical treatments, following the landmark publication of a comprehensive catalogue of human genetic variation.

After a three-year project, the International HapMap Consortium, comprising more than 200 researchers from Canada, China, Japan, Nigeria, the United Kingdom and the United States, has released the results of a public-private effort to chart patterns of common genetic variation in humans.

The Phase I HapMap – or human haplotype map – comprises more than one million genetic variation markers called single nucleotide polymorphisms (SNPs). The consortium is also nearing completion of the Phase II HapMap, which will contain almost three times as many markers as the first version and will enable researchers to focus their gene searches even more precisely on specific regions of the genome.

“The HapMap promises to accelerate medical research around the globe in many different ways,” said Yusuke Nakamura, MD, PhD, director of the University of Tokyo's Human Genome Centre, as well as leader of the RIKEN SNP Centre and the Japanese group working on the HapMap. “Not only will it lead to the identification of genes related to disease, it should help to pinpoint genes that influence how individuals react to various medications – discoveries that could improve drug design and lead to the development of diagnostic tools aimed at preventing adverse drug reactions.”

The International HapMap Project builds on the freely available sequence of the human genome produced by the International Human Genome Sequencing Consortium. Although research shows that any two people are 99.9% identical at a genetic level, the 0.1% difference is important because it helps explain why one person is more susceptible than another to a specific disease, such as diabetes. By studying the patterns of these genetic differences, or genetic variation, in many people, researchers expect to identify which differences are related to disease.

“The goal of studying the human genome has always been to provide health benefits to all humankind. This project should be seen in that grand tradition,” said Francis Collins, MD, PhD, director of the National Human Genome Research Institute, which is part of the National Institutes of Health (NIH), US Department of Health and Human Services. “The HapMap will provide a powerful tool to help us take the next quantum leap toward understanding the fundamental contribution that genes make to common illnesses like cancer, diabetes and mental illness.”

Single nucleotide polymorphisms

Genetic information is physically inscribed in a linear molecule called deoxyribonucleic acid (DNA). DNA is composed of four chemicals, called bases, which are represented by the four letters of the genetic code: A, T, C and G. The Human Genome Project determined the order, or sequence, of the three billion A's, T's, C's and G's that make up the human genome. The order of genetic letters is as important to the proper functioning of the body as the order of letters in a word is to understanding its meaning. When a letter in a word changes, the word's meaning can be lost or altered. Variation in a DNA base sequence – when one genetic letter is replaced by another – may similarly change the meaning.

More than 2.8 million examples of these substitutions of genetic letters – called single nucleotide polymorphisms or SNPs (pronounced snips) – are already known and described in a public database called dbSNP, operated by NIH. The major source of this public SNP catalogue was work done by The SNP Consortium (TSC), a collaborative genomics effort of major pharmaceutical companies, the Wellcome Trust and academic centres. The human genome is thought to contain at least 10 million SNPs, about one in every 300 bases. Theoretically, researchers could hunt for genes using a map listing all 10 million SNPs, but there are major practical drawbacks to that approach.

Instead, the HapMap shows the chunks (haplotypes) into which the genome is organised, each of which may contain dozens of SNPs. Researchers only need to detect a few tag SNPs to identify that unique chunk or block of genome and to know all of the SNPs associated with that one piece. This strategy works because haplotype blocks tend to be inherited together. SNP variants that are far from each other along the DNA molecule tend to be in different haplotype blocks and are less likely to be inherited together. Because of the block pattern of haplotypes, it will be possible to identify just a few SNP variants in each block to uniquely mark, or tag, that haplotype.
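The tagging idea can be illustrated with a toy example. The sketch below is not the consortium's method: it assumes a hypothetical block of four haplotypes across six SNP positions (invented data) and uses a simple brute-force search, with an illustrative function named find_tag_snps, to find the smallest set of positions whose alleles tell every haplotype in the block apart.

```python
from itertools import combinations

# Hypothetical haplotype block (illustrative data only, not HapMap data).
# Each string is one haplotype; each column is one SNP position.
HAPLOTYPES = [
    "ACGTAC",
    "ACGTGT",
    "GTGTAC",
    "GTCAGT",
]


def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes."""
    n_snps = len(haplotypes[0])
    # Try ever larger sets of positions until one separates every haplotype.
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # The "signature" of a haplotype is its alleles at the chosen positions.
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    # Only reached if two haplotypes are identical at every position.
    return tuple(range(n_snps))


tags = find_tag_snps(HAPLOTYPES)
print(tags)  # (0, 4)
```

In this toy block, reading just two of the six positions is enough to know which of the four haplotypes is present, and therefore the alleles at the remaining four positions as well.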

As a result, researchers will need to study only about 300,000 to 600,000 tag SNPs out of the 10 million SNPs that exist, to efficiently identify the haplotypes in the human genome. It is the haplotype blocks, and the tag SNPs that identify them, that will form the HapMap. Gene hunters around the world have been quick to recognise the potential of the HapMap, tapping into its publicly available SNP datasets even before the first draft of the map was completed.
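The scale of the saving follows from the figures quoted in this article. As a rough check, using only those figures and variable names chosen here for illustration:

```python
# Figures quoted in the article; the variable names are illustrative.
GENOME_BASES = 3_000_000_000          # three billion bases in the human genome
BASES_PER_SNP = 300                   # about one SNP in every 300 bases
TOTAL_SNPS = 10_000_000               # estimated SNPs in the genome
TAG_SNP_RANGE = (300_000, 600_000)    # tag SNPs researchers expect to need

# One SNP per 300 bases across three billion bases gives the 10 million estimate.
estimated_snps = GENOME_BASES // BASES_PER_SNP
print(estimated_snps)  # 10000000

# Share of all SNPs that must actually be genotyped when using tag SNPs.
shares = [f"{n / TOTAL_SNPS:.0%}" for n in TAG_SNP_RANGE]
print(shares)  # ['3%', '6%']
```

In other words, genotyping roughly 3% to 6% of the estimated SNPs would be enough to identify the haplotypes.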


In studies published in the March 2005 edition of Science, scientists used HapMap data to uncover a genetic variation that substantially increases the risk of age-related macular degeneration, the leading cause of severe vision loss in the elderly. The discovery of this single spelling variant out of the three-billion-letter DNA instruction book for humans, which affects a gene that codes for a protein involved in inflammation, points the way to the development of better diagnostic tests and treatments for this debilitating disease.

Many other discoveries lie on the horizon as the HapMap empowers studies of other common diseases, including diabetes, Alzheimer's disease, cancer, schizophrenia, asthma, hypertension and heart disease. In fact, more than 70 papers and presentations related to the HapMap were on the programme for the autumn meeting of the American Society of Human Genetics in Salt Lake City, US.

In addition to assisting in the identification of genetic factors involved in disease, the HapMap can help to pinpoint genetic variations that may affect the response of people to medications, toxic substances and environmental factors. Such information can be used to help doctors prescribe the right drug in the right dose for each patient, as well as recommend prevention strategies that take into account individuals' varying responses to environmental factors, such as diet. Also, the HapMap may be used to find genetic factors that contribute to good health, such as those protecting against infectious diseases or promoting longevity.

However, the consortium members caution the research community not to jump to conclusions too quickly when using HapMap data in genome-wide searches for genes associated with human health and disease. “Rigorous standards of statistical significance will be needed to avoid a flood of false positive results,” they say. To avert such problems, they urge their scientific colleagues to confirm any gene “discovery” by replicating the findings in independent studies that use the same set of SNP markers in different groups of people with the same disease or condition.
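A little arithmetic shows why false positives are a concern in genome-wide searches. The sketch below rests on assumptions that are not the consortium's: 500,000 tag SNPs tested (a figure within the range quoted earlier), a conventional single-test significance level of 0.05, and the Bonferroni correction as one common, conservative way to tighten the threshold. The consortium's statement does not name a specific method.

```python
# Assumptions for illustration only (not taken from the consortium's paper).
N_TESTS = 500_000   # number of tag SNPs tested, within the 300,000-600,000 range
ALPHA = 0.05        # conventional significance level for a single test

# If no SNP is truly associated with the disease, each test can still
# pass the threshold by chance with probability ALPHA.
expected_false_positives = N_TESTS * ALPHA
print(round(expected_false_positives))  # 25000

# Bonferroni correction: divide the significance level by the number of tests,
# so that the chance of even one false positive stays at or below ALPHA.
bonferroni_threshold = ALPHA / N_TESTS
print(f"{bonferroni_threshold:.0e}")  # 1e-07
```

Under these assumptions, an uncorrected scan would be expected to flag about 25,000 SNPs by chance alone, which is why stricter thresholds and independent replication matter.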

Researchers produced the HapMap using DNA from blood samples collected from 269 volunteers from widely distributed geographic regions. Specifically, the samples came from Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and Utah residents with ancestry from northern and western Europe. No medical or personal identifying information was obtained from the donors apart from the population from which the samples were collected. “Following the precedent set by the Human Genome Project, we have weighed the ethical, legal and social implications of this research from the outset,” said Bartha M Knoppers, JD, PhD, of the University of Montreal, Canada.

“For example, we developed a very careful community engagement and sampling strategy to ensure that participants from all the different population groups could give full informed consent. Still, we know our job is far from over and we stand ready to address whatever ethical, legal and social issues may arise in the future.”

In addition to its intended function as a resource for studies of human health and disease, the Phase I HapMap has yielded fascinating clues to how our species evolved over time and to the specific forces that were important as the human population spread around the globe.

Genetic diversity in humans is increased by recombination, which is the swapping of DNA from the maternal and paternal lines. It has been recently realised that in humans such swapping occurs primarily at a limited number of “hotspots” in the genome. By analysing the HapMap data, the researchers have produced a genome-wide inventory of where recombination takes place.

This will enable more detailed studies of this fundamental property of inheritance, as well as serve to improve the design of genetic studies of disease. The HapMap consortium found that genes involved in immune response and neurological processes are more diverse than those for DNA repair, DNA packaging and cell division.

Researchers speculate that the difference might be explained by natural selection shaping the human population in ways that favour increased diversity in genes that influence the body’s interactions with the environment, such as those involved in immune response, but do not favour changes in genes involved in core cellular processes. As expected, the vast majority of both rare and common patterns of genetic variation were found in all of the populations studied.

However, the consortium did find evidence that a very small subset of human genetic variation may be related to geographic or environmental factors, such as micro-organisms that cause infectious diseases.

This evidence appears as significant differences in genetic variation patterns in particular genomic regions among the populations studied. While more follow-up study is needed to explore the differences, researchers say some of the most striking examples merely serve to confirm well-known genetic differences among populations, such as the Duffy blood group, which plays a role in response to malaria, and the lactase gene, which influences the ability to digest milk products.

All in all, across the one million SNPs surveyed, researchers found only five exclusive, or “fixed”, differences on a human’s 22 pairs of non-sex (autosomal) chromosomes between the Yoruba samples and the Japanese and Han Chinese samples; 11 between the Yoruba samples and the samples from Utah residents of northern and western European ancestry; and 21 between the Utah samples and the Japanese and Han Chinese samples. Phase II of the HapMap, for which the data has already been generated and analysis is underway, will be an even more powerful tool than Phase I.

Taking advantage of the high-throughput genotyping capacity of Perlegen Sciences, of Mountain View, California, Phase II is adding 2.6 million SNPs to the HapMap and then testing virtually the entire known catalogue of human variation on the HapMap samples. “The Phase II map will make it even easier for researchers to correlate genetic variation with gene function, which is crucial for developing therapies tailored to each person’s genetic make-up,” said Kelly A Frazer, PhD, vice president of genomics at Perlegen.

- The International HapMap Consortium is a $138 million public-private partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States.

- As was the case with all of the data generated by the Human Genome Project, HapMap data are being made swiftly and freely available in public databases.

Researchers can access these data through the HapMap Data Co-ordination Centre, the NIH-funded National Centre for Biotechnology Information's dbSNP and the JSNP Database in Japan.

Copyright © 2006 All Rights Reserved.