Among the many surprises to emerge from sequencing the human genome was the revelation that protein-coding sequences make up a relatively small proportion of our DNA. These exons, collectively known as the exome, account for less than 2 percent of the human genome. Still, scientists frequently search through exomes for the genetic basis of diseases, and such searches have proven fruitful, identifying the culprits behind rare diseases and pathological genetic changes in tumors. But scientists are increasingly recognizing that whole exome sequencing tells only part of the story: mutations in noncoding regions of the genome can also cause disease, for example by affecting the transcription of a gene.

To begin to uncover some of these overlooked effects, researchers recently analyzed the whole genome sequences of more than 150,000 people from the UK Biobank, a massive database that contains DNA samples and phenotypic information from 500,000 people. Their findings, published July 20 in Nature, include 12 genetic variants not identified by whole exome sequencing that influence traits such as height and age of onset of menarche.

The Scientist spoke to Kári Stefánsson, founder of deCODE Genetics, which sequenced half of the genomes analyzed in the study, about the value of whole genome sequencing. (Amgen, deCODE's parent company, was one of four companies that contributed to the study's funding; the other half of the sequencing was performed by the Wellcome Sanger Institute.)

The Scientist: What is the UK Biobank, and what is its whole genome sequencing consortium trying to achieve?

Kári Stefánsson: What we are always striving to do in population studies like this is to develop an understanding of human diversity. The diversity in risk of disease, response to treatment, diversity when it comes to educational attainment, socioeconomic status, et cetera.
People have been debating whether to use whole exome sequencing or whole genome sequencing, and which of the two yields the most useful data. We began to look at regions that … have significant sequence conservation when we look at these 150,000 genomes. The assumption is that the regions that are least tolerant of sequence diversity are the regions that must be of greatest functional importance. And when we look at the 1 percent of the genome that is least tolerant of sequence diversity … 83 percent of them are in the intragenic sequences, not in the exons. It is absolutely clear that there is enormous information to be mined out [of] those regions. In this paper, we also … listed about 12 phenotypes where we found variants in the genome that associate with them, where we could not find the same by using whole exome sequencing. It is absolutely clear … that whole exome sequencing was extremely valuable, gave us an amazing insight into the role of coding sequences in the pathogenesis of all kinds of diseases, but that whole exome sequencing does not suffice.

TS: So whole genome sequencing was attempted because whole exome sequencing has not captured the whole picture?

KS: Evolution is absolutely ruthless and sheds everything that we don't need. The exons are only a very small part of a genome, and the rest of the genome is not useless.
It's absolutely clear that the rest of the genome is functionally very important and therefore does not allow for unlimited sequence diversity.

See “Adapting with a Little Help from Jumping Genes”

TS: What were the technical challenges in doing whole genome sequencing on this massive scale?

KS: There are all kinds of challenges, but we are fairly used to scaling up, taking processes that are usually done on a fairly small scale and doing them on a large scale … Certainly there's an enormous amount of data that comes out of 150,000 genomes. There is a challenge, for example, in joint variant calling [the procedure to identify genetic variants from sequence data], when you're calling the variants in all of these genomes simultaneously. There's a challenge when it comes to just handling and storing and mining these data. This is becoming, first and foremost, an informatics challenge.

TS: What are the remaining challenges?

KS: We all want to understand human diversity. And if you look at the data coming out of the UK Biobank, it is not an unbiased sample of the population of Great Britain. There is an overrepresentation of people of European descent. And what we have of sequence diversity from people of African descent, of Asian descent, et cetera, is much less than we need.

It's extremely important … from a clinical point of view, to get more representation of people of other ethnicities. It is also, from a societal point of view, undesirable to have this little information on people of other descents. The health care disparity in the world begins with the fact that we know less about the nature of disease in people of ancestries other than European … So one of the challenges is to make sure that we have powerful cohorts of people of other descents to work with.
See “Genetic Risks for Depression Differ Between Ancestral Groups”

TS: What did you learn from the whole genome sequencing published in the paper?

KS: The main, key lesson is … how [an] extremely large percentage of the regions with high sequence conservation are outside of the exons … It means that we have a formidable task in front of us to annotate the regions with a low depletion score, or little tolerance for sequence diversity.

TS: And you identified several variants associated with phenotypic diversity?

KS: That is just the first step. We listed about 12 associations, but this is sequence diversity for the rest of the world to work on, to look for associations between variants in the sequence and phenotypes. And we just gave a few examples of how we could do this with whole genome sequencing where we could not find this with whole exome sequencing.

TS: The genome sequences are available online, for other scientists to work on?

KS: They will be available through the UK Biobank. We also put on our website a database of allelic frequencies. The reason we did that is that when you do whole genome sequencing for diagnostic purposes, it is extremely important to have a reference that you can go to, to make sure if you are sequencing somebody with a particular disease and you find a rare variant … that the variant that you find in the unfortunate child isn't found in a bunch of healthy people. It is a valuable resource for those who want to work on diagnostic sequencing … We felt it was our responsibility to make it available to everybody who's working on diagnostic sequencing.

Editor's note: This interview has been edited for brevity.