IBM Research and the Genographic Project

Ajay Royyuru talks about Research’s role in this innovative initiative

Yorktown Heights, NY, USA - 13 Apr 2005: From determining where an individual’s ancestors can first be found to mapping the migration patterns of humankind, IBM and the National Geographic Society expect to make many discoveries, both big and small, during their five-year research partnership known as the Genographic Project.

This is the most ambitious genetic anthropology research initiative in history, with plans to gather one of the largest collections of DNA samples to map how humankind populated the planet. IBM’s role in the partnership will be to handle all aspects of storage and analysis of this complex data.

Dr. Ajay Royyuru, IBM Research’s lead scientist for the Genographic Project, recently shared his thoughts about IBM Research’s role in this exciting and innovative initiative.

What is IBM Research’s role in this project?

Dr. Royyuru: The objective is to understand human migration, the journeys our ancestors traveled over the last 50,000 years. The only way to get to an answer to a question like that is by digging deep. We have to gather large quantities of genetic data that describe our deep ancestral history, together with geographic, anthropological and cultural data that describe the diverse tapestry of peoples around the world, and then find the correlations between them to draw conclusions about the migratory history of humankind. Our research team will be working closely with National Geographic and scientists worldwide to analyze the data and draw those conclusions.

How did IBM Research get involved?

Dr. Royyuru: The National Geographic Society approached IBM with the idea for this project on the recommendation of Dr. Spencer Wells, a geneticist and anthropologist, who said that he needed the company’s expertise. After meeting with IBM Research, Wells was convinced that he had found the perfect partner for this enormous undertaking.

It’s certainly not every day that a scientist in a corporate research institution is asked to participate in a project that seeks to trace the migration patterns of humankind, going back tens of thousands of years. Fortunately, the company doesn’t shy away from big challenges, so we were able to pursue this remarkable opportunity.

Have you always thought of mapping the migratory history of humankind as a big challenge that you would like to attempt?

Dr. Royyuru: Trying to find these patterns of humankind has always been an attractive goal to those who do genetics. This holds true for me too, of course, as a researcher in computational biology. Finding the answers has been possible all along, but not practical because of the number of people you’d need in order to draw meaningful conclusions.

But this project goes far beyond what’s been done before. It asks the questions that literally every person on the planet wonders at one time or another: Who am I? Where am I from? I can go back six generations myself. But what is that? Two hundred years? And it’s all geographically in one place, too. But beyond that, there’s no information. We know the human species originated in Africa and spread from there sometime in the last 100,000 years or less. Who were my ancestors and how did they get to be in that part of the world?

Which aspects of this study do you consider most remarkable?

Dr. Royyuru: I must confess I’d take part in the project solely for the opportunity to understand human diversity. But I also feel that the scale and scope of it will allow researchers to learn things that we don’t already know in an area that we are eager to study – information-based medicine.

How medicine relates to a population, why one solution works for some people and not for others, how to minimize side effects and maximize benefits: these are all very important for the future of healthcare. And to reach this understanding, you have to get to the root of what population diversity means. The data from the Genographic Project, while not having any medical content, will far exceed anything we could ever get in a medical study.

The potential benefits of a study such as this are clear. Are there any potential controversies that might arise?

Dr. Royyuru: Yes, the fact is that we are asking people to volunteer something that is very personal, sequencing regions of their genomes – this is what defines them, what’s unique to them. There is an enormous amount of sensitivity to such data, which we fully respect. Confidentiality and privacy cannot be compromised. (See sidebar for more information on the security and privacy systems that are in place.)

There also might be some controversy from those who challenge the theory of evolution itself. But I’m a scientist; I believe in what the data tell me, and I go with that. I’m not here to question someone else’s beliefs. Our team is looking solely at what the scientific facts tell us.

What kind of projects have you worked on in the past that have prepared you for this one?

Dr. Royyuru: I’ve studied human biology, molecular biology, protein structures, and the application of biological data and computational techniques in areas such as HIV/AIDS and cancer. The complexity of data for this project is incredible – the genotype consisting of markers on the mitochondrial DNA and Y chromosome, the phenotype consisting of geographic, linguistic, ethnic and other descriptors – so my background is very useful.

What brought you to IBM Research originally?

Dr. Royyuru: It’s the potential for this type of research that brought me here. My expertise lent itself to either being in academia or conducting research in a corporate setting. Of course, I knew about the research being done at IBM, and knew that the company allowed for fairly basic research. But the passion for basic research that I found here was thrilling and surprising. There are very few places where that exists.

What are you doing on a day-to-day basis? Is this one project all-consuming?

Dr. Royyuru: Fortunately, that passion for basic research is carrying me through the day-to-day process of organizing a project of this magnitude and still maintaining my other responsibilities. You would think it would be all-consuming, but it’s not! I’m still doing my day job – managing the Computational Biology Center. That group is doing very important work, and managing it is a demanding job, but the Genographic Project is one area for which I’m doing the research myself.

My biggest obstacle is bridging the gap between biologists and geneticists who speak one language, and the computer scientists and mathematicians who speak another. I’m also busy setting goals and determining how best to achieve them, as well as joining my team in teaching ourselves all we can about population genetics. There is plenty to learn.

As you continue to learn more and expand the IBM team, what kind of skills are you looking for in the people you bring into this project?

Dr. Royyuru: In addition to the IBMers already on the team, we are bringing in a new researcher who specializes in applied mathematics, data mining and machine learning as they apply to biological data. Also, we’ll have a student who is working on population genetics joining us this summer as an intern.

And beyond the IBM Research team, who else is involved?

Dr. Royyuru: In addition to the team working at IBM Research’s headquarters, there is a global team of field scientists who are doing sequencing for the volunteer participants at ten sites worldwide, including locations in the United States, the United Kingdom, France, South Africa, India, China, Australia, Brazil, Russia and Lebanon. The interaction between the IBM team and the global scientific team has gotten off to a really good start with monthly meetings where we discuss objectives and progress. The steps we’re taking have already been very collaborative in nature. I anticipate that our team will make field visits and that their researchers will visit our labs here.

This collaboration will surely intensify as the project unfolds over the next five years. Do you have a clear vision of what you would like to achieve in that time?

Dr. Royyuru: Most definitely. When this is over, as a company and as people, we will have a much more in-depth understanding of human migration. As a company, IBM will have learned to find even more ways of making information work for people – and we’ll be able to teach ourselves to do even more for our clients and for the world at large. But the reach of this project goes far beyond the corporation. What we find and what we will learn will be here forever. And the best part is that the project isn’t restricted to the lab – every individual can participate, which means that this has the potential to touch every person in the world.

How secure is it all?

With the recent, high-profile losses of personal data in the banking and information-collection industries, security and privacy will be at the top of everyone's mind when reading about the Genographic Project. There's nothing more individual than someone's DNA, so how does a company make information like that secure enough?

For starters, there are the passwords and access limitations for participants. But these are only the most visible aspects of an infrastructure dedicated to protecting its contents.

According to Dave Safford, research staff member in IBM Research's security analysis group, the Linux operating system, Apache Web server, MySQL database manager, and PHP programming language, which together make up the LAMP framework, provide one element of protection. LAMP is the "single most common Web server stack," he says. "This is industrial-strength protection and very good." LAMP is a fundamental way to develop Web sites using open source technologies and is considered more secure to use because its elements have been widely tested in production use.

The database system used for the project also includes a network dispatcher and backend database server that uses WebSphere Application Server (WAS), which is undergoing Common Criteria evaluation by IBM's Software Group and IBM Research. Larry Koved, a research staff member for Web services security, and his team have done in-depth research into how to test security tools for enterprise-level functions, and are applying their findings to important work like this review. Elaine Palmer, whose work focuses on smart cards and Common Criteria, is also involved in the evaluation, as are other IBM researchers.

The security and privacy gurus at IBM Research have also scanned the system for small mistakes that could have the potential to create enormous problems. "It could be something as simple as leaving something on that should not be," says Dave. "All of those things have to be checked and re-checked to make sure that it's all protected. There is a vast amount of security engineering involved in a project of this magnitude."

Then come the hackers. That is, the Global Security Analysis Lab, also known as the ethical hackers, co-founded in 1995 by Charles Palmer, department group manager for security, privacy and cryptography. This group provides technical expertise and security tools for many of IBM's products and services. As its name implies, the group focuses on the potential for intrusion and inappropriate access to confidential information, among other issues.
