In 1970, Edgar F. Codd, an Oxford-educated mathematician working at the IBM San Jose Research Lab, published a paper showing how information stored in large databases could be accessed without knowing how the information was structured or where it resided in the database.
Until then, retrieving information required relatively sophisticated computer knowledge, or even the services of specialists who knew how to write programs to fetch specific information—a time-consuming and expensive task.
Databases that were used to retrieve the same information over and over, and in a predictable way—such as a bill of materials for manufacturing—were well established at the time. What Codd did was open the door to a new world of data independence. Users wouldn’t have to be specialists, nor would they need to know where the information was or how the computer retrieved it. They could now concentrate more on their businesses and less on their computers.
Codd titled his paper “A Relational Model of Data for Large Shared Data Banks.” Computer scientists called it a “revolutionary idea.”
Today, the ease and flexibility of relational databases have made them the predominant choice for financial records, manufacturing and logistical information, and personnel data. Most routine data transactions (accessing bank accounts, using credit cards, trading stocks, making travel reservations, buying things online) use structures based on relational database theory.
“[Codd’s] relational model was at first very controversial; people thought that the model was too simplistic and that it could never give good performance.”
“Database Systems: A Textbook Case of Research Paying Off,” in Computer Science: Reflections on the Field, Reflections from the Field, 2004
“Reducing the number of times you must enter each item has the side benefit of reducing data entry errors. Each item is stored only once, so even if you do make an error, you must correct only that one entry.”
“Relational databases: The Inspiration Behind the Theory,” TechRepublic, April 2, 2003
“Ray Boyce and I wanted to design a query language that had the expressive power of Ted Codd’s relational languages but was easier to understand by users who were not experts in set theory or formal logic.”
“[The relational model] is a nice uniform way of talking about data, allowing one to compare systems, compare algorithms, and so on. It also makes use of some elegant mathematical theory.”
Nomination of Edgar F. Codd for IBM Fellow, December 1, 1975
“Data, not programs, is the only thing that matters—applications are transient and have no value except to acquire, manipulate, and display data. Data is the only thing with value.”
“There’s Hope for us all,” Doug’s Oracle Blog, oracledoug.com, March 20, 2008
“Codd’s biggest overall achievement was to make database management into a science. He put the field on solid scientific footing by providing a theoretical framework—the relational model—within which a variety of important problems could be attacked in a scientific manner.”
“Former IBM Fellow Edgar (Ted) Codd passed away on April 18,” IBM Research News, April 23, 2003
Codd’s idea spawned a new family of products for IBM, centered on the
According to the New York Times obituary for Codd, “… before Dr. Codd’s work found its way into commercial products, electronic databases were ‘completely ad hoc and higgledy-piggledy,’ said Chris Date, a relational data expert who worked on DB2 at IBM before becoming a business partner of Dr. Codd’s.”
Like many revolutionary ideas, the relational database didn’t come about easily.
By the 1960s, the vast amount of data stored in the world’s new mainframe computers—many of them IBM System/360 machines—had become a problem. Mainframe computations were expensive, often costing hundreds of US dollars per minute. A significant part of that cost was the complexity surrounding database management.
Codd, who had added a doctorate in computer science to his math background when he came to the United States from his native England, set out to solve this problem. He started with an elegantly simple premise: He wanted to be able to ask the computer for information, and then let the computer figure out where and how the information is stored and how to retrieve it.
IBM’s Don Chamberlin said that Codd’s “basic idea was that relationships between data items should be based on the item’s values, and not on separately specified linking or nesting. This notion greatly simplified the specification of queries and allowed unprecedented flexibility to exploit existing data sets in new ways.”
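The value-based linking Chamberlin describes can be sketched with a small example. The following uses Python's built-in `sqlite3` module; the `supplier` and `part` tables, their columns, and the sample rows are hypothetical and invented purely for illustration. The two tables are connected only by a shared `supplier_id` value, with no stored pointers or nesting, so the same data can be queried from either direction.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, name TEXT, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO part VALUES (10, 'bolt', 1), (11, 'nut', 1), (12, 'gear', 2);
""")


def parts_by_supplier(conn, supplier_name):
    """Return the names of parts supplied by the named supplier."""
    rows = conn.execute(
        """
        SELECT part.name
        FROM part
        JOIN supplier ON part.supplier_id = supplier.supplier_id
        WHERE supplier.name = ?
        ORDER BY part.name
        """,
        (supplier_name,),
    ).fetchall()
    return [name for (name,) in rows]


def suppliers_of_part(conn, part_name):
    """Return the names of suppliers of the named part (the reverse question)."""
    rows = conn.execute(
        """
        SELECT supplier.name
        FROM supplier
        JOIN part ON part.supplier_id = supplier.supplier_id
        WHERE part.name = ?
        ORDER BY supplier.name
        """,
        (part_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(parts_by_supplier(conn, "Acme"))
print(suppliers_of_part(conn, "gear"))
```

Neither query says how the rows are stored or which table to walk first; the database decides that, which is the kind of flexibility the quote points to.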
In his seminal paper, Codd wrote that he used the term relation in its mathematical, set-theoretic sense: a set of tuples whose elements are drawn from a given collection of sets. In plain terms, his relational database solution provided a level of data independence that allowed users to access information without having to master details of the physical structure of a database.
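For readers who want the set-theoretic meaning spelled out, the standard definition can be written as follows. The symbols $S_1, \dots, S_n$ (the underlying sets, or domains) and $R$ (the relation) follow common textbook notation and are used here only for illustration.

```latex
% A relation R on sets S_1, ..., S_n is a set of n-tuples,
% each taking its i-th element from S_i.
\[
  R \subseteq S_1 \times S_2 \times \cdots \times S_n,
  \qquad
  (s_1, s_2, \ldots, s_n) \in R \;\Rightarrow\; s_i \in S_i \ \text{for } i = 1, \ldots, n .
\]
```

Read informally, a database table is such a relation: each row is one tuple, and each column draws its values from one of the sets.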
As exciting as the theory was to the technical community, it was still a theory. It needed to be thoroughly tested to see if and how it worked. For several years, IBM elected to continue promoting its established hierarchical database system, IBM IMS (Information Management System). A hierarchical system organizes its data in a tree-like structure. While IMS can be faster than DB2 for common tasks, it may require more programming effort to design and maintain for non-primary duties. Relational databases have proven superior in cases where the requests change frequently or require a variety of viewpoint “angles.”
IBM, Rockwell and Caterpillar developed IMS in 1966 to help track the millions of parts and materials used in NASA’s Apollo Space Program. It continues to be IBM’s premier hierarchical database management system.
In 1973, the San Jose Research Laboratory—now Almaden Research Center—began a program called System R (R for relational) to prove the relational theory with what it called “an industrial-strength implementation.” The project produced an extraordinary output of inventions that became the foundation for IBM’s success with relational databases.
Don Chamberlin and Ray Boyce invented SQL, for Structured Query Language, today the most widely used computer language for querying relational databases. Patricia Selinger developed a cost-based optimizer, which makes working with relational databases more cost-effective and efficient. And Raymond Lorie invented a compiler that saves database query plans for future use.
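To make the idea of a cost-based optimizer concrete, the following toy sketch chooses between two ways of answering a query by comparing estimated costs. The cost formulas, the `TableStats` fields, and the `RANDOM_FETCH_PENALTY` constant are assumptions invented for illustration; this is not the System R cost model, only a minimal picture of the general approach.

```python
import math
from dataclasses import dataclass

# Assumed relative cost of fetching one row via an index versus
# reading one row during a sequential scan (hypothetical value).
RANDOM_FETCH_PENALTY = 4.0


@dataclass
class TableStats:
    row_count: int            # total rows in the table
    matching_fraction: float  # estimated share of rows the filter keeps


def full_scan_cost(stats: TableStats) -> float:
    """Estimated cost of reading every row once."""
    return float(stats.row_count)


def index_lookup_cost(stats: TableStats) -> float:
    """Estimated cost of descending an index, then fetching each match."""
    descent = math.log2(max(stats.row_count, 2))
    fetches = stats.row_count * stats.matching_fraction * RANDOM_FETCH_PENALTY
    return descent + fetches


def choose_plan(stats: TableStats) -> str:
    """Return the name of the cheaper access path under the toy model."""
    costs = {
        "full_scan": full_scan_cost(stats),
        "index_lookup": index_lookup_cost(stats),
    }
    return min(costs, key=costs.get)


selective = TableStats(row_count=1_000_000, matching_fraction=0.001)
broad = TableStats(row_count=1_000_000, matching_fraction=0.9)
print(choose_plan(selective))
print(choose_plan(broad))
```

Under these assumed numbers, a filter that keeps very few rows favors the index, while one that keeps most rows favors a plain scan. The user writes the same query either way, and the system picks the path.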
In 1983, IBM introduced the DB2 family of relational databases, so named because it was IBM’s second family of database management software. Today, DB2 databases handle billions of transactions every day, and DB2 is one of IBM’s most successful software products. According to Arvind Krishna, general manager of IBM Information Management, DB2 continues to be a leader in innovative relational database software.
Dr. Codd, known as “Ted” to his colleagues, was honored as an IBM Fellow in 1976, and in 1981, the Association for Computing Machinery gave him the Turing Award for contributions of major importance to the field of computing. The Turing Award is generally regarded as the Nobel Prize of computing.
Selected team members who contributed to this Icon of Progress:
- Ray Boyce: Co-developer of SQL (Structured Query Language)
- Edgar “Ted” Codd: Mathematician, IBM Fellow
- Donald Chamberlin: Co-developer of SQL
- Christopher J. Date: Longtime collaborator of Ted Codd
- Patricia G. Selinger: Founding manager of the Database Technology Institute at IBM Almaden Research Center