AltaVista, Compaq and IBM Researchers Create World's Largest, Most Accurate Picture of the Web

"Bow Tie" Theory Shows the Web is Not as Connected as Previously Thought

SAN JOSE, PALO ALTO & SAN MATEO, Calif. - 11 May 2000 -- Scientists from IBM Research, Compaq Corporate Research Laboratories and AltaVista Company have completed the first comprehensive "map" of the World Wide Web, and uncovered divisive boundaries between regions of the Internet that can make navigation difficult or, in some cases, impossible.

Previous studies, based on small samplings of the Web, suggested that there was a high degree of connectivity between sites, as evidenced by recent reports on the "small world Web" and 19 degrees of separation. Contrary to those preliminary findings, the new study -- based on analysis of more than 500 million pages -- found that the World Wide Web is fundamentally divided into four large regions, each containing approximately the same number of pages. The findings further indicate that there are massive constellations of Web sites that are inaccessible by links, the most common route of travel between sites for Web surfers. The resulting "Bow Tie" Theory explains the dynamic behavior of the Web and yields insights into its complex organization.

These discoveries will help computer scientists better understand the structure of the Internet, and lead to new technologies and design advances that will speed and simplify e-business.

"Bow Tie" Theory Explains the Four Regions of the Web

The image of the Web that emerged through the research was that of a bow tie. Four distinct regions make up approximately 90% of the Web (the bow tie), with approximately 10% of the Web completely disconnected from the entire bow tie.

The "strongly-connected core" (the knot of the bow tie) contains about one-third of all Web sites. Web surfers can easily travel between these sites via hyperlinks; this large "connected core" is at the heart of the Web.

One side of the bow contains "origination" pages, constituting almost one-quarter of the Web. "Origination" pages are pages that allow users to eventually reach the connected core, but cannot be reached from it. The other side of the bow contains "termination" pages, also constituting almost one-quarter of the Web. "Termination" pages can be accessed from the connected core, but do not link back to it. The fourth and final region contains "disconnected" pages, constituting approximately one-fifth of the Web. Disconnected pages can be connected to origination and/or termination pages but are not accessible to or from the connected core.
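The four regions described above can be illustrated on a toy link graph. The following is a minimal sketch in Python, not the researchers' method or software: the function names (`reachable`, `bow_tie`), the sample `links` list and the choice of a known core page as the starting point are all assumptions made for illustration. It also lumps every page outside the first three regions into a single "disconnected" set, which is a simplification of the picture given in the study.

```python
from collections import deque


def reachable(adj, start):
    """Return every page reachable from start by following edges in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in adj.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(links, core_page):
    """Split pages into four regions relative to the strongly connected
    component that contains core_page (hypothetical helper, for illustration)."""
    forward, backward, pages = {}, {}, {core_page}
    for src, dst in links:
        pages.update((src, dst))
        forward.setdefault(src, []).append(dst)   # links as written
        backward.setdefault(dst, []).append(src)  # links reversed

    from_core = reachable(forward, core_page)   # pages the core can reach
    to_core = reachable(backward, core_page)    # pages that can reach the core

    core = from_core & to_core
    origination = to_core - core
    termination = from_core - core
    disconnected = pages - core - origination - termination
    return {
        "core": core,
        "origination": origination,
        "termination": termination,
        "disconnected": disconnected,
    }


# Made-up sample graph: a, b, c link to each other in a cycle (the core).
links = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("in1", "a"),              # in1 reaches the core, but not vice versa
    ("c", "out1"),             # out1 is reached from the core, no link back
    ("in1", "tendril"),        # hangs off an origination page only
    ("island1", "island2"),    # no path to or from the core
]
regions = bow_tie(links, "a")
for name, members in regions.items():
    print(name, sorted(members))
```

Two breadth-first searches from one core page, one along the links and one against them, are enough to separate the regions: pages found in both searches form the core, pages found only in the backward search are origination pages, and pages found only in the forward search are termination pages.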

Impact of the Study

With the Bow Tie Theory, and its new explanation of the structure of the Internet, the scientific and business communities will now be able to:

-- Design more effective Web crawling strategies. Crawling, then indexing, is the fundamental method employed by search engines to organize the Internet. To achieve more complete coverage, AltaVista and other search engines will be able to develop more advanced crawl strategies to capture more of the Web.

-- Increase the effectiveness of e-commerce. Through the design of more effective browsing, advertising, measuring and modeling, e-commerce sites may decide to use different strategies for attracting surfers from various regions. For example, an "origination site" will have to increase its efforts to be easily found by Web crawlers. Once the site is linked to the connected core, its strategy may then shift to other traffic-generating measures.

-- Analyze the behavior of Web algorithms that make use of link information. Because many search engines use link information in ranking algorithms, they become targets for link "spamming" intended to create an artificial increase in a site's linkage.

-- Predict and capitalize upon the continued evolution of the Web. The researchers believe that the Bow Tie structure will be maintained as the Web grows. While some pages may evolve into the connected core, new pages will continue to be created in all three other regions.

-- Create mathematical models for the Web. With these findings, researchers can now develop new models to study the growth of the Web and possibly predict the emergence of new, yet unexplored phenomena on the Web.

This study -- the largest ever to be conducted on the topography of the Web -- is part of an ongoing, collaborative project by AltaVista, Compaq and IBM. The researchers expect to update the study on a regular basis, using data collected with AltaVista's search engine and advanced connectivity server software with a Compaq AlphaServer system containing 16 gigabytes of RAM, enough to hold the entire Web map in memory. IBM Research analyzed the data and contributed to the development of the "Bow Tie" Theory.

The initial findings will be presented simultaneously at the 9th International World Wide Web Conference, Amsterdam (May 15-19) and at the ACM PODS 2000 Conference, Dallas (May 14-19). Visit the following link to retrieve the "Web Map/Bow Tie Theory" conference paper (posted after May 14):
http://www9.org/w9cdrom/160/160.html (members of the press community can request an advance copy of the conference paper by contacting the press contacts at the companies).

AltaVista Company
AltaVista Company is the premier knowledge resource on the Internet at http://www.altavista.com. Building on its strong search heritage and patented technology, AltaVista unlocks the vast Internet to provide the richest, most relevant information access across multiple dimensions, including: Web pages, shopping, up-to-the-minute news, live audio and video, and community resources. AltaVista offers informative services including the multi-dimensional AltaVista Search, pure Web page Raging Search (www.raging.com) from AltaVista, AltaVista Shopping.com, AltaVista Live! personalized portal, and AltaVista Free Internet Access. AltaVista is a majority-owned operating company of CMGI, Inc. (Nasdaq:CMGI), Andover, Mass. AltaVista is headquartered in Palo Alto, Calif.

Compaq Computer Corporation
Compaq Computer Corporation, a Fortune Global 100 company, is one of the largest suppliers of computing systems in the world. Compaq designs, develops, manufactures and markets hardware, software, solutions, and services, including industry-leading enterprise computing solutions, fault-tolerant business-critical solutions, and communications products, commercial desktop and portable products, and consumer PCs.

Compaq products and services are sold in more than 200 countries directly to businesses, through a network of authorized Compaq marketing partners, and directly to businesses and consumers through Compaq's e-commerce Web site at http://www.compaq.com. Compaq markets its products and services primarily to customers from the business, home, government, and education sectors. Customer support and information about Compaq and its products and services are available at http://www.compaq.com.

IBM Research
For more information on IBM Research, go to http://www.research.ibm.com.

# # #

Contact(s) information

Nam LaMore
IBM Research
(408) 927-1282
nlamore@us.ibm.com

Jim Shissler
AltaVista Company
(650) 617-3463
jim.shissler@av.com

Eileen Quinn
Compaq Computer Corporation
(408) 285-9272
eileen.quinn@compaq.com
