005.10  How Big is the Internet?

by Michael F. Schwartz

The question often arises, "How big is the Internet?" To answer this question, we must first define what we wish to measure. At one time, connectivity via the IP protocol suite defined the Internet. Since a number of protocols now coexist on the Internet, some people have suggested defining the Internet instead by a common name space (perhaps the Domain Name System (DNS) or X.500). This definition is counterintuitive, since it elides differences between various types of physical connectivity. In particular, it does not distinguish the parts of the network that can support interactive applications (like remote login) from dialup-based, mail-only connections. Given the advantages of interactive connectivity and the growing popularity of IP, in this article I consider only the interconnected IP Internet.

M. Lottor recently published in RFC 1296 the results of a ten-year study that counted the number of hosts in domains that have IP addresses registered in the DNS (as opposed to domains that register only "mail exchange" (MX) records, which allow mail to be forwarded through an intermediary host). In the early years the data were extracted from host tables maintained by the DDN Network Information Center. Later, measurements were taken by a program that recursively descends the domain name tree, retrieving information about all domains that allow "zone transfers".

Many of the hosts counted by Lottor's study are hidden behind secure gateways or otherwise not directly connected to the Internet. Therefore, Lottor's study really indicates the spread of IP and the Domain Name System at sites connected to the Internet. A more meaningful measure of Internet size is the number of domains at which common network services can be contacted, since it is through such services that a site gains the advantages of connectivity. A study that tracks changes in service-level reachability in the Internet is now underway.
While the measurements will not be complete until the end of 1992, the first set of measurements collected so far can be used to characterize the current size of the interconnected IP Internet. The final study will provide much more information than just Internet size. It will indicate relative growth rates among different countries, trends in the types of services to which sites limit access, how sites limit access to these services, and the types and geographical distribution of sites that distance themselves from the Internet.

Starting with a large list of domains, my study attempts to connect to the following TCP/IP services at each domain:

__________________________________________________________________
Port Number   Service                 Port Number   Service
------------------------------------------------------------------
13            daytime                 111           Sun portmap
15            netstat                 513           rlogin
21            FTP                     514           rsh
23            telnet                  540           UUCP
25            SMTP                    543           klogin
53            Domain Name System      544           krcmd, kshell
79            finger
__________________________________________________________________

This list was chosen to span a representative range of service types, each of which can be expected to be found on any machine at a site (so that probing random machines is meaningful). The one exception is the Domain Name System, for which the machines to probe are selected from information obtained from the Domain system itself. Only TCP services are tested, since the TCP connection mechanism allows one to determine, in an application-independent fashion, whether a server is running.
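
The application-independent test described above can be illustrated with a short Python sketch. This is not the measurement software used in the study, whose implementation is not described here; the function names, the five-second timeout, and the choice to treat every failure as "not reachable" are assumptions made for illustration. The idea is only that a completed TCP connection shows that some server is listening on the port, with no need to speak the application protocol.

```python
import socket

# Ports and service names copied from the table above.
SERVICES = {
    13: "daytime", 15: "netstat", 21: "FTP", 23: "telnet", 25: "SMTP",
    53: "DNS", 79: "finger", 111: "Sun portmap", 513: "rlogin",
    514: "rsh", 540: "UUCP", 543: "klogin", 544: "krcmd, kshell",
}


def tcp_service_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) completes.

    No application data is exchanged; the connection is closed at once.
    Refusals, timeouts, and name-lookup failures all count as unreachable
    (an assumption of this sketch).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_host(host, ports=SERVICES, timeout=5.0):
    """Return the sorted list of ports on host that accepted a connection."""
    return [p for p in sorted(ports) if tcp_service_reachable(host, p, timeout)]
```

Under these assumptions, a domain would count as reachable if `probe_host` returns a non-empty list for at least one of its machines. A sketch like this cannot tell a filtered port from a host that is down, which is one reason the resulting counts are best read as lower bounds.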

From a list of approximately 12,700 Internet domains worldwide (generated from Lottor's January 1991 data plus a number of other sources), successful connections were recorded to at least one of the above services in 4,455 domains, broken down by top-level domain as follows:

_________________________________________________________________
Top-level     Description         Number of Domains Reachable by
Domain Name                       Measured Internet Services
------------------------------------------------------------------
edu           U.S. Educational    2048
com           U.S. Commercial      494
ca            Canadian             299
au            Australian           278
de            German               174
se            Swedish              167
gov           U.S. Government      128
mil           U.S. Military        115
jp            Japanese             106
net           Named by network      96
nl            Dutch                 84
org           Non-profit            56
fr            French                55
no            Norwegian             55
fi            Finnish               45
uk            British               44
it            Italian               39
dk            Danish                38
at            Austrian              21
nz            New Zealand           21
ch            Swiss                 20
il            Israeli               16
is            Icelandic              8
es            Spanish                8
kr            Korean                 5
be            Belgian                4
gr            Greek                  4
za            South African          4
br            Brazil                 3
ie            Irish                  3
tw            Taiwanese              3
us            Other U.S.             3
arpa          ARPANET names          2
mx            Mexican                2
sg            Singapore              2
hk            Hong Kong              1
in            Indian                 1
int           International          1
pt            Portuguese             1
tn            Tunisian               1
------------------------------------------------------

These counts are a lower bound, since they depend on the span of the initial list of domains. Nonetheless, the measurements provide an interesting point of comparison. For example, it is clear that the number of USA sites is much larger than the number of sites in any other country in the world. In fact, there are nearly twice as many USA sites as sites in all other countries combined. However, given the rapid growth rate of IP connectivity in other countries, within one to two years there will be more sites outside the USA than within it.

To underscore the distinction between service-level connectivity and IP host count at Internet sites, note that 7,242 of the 11,194 domains in Lottor's January 1991 list were not reachable through any of the above Internet services.
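
The totals quoted above can be re-derived from the table with a few lines of Python. The article does not say which top-level domains were counted as USA sites, so the grouping below (edu, com, gov, mil, us as USA; net, org, arpa, int left unassigned) is an assumption of this sketch, and a different grouping shifts the USA-to-elsewhere ratio somewhat.

```python
# Reachable-domain counts per top-level domain, copied from the table above.
REACHABLE = {
    "edu": 2048, "com": 494, "ca": 299, "au": 278, "de": 174, "se": 167,
    "gov": 128, "mil": 115, "jp": 106, "net": 96, "nl": 84, "org": 56,
    "fr": 55, "no": 55, "fi": 45, "uk": 44, "it": 39, "dk": 38, "at": 21,
    "nz": 21, "ch": 20, "il": 16, "is": 8, "es": 8, "kr": 5, "be": 4,
    "gr": 4, "za": 4, "br": 3, "ie": 3, "tw": 3, "us": 3, "arpa": 2,
    "mx": 2, "sg": 2, "hk": 1, "in": 1, "int": 1, "pt": 1, "tn": 1,
}

# Assumed grouping (not stated in the article).
USA_TLDS = {"edu", "com", "gov", "mil", "us"}
UNASSIGNED_TLDS = {"net", "org", "arpa", "int"}

total = sum(REACHABLE.values())                             # 4455
usa = sum(REACHABLE[t] for t in USA_TLDS)                   # 2788
unassigned = sum(REACHABLE[t] for t in UNASSIGNED_TLDS)     # 155
elsewhere = total - usa - unassigned                        # 1512
usa_to_elsewhere = usa / elsewhere

# Lottor's January 1991 list: 7,242 of 11,194 domains were not reachable.
lottor_listed = 11194
lottor_unreachable = 7242
lottor_reachable = lottor_listed - lottor_unreachable       # 3952
reachable_share = lottor_reachable / lottor_listed

print(f"total reachable domains: {total}")
print(f"USA : elsewhere = {usa} : {elsewhere} ({usa_to_elsewhere:.2f} to 1)")
print(f"reachable share of Lottor's list: {reachable_share:.1%}")
```

With this grouping the USA count is a little under twice the count for all other countries, in line with the "nearly twice" stated above, and roughly one third of the domains in Lottor's list answered on at least one probed service.
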
The ratio of service-reachable domains to all IP domains may continue to decrease, as security problems garner increasing concern. The results of the study will help uncover the trend here.

The services reached by my measurement software were as follows:

___________________________________
Service          Number of Domains
telnet           4170
FTP              4027
SMTP             3952
rlogin           3811
rsh              3777
finger           3637
daytime          3492
Sun portmap      3421
UUCP             2217
Domain           1803
netstat           294
klogin             95
krcmd, kshell      93
----------------------------

From this list it is clear that the "Big Three" applications (remote login, file transfer, and mail) are the main services in use. Interestingly, UUCP appears in more domains than DNS, even though TCP-based UUCP (as opposed to dialup UUCP) is being phased out of existence as NNTP gains popularity. The reason for this is probably twofold. First, most domains contract DNS service from other domains, to avoid the administrative effort required to run a Domain server. Second, many computers probably come with UUCP configured in by the manufacturer.

For additional information and metrics, other recent work is now available. The size of the set of computer networks interconnected for at least mail or news service, referred to as "The Matrix", is discussed by John Quarterman in his book and newsletters by the same name. The diameter of the interpersonal communication graph enabled by electronic mail is discussed in the paper "Discovering Shared Interests Among People Using Graph Analysis of Global Electronic Mail Traffic", prepared by Schwartz and Wood at the University of Colorado Department of Computer Science.

Anyone who is considering performing measurement studies of the Internet is urged to read Vint Cerf's "Guidelines for Internet Measurement Activities" in RFC 1262, Oct. 1991.

* Assistant Professor, Dept. of Computer Science, Univ. of Colorado, Boulder, Colorado, USA