The World Wide Web, or WWW, is the name given in 1990 by Tim Berners-Lee of CERN to his proposal for an Internet-based hypertext system. The system was intended to bring together, behind a single easy-to-use interface, the various information resources that were spread around the Internet and accessed through many different systems and protocols.
The invention that weaves everything in the WWW into a single web is hypertext, devised by Ted Nelson in 1965. In hypertext, any word can be associated with a link that points to some other piece of information. To make it possible to display hypertext, Tim Berners-Lee developed a description language called simply Hypertext Markup Language, or HTML for short. This was based on SGML (Standard Generalized Markup Language), an extensive structured document-description language used in publishing. The basic idea behind HTML is to describe the structure of a document, for example by saying which part of the text is a heading, an emphasised word or a quotation, and to let the way these are finally displayed depend on the user's program and display device. This means that a visually impaired person can listen to an HTML document on a terminal fitted with a speech synthesiser, while someone with a graphical user interface can view the same document as text adjusted to fit his or her screen.
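The separation between structure and presentation described above can be illustrated with a minimal Python sketch. The sample markup, the class name StructureLister and the two renderer functions are all hypothetical and exist only for this illustration; real browsers and screen readers are far more elaborate.

```python
from html.parser import HTMLParser

# Hypothetical sample document: the markup says what each part *is*,
# not how it should look or sound.
DOCUMENT = "<h1>FUNET</h1><p>A <em>national</em> research network.</p>"


class StructureLister(HTMLParser):
    """Collects (element, text) pairs so that a renderer can decide on presentation."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.parts = []   # (innermost element name, text) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append((self.stack[-1] if self.stack else None, text))


def parse(markup):
    lister = StructureLister()
    lister.feed(markup)
    lister.close()
    return lister.parts


def render_screen(parts):
    # A visual program might mark headings and emphasis typographically.
    styles = {"h1": str.upper, "em": lambda text: f"*{text}*"}
    return " ".join(styles.get(tag, str)(text) for tag, text in parts)


def render_speech(parts):
    # A speech program might announce the same structure with spoken cues.
    cues = {"h1": "Heading: ", "em": "(stressed) "}
    return " ".join(cues.get(tag, "") + text for tag, text in parts)


parts = parse(DOCUMENT)
print(render_screen(parts))   # FUNET A *national* research network.
print(render_speech(parts))   # Heading: FUNET A (stressed) national research network.
```

Both renderers consume the same parsed structure; only the final presentation differs, which is the point the paragraph above makes.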
For HTML to be used to create a worldwide hypertext web, we need an unambiguous way of defining links to other documents or other Internet resources. This is done using a Uniform Resource Locator, or URL, which is made up of three parts. The first part defines the data transfer method or protocol, such as:
- ftp: File Transfer Protocol, for transferring programs, texts and
images. Public files are available, for example, from the NIC.FUNET.FI
computer, to users identifying themselves as 'anonymous'. FTP was the
Internet's first data-transfer application.
- telnet: TELNET creates remote-terminal connections to other
computers. Public databases mostly used to be accessed via
remote-terminal logins, and most of them had their own command language.
Nowadays many of them have their own HTML-based interface.
- mailto: In the ARPANET, the need for person-to-person
communications soon gave rise to electronic mail. This was a separate
function from data transfer, and in today's Internet mail is relayed
using SMTP (Simple Mail Transfer Protocol). E-mail used to be one of the
few services that worked across almost every computer network, though it
relied on gateways and complicated addresses, which users had to copy by
hand and often adapt mentally to suit their own system. E-mail was also
widely used for remote access to various services.
- http: Hypertext Transfer Protocol is a fast, simple connection
method developed specifically for the WWW.
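As a rough illustration of how a program can read the first part of a URL, the following Python sketch uses the standard library's urllib.parse module. The four addresses are hypothetical placeholders on example.org, one for each protocol listed above, and the helper name scheme_of is invented for this example.

```python
from urllib.parse import urlsplit

# Hypothetical addresses, one per protocol named in the list above.
EXAMPLES = [
    "ftp://ftp.example.org/pub/README",
    "telnet://library.example.org/",
    "mailto:someone@example.org",
    "http://www.example.org/index.html",
]


def scheme_of(url):
    """Return the first part of a URL: the transfer method or protocol."""
    return urlsplit(url).scheme


for url in EXAMPLES:
    print(scheme_of(url), "<-", url)
```

A browser would use this value to decide which protocol handler to start before looking at the rest of the address.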
The second part of a URL generally gives the address of the computer where the desired service is located, and may also include qualifiers such as a user ID or port number. This allows a browser to work out which site it should set up a TCP/IP connection with. Examples:
- Open a connection with the FTP service on the FTP.FUNET.FI computer.
- Open a terminal connection to the Finnish National Bibliography
- Start up the e-mail function to send a message to the address
- Establish a connection with the CSC's homepage
Finally, the last part of the URL contains the internal reference on the server in question. This reference can be a file pathname or even a set of database-search parameters. For example, http://www.csc.fi/suomi/funet/funet-esittely.html finds the www.csc.fi computer, goes to the directory /suomi/funet/ and opens the file there called funet-esittely.html.
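The way the second and third parts of a URL are taken apart can be sketched in Python as follows. This is a minimal illustration using the standard urllib.parse and posixpath modules; the function name locate and the DEFAULT_PORTS table are assumptions made for this example (the port numbers themselves are the conventional ones for these protocols).

```python
import posixpath
from urllib.parse import urlsplit

# Conventional default ports, used when the URL does not name one.
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "http": 80}


def locate(url):
    """Split a URL into the machine to contact and the reference on that machine."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "host": parts.hostname,                                   # where to connect
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),    # explicit or default
        "user": parts.username,                                   # optional user ID
        "directory": directory,                                   # path on the server
        "file": filename,                                         # document to open
    }


info = locate("http://www.csc.fi/suomi/funet/funet-esittely.html")
print(info["host"])       # www.csc.fi
print(info["directory"])  # /suomi/funet
print(info["file"])       # funet-esittely.html
```

Note that posixpath.split drops the trailing slash from the directory, so the directory is reported as /suomi/funet.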
The keyword 'FUNET' entered into the NWI.FUNET.FI search engine is automatically converted into a URL like the following one, containing the database search parameters:
The end section of this, starting from where it says lang=fi, consists of parameters that are passed to the program nwisearch.tcl so that it can carry out a search. Many large websites are nowadays also generated individually for each user from a database.
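The conversion of a keyword into search parameters, and back again, can be sketched in Python as below. The host search.example.org, the /cgi-bin/ path and the parameter name keyword are hypothetical and are not taken from the real service; only the program name nwisearch.tcl and the parameter lang=fi come from the text above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base address; only "nwisearch.tcl" is taken from the text.
SEARCH_PROGRAM = "http://search.example.org/cgi-bin/nwisearch.tcl"


def build_search_url(keyword, lang="fi"):
    """Turn a search keyword into a URL that carries it as parameters."""
    # "keyword" is an assumed parameter name, chosen for illustration.
    return SEARCH_PROGRAM + "?" + urlencode({"lang": lang, "keyword": keyword})


def search_parameters(url):
    """Recover the program name and its parameters from such a URL."""
    parts = urlsplit(url)
    program = parts.path.rsplit("/", 1)[-1]
    return program, parse_qs(parts.query)


url = build_search_url("FUNET")
print(url)  # http://search.example.org/cgi-bin/nwisearch.tcl?lang=fi&keyword=FUNET

program, params = search_parameters(url)
print(program)  # nwisearch.tcl
print(params)   # {'lang': ['fi'], 'keyword': ['FUNET']}
```

The server-side program receives the part after the question mark and uses it to build the result page, which is how pages can be generated individually for each request.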
In the future, the aim is to give network resources URI identifiers, comparable to the ISBN numbers used for printed publications. These will make it possible to find a resource even if it has moved to a different server or been copied to various parts of the network.