World Wide Web
By Wikipedia,
the free encyclopedia,
http://en.wikipedia.org/wiki/World_Wide_Web
The World Wide Web (commonly shortened to the Web) is a system
of interlinked hypertext documents accessed via the Internet.
With a Web browser, a user views Web pages that may contain
text, images, videos, and other multimedia and navigates
between them using hyperlinks.
The World Wide Web was created in 1989 by British
scientist Sir
Tim Berners-Lee, working at the European
Organization for Nuclear Research (CERN) in Geneva,
Switzerland,
and released in 1992. Since then, Berners-Lee has played
an active role in guiding the development of Web standards
(such as the markup
languages in which Web pages are composed), and in recent
years has advocated his vision of a Semantic
Web.
How the Web works
Viewing a Web page on the World Wide Web normally begins either by typing the URL of the page into a Web browser, or by following a hyperlink to that page or resource. The Web browser then initiates a series of communication messages, behind the scenes, in order to fetch and display it.
First, the server-name portion of the URL is resolved into an IP address using the global, distributed Internet database known as the domain name system, or DNS. This IP address is necessary to contact and send data packets to the Web server.
The browser then requests the resource by sending an HTTP request to the Web server at that particular address. In the case of a typical Web page, the HTML text of the page is requested first and parsed immediately by the Web browser, which will then make additional requests for images and any other files that form a part of the page. Statistics measuring a website's popularity are usually based on the number of 'page views' or associated server 'hits', or file requests, which take place.
Having received the required files from the Web server, the browser then renders the page onto the screen as specified by its HTML, CSS, and other Web languages. Any images and other resources are incorporated to produce the on-screen Web page that the user sees.
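The fetch sequence described above (parse the URL, resolve the server name through DNS, send an HTTP request) can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular browser is implemented: the function names `split_url`, `build_get_request`, and `fetch` are hypothetical, the sketch handles plain `http://` URLs only, and it does no parsing or rendering of the response.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, path) for a plain http:// URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # 80 is the default HTTP port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


def build_get_request(host, path):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(url):
    """Resolve the host, send the request, and return the raw response bytes."""
    host, port, path = split_url(url)
    ip_address = socket.gethostbyname(host)      # DNS lookup
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

A browser would repeat the request step for every image, stylesheet, and script named in the returned HTML; calling `fetch("http://www.example.com/")` covers only the first request.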
Most Web pages will themselves contain hyperlinks
to other related pages and perhaps to downloads, source
documents, definitions and other Web resources. Such a collection
of useful, related resources, interconnected via hypertext
links, is what was dubbed a "web" of information. Making
it available on the Internet created what Tim
Berners-Lee first called the WorldWideWeb (a
term written in CamelCase,
subsequently discarded) in 1990.
History
The underlying ideas of the Web can be traced as far back
as 1980, when, at CERN
in Switzerland,
Sir
Tim Berners-Lee built ENQUIRE
(a reference to Enquire
Within Upon Everything, a book he recalled from
his youth). While it was rather different from the system
in use today, it contained many of the same core ideas (and
even some of the ideas of Berners-Lee's next project after
the World Wide Web, the Semantic
Web).
In March 1989, Berners-Lee wrote a proposal
which referenced ENQUIRE and described a more elaborate
information management system. With help from Robert
Cailliau, he published a more formal proposal for the
World Wide Web on November
12, 1990.
The proposal was modeled after the Dynatext SGML reader from
EBT (Electronic Book Technology, a spin-off from the Institute
for Research in Information and Scholarship at Brown
University), which CERN had licensed. The Dynatext system,
however technically advanced (it was a key player in the
extension of SGML ISO 8879:1986 to Hypermedia within HyTime),
was considered too expensive, and its licensing policy was
considered inappropriate for general HEP (High Energy Physics)
community use: a fee was charged for each document and each
time a document was changed.
A NeXTcube
was used by Berners-Lee as the world's first Web
server and also to write the first Web
browser, WorldWideWeb,
in 1990. By Christmas 1990, Berners-Lee had built all the
tools necessary for a working Web:
the first
Web browser (which was a Web editor as well), the first
Web server, and the first Web pages
which described the project itself.
On August
6, 1991,
he posted a short summary of the World Wide Web project
on the alt.hypertext newsgroup.
This date also marked the debut of the Web as a publicly
available service on the Internet.
The first server outside of Europe was created at SLAC
in December 1991.
The crucial underlying concept of hypertext originated with older projects from the 1960s, such as the Hypertext Editing System (HES) at Brown University (developed by Ted Nelson and Andries van Dam, among others), Ted Nelson's Project Xanadu, and Douglas Engelbart's oN-Line System (NLS). Both Nelson and Engelbart were in turn inspired by Vannevar Bush's microfilm-based "memex," which was described in the 1945 essay "As We May Think".
Berners-Lee's breakthrough was to marry hypertext to the Internet. In his book Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of both technical communities, but when no one took up his invitation, he finally tackled the project himself. In the process, he developed a system of globally unique identifiers for resources on the Web and elsewhere: the Uniform Resource Identifier.
The World Wide Web had a number of differences from other hypertext systems that were then available. The Web required only unidirectional links rather than bidirectional ones. This made it possible for someone to link to another resource without action by the owner of that resource. It also significantly reduced the difficulty of implementing Web servers and browsers (in comparison to earlier systems), but in turn presented the chronic problem of link rot. Unlike predecessors such as HyperCard, the World Wide Web was non-proprietary, making it possible to develop servers and clients independently and to add extensions without licensing restrictions.
On April
30, 1993,
CERN
announced that the World Wide
Web would be free to anyone, with no fees due. Coming two
months after the announcement that the Gopher
protocol was no longer free to use, this produced a rapid
shift away from Gopher and towards the Web. An early popular
Web browser was ViolaWWW,
which was based upon HyperCard.
Scholars generally agree, however, that the turning
point for the World Wide Web began with the introduction
of the Mosaic
Web browser in 1993, a graphical
browser developed by a team at the National
Center for Supercomputing Applications at the University
of Illinois at Urbana-Champaign (NCSA-UIUC), led by
Marc
Andreessen. Funding for Mosaic came from the High-Performance
Computing and Communications Initiative, a funding program
initiated by the High
Performance Computing and Communication Act of 1991,
one of several
computing developments initiated by Senator Al
Gore. Prior to the release
of Mosaic, graphics were not commonly mixed with text in
Web pages, and the Web was less popular than older protocols
in use over the Internet, such as Gopher
and Wide Area Information Servers (WAIS). Mosaic's graphical
user interface allowed the Web to become, by far, the most
popular Internet protocol.
The World Wide Web Consortium (W3C) was founded by Tim Berners-Lee after he left the European Organization for Nuclear Research (CERN) in October 1994. It was founded at the Massachusetts Institute of Technology Laboratory for Computer Science (MIT/LCS) with support from the Defense Advanced Research Projects Agency (DARPA), which had pioneered the Internet, and the European Commission.
History in literature
The concept of a home-based global information system goes back at least as far as Isaac Asimov's short story "Anniversary" (Amazing Stories, March 1959), in which the characters look up information on a home computer called a "Multivac outlet" — which was connected by a "planetwide network of circuits" to a mile-long "super-computer" somewhere in the bowels of the Earth. One character is thinking of installing a Multivac, Jr. model for his kids.
The story was set in the far distant future when commercial space travel was commonplace, and yet the machine "prints the answer on a slip of tape" that comes out of a slot (there is no video display), and the owner of the home computer says that he doesn't spend the kind of money needed to get a Multivac outlet that talks.
Standards
Many formal standards and other technical specifications define the operation of different aspects of the World Wide Web, the Internet, and computer information exchange. Many of the documents are the work of the World Wide Web Consortium (W3C), headed by Berners-Lee, but some are produced by the Internet Engineering Task Force (IETF) and other organizations.
Usually, when Web standards are discussed, the following publications are seen as foundational:
Additional publications provide definitions of other essential technologies for the World Wide Web, including, but not limited to, the following:
- Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Internet, such as hypertext documents and images. URIs, often called URLs, are defined by the IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as its predecessors and numerous URI scheme-defining RFCs;
- HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the browser and server authenticate each other.
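As a rough illustration of the two specifications listed above, the following Python sketch splits a URI into the generic components named by RFC 3986 and builds the header used by the Basic scheme of RFC 2617. The helper names `uri_components` and `basic_auth_header` are hypothetical, and the example URI is an assumption chosen for illustration.

```python
import base64
from urllib.parse import urlsplit


def uri_components(uri):
    """Split a URI into the five generic components of RFC 3986."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def basic_auth_header(user, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

Basic authentication only encodes the credentials; it does not encrypt them, which is why RFC 2617 also defines the Digest scheme.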
Java
A significant advance in Web technology was Sun Microsystems' Java platform. It enables Web pages to embed small programs (called applets) directly into the view. These applets run on the end-user's computer, providing a richer user interface than simple Web pages. Java client-side applets never gained the popularity that Sun had hoped for, for a variety of reasons, including lack of integration with other content (applets were confined to small boxes within the rendered page) and the fact that many computers at the time were supplied to end users without a suitably installed Java Virtual Machine, and so required a download by the user before applets would appear. Adobe Flash now performs many of the functions that were originally envisioned for Java applets, including the playing of video content, animation, and some rich GUI features. Java itself has become more widely used as a platform and language for server-side and other programming.
JavaScript
JavaScript,
on the other hand, is a scripting
language that was initially developed for use within
Web pages. The standardized version is ECMAScript.
While its name is similar to Java, JavaScript was developed
by Netscape
and has very little to do with Java, although the syntax
of both languages is derived from the C
programming language. In conjunction with a Web page's Document
Object Model (DOM), JavaScript has become a much more
powerful technology than its creators originally envisioned.
The manipulation of a page's DOM after the page is delivered
to the client has been called Dynamic
HTML (DHTML), to emphasize a shift away from static
HTML displays.
In simple cases, all the optional information and actions available on a JavaScript-enhanced Web page will have been downloaded when the page was first delivered. Ajax ("Asynchronous JavaScript and XML") is a group of interrelated web development techniques used for creating interactive web applications that provide a method whereby parts within a Web page may be updated, using new information obtained over the network at a later time in response to user actions. This allows the page to be more responsive, interactive and interesting, without the user having to wait for whole-page reloads. Ajax is seen as an important aspect of what is being called Web 2.0. Examples of Ajax techniques currently in use can be seen in Gmail, Google Maps, and other dynamic Web applications.
Publishing Web pages
Web page production is available to individuals outside the mass media. In order to publish a Web page, one does not have to go through a publisher or other media institution, and potential readers could be found in all corners of the globe.
Many different kinds of information are available on the Web, and it has become easier for those who wish to learn about other societies, cultures, and peoples to do so.
The increased opportunity to publish materials is observable in the countless personal and social networking pages, as well as sites by families, small shops, etc., facilitated by the emergence of free Web hosting services.
Statistics
According to a 2001 study, there were well over
550 billion documents on the Web, mostly in the invisible
Web, or deep
Web. A 2002 survey of 2,024
million Web pages determined
that by far the most Web content was in English: 56.4%;
next were pages in German (7.7%), French (5.6%), and Japanese
(4.9%). A more recent study, which used Web searches in
75 different languages to sample the Web, determined that
there were over 11.5 billion Web pages in the publicly
indexable Web as of the end of January 2005.
As of June 2008, the indexable web contains at least 63
billion pages. On July 25,
2008, Google software engineers Jesse Alpert and Nissan
Hajaj announced that Google
Search had discovered one trillion unique URLs.
Over 100.1 million websites operated as of March 2008.
Of these 74% were commercial or other sites operating in
the .com generic
top-level domain.
Among services paid for by advertising, Yahoo! could collect
the most data about commercial Web users, about 2,500 bits of
information per month about each typical user of its site and
its affiliated advertising network sites. Yahoo! was followed
by MySpace with about half that potential and then by
AOL-TimeWarner, Google, Facebook, Microsoft, and eBay.
About 27% of websites operated outside .com
addresses.
Speed issues
Frustration over congestion
issues in the Internet
infrastructure and the high latency
that results in slow browsing has led to an alternative,
pejorative name for the World Wide Web: the World Wide
Wait. How to speed up the Internet is the subject of ongoing
discussion about the use of peering and QoS
technologies. Other ways to reduce the World Wide Wait
are described by the W3C. Standard guidelines
for ideal Web response times are:
- 0.1 second (one tenth of a second). Ideal response time. The user doesn't sense any interruption.
- 1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
- 10 seconds. Unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.
These numbers are useful for planning server capacity.
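The three thresholds above can be turned into a small check for use in monitoring or capacity planning. The sketch below is one possible reading of the guidelines: the function name `classify_response_time`, the category labels, and the treatment of the exact boundary values are assumptions, not part of the guidelines themselves.

```python
def classify_response_time(seconds):
    """Map a measured response time onto the guideline thresholds."""
    if seconds <= 0.1:
        return "ideal"            # the user senses no interruption
    if seconds <= 1.0:
        return "acceptable"       # at or below the highest acceptable time
    if seconds < 10.0:
        return "interrupting"     # above 1 second: the experience is interrupted
    return "unacceptable"         # 10 seconds or more: the user is likely to leave
```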
Caching
If a user revisits a Web page after only a short interval, the page data may not need to be re-obtained from the source Web server. Almost all Web browsers cache recently-obtained data, usually on the local hard drive. HTTP requests sent by a browser will usually only ask for data that has changed since the last download. If the locally-cached data is still current, it will be reused.
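One way a browser can ask only for changed data is the HTTP/1.1 conditional request: it sends the time of its cached copy in an `If-Modified-Since` header, and the server answers `304 Not Modified` if nothing has changed. The following is a minimal sketch of that exchange from the client side; the function names are hypothetical, and real browser caches apply many more rules.

```python
from email.utils import formatdate


def conditional_headers(cached_timestamp):
    """Build request headers that ask only for data newer than the cached copy.

    cached_timestamp is the Unix time at which the cached copy was last
    modified, or None if nothing is cached.
    """
    if cached_timestamp is None:
        return {}
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return {"If-Modified-Since": formatdate(cached_timestamp, usegmt=True)}


def choose_body(status, cached_body, response_body):
    """Reuse the cached copy when the server replies 304 Not Modified."""
    return cached_body if status == 304 else response_body
```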
Caching helps reduce the amount of Web traffic on the Internet. The decision about expiration is made independently for each downloaded file, whether image, stylesheet, JavaScript, HTML, or whatever other content the site may provide. Thus even on sites with highly dynamic content, many of the basic resources only need to be refreshed occasionally. Web site designers find it worthwhile to collate resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. This helps reduce page download times and lowers demands on the Web server.
There are other components of the Internet that can cache Web content. Corporate and academic firewalls often cache Web resources requested by one user for the benefit of all. (See also Caching proxy server.) Some search engines, such as Google or Yahoo!, also store cached content from websites.
Apart from the facilities built into Web servers that can determine when files have been updated and so need to be re-sent, designers of dynamically-generated Web pages can control the HTTP headers sent back to requesting users, so that transient or sensitive pages are not cached. Internet banking and news sites frequently use this facility.
Data requested with an HTTP 'GET' is likely to be cached if other conditions are met; data obtained in response to a 'POST' is assumed to depend on the data that was POSTed and so is not cached.
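The two rules above (the server can mark a page as uncacheable, and only GET responses are candidates for caching) can be sketched as follows. This is a deliberately simplified model: the function names and the one-hour default are assumptions, and only the `no-store` and `max-age` directives of the HTTP/1.1 `Cache-Control` header are considered.

```python
def response_cache_headers(sensitive, max_age_seconds=3600):
    """Choose the Cache-Control header a dynamic page sends back."""
    if sensitive:
        # Transient or sensitive pages (e.g. banking) must not be stored.
        return {"Cache-Control": "no-store"}
    return {"Cache-Control": f"max-age={max_age_seconds}"}


def may_cache(method, response_headers):
    """Decide whether a cache may keep a response (simplified)."""
    if method.upper() != "GET":
        return False              # POST responses depend on the posted data
    value = response_headers.get("Cache-Control", "")
    directives = {d.strip().lower() for d in value.split(",")}
    return "no-store" not in directives
```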
Link rot and Web archival
Over time, many Web resources pointed to by hyperlinks disappear, relocate, or are replaced with different content. This phenomenon is referred to in some circles as "link rot" and the hyperlinks affected by it are often called "dead links".
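A simple dead-link check can be sketched by requesting each hyperlink and looking at the HTTP status code. The categories and the function names `classify_status` and `check_link` below are assumptions made for illustration. Note that a status code cannot reveal a page whose content has been replaced, and that `urllib` follows redirects automatically, so relocated pages usually report the status of their new location.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a link state."""
    if status is None:
        return "dead"             # no server answered at all
    if 200 <= status < 300:
        return "alive"
    if 300 <= status < 400:
        return "moved"
    if status in (404, 410):
        return "dead"             # Not Found / Gone
    return "uncertain"            # e.g. server errors that may be temporary


def check_link(url, timeout=10):
    """Send a HEAD request and classify the outcome."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as error:
        return classify_status(error.code)
    except (urllib.error.URLError, OSError):
        return classify_status(None)
```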
The ephemeral nature of the Web has prompted many efforts to archive Web sites. The Internet Archive is one of the best-known efforts; it has been active since 1996.
Academic conferences
The major academic event covering the Web is the World Wide Web Conference, promoted by IW3C2.
Security
The Web has become criminals' preferred pathway for spreading
malware.
Cybercrime carried out on the Web can include identity theft,
fraud, espionage and intelligence gathering.
Web-based vulnerabilities now outnumber traditional computer
security concerns, and as measured
by Google,
about one in ten Web pages may contain malicious code.
Most Web-based attacks take place on legitimate websites,
and most, as measured by Sophos,
are hosted in the United States, China and Russia.
The most common of all malware threats is SQL
injection attacks against websites.
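To make the SQL injection risk concrete, the sketch below contrasts a query built by string concatenation with a parameterized query, using Python's built-in `sqlite3` module. The `users` table, its contents, and the function names are hypothetical and exist only for this illustration.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a small hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = "SELECT id FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    """Safer: the input is passed as a bound parameter, never as SQL."""
    return conn.execute("SELECT id FROM users WHERE name = ?",
                        (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version treats the whole string as a name and returns nothing.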
Through HTML and URIs the Web was vulnerable to attacks
like cross-site
scripting (XSS) that came with the introduction of JavaScript
and were exacerbated to some degree by Web 2.0 and Ajax
web
design that favors the use of scripts. Today by one
estimate, 70% of all websites are open to XSS attacks on
their users.
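A common defense against cross-site scripting is to escape user-supplied text before placing it in a page, so that the browser displays it instead of executing it. The following is a minimal sketch using `html.escape` from the Python standard library; the function name `render_comment` is hypothetical, and escaping alone does not cover every context in which untrusted data can appear (script blocks and URLs need separate handling).

```python
import html


def render_comment(comment_text):
    """Wrap user-supplied text in a paragraph, escaping HTML metacharacters."""
    return "<p>" + html.escape(comment_text) + "</p>"
```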
Proposed solutions vary to extremes. Large security vendors
like McAfee
already design governance and compliance suites to meet
post-9/11 regulations, and
some, like Finjan
have recommended active real-time inspection of code and
all content regardless of its source.
Some have argued that for enterprises to see security as
a business opportunity rather than a cost center,
"ubiquitous, always-on digital rights management" enforced
in the infrastructure by a handful of organizations must
replace the hundreds of companies that today secure data
and networks. Jonathan
Zittrain has said users sharing responsibility for computing
safety is far preferable to locking down the Internet.
Web accessibility
Many countries regulate web accessibility as a requirement for web sites.
WWW prefix in Web addresses
The letters "www" are commonly found at the beginning of Web addresses because of the long-standing practice of naming Internet hosts (servers) according to the services they provide. So for example, the host name for a Web server is often "www"; for an FTP server, "ftp"; and for a USENET news server, "news" or "nntp" (after the news protocol NNTP). These host names appear as DNS subdomain names, as in "www.example.com".
This use of such prefixes is not required by any technical standard; indeed,
the first Web server was at "nxoc01.cern.ch",
and even today many Web sites exist without a "www" prefix.
The "www" prefix has no meaning in the way the main Web
site is shown. The "www" prefix is simply one choice for
a Web site's host name.
Some Web browsers will automatically try adding "www." to the beginning, and possibly ".com" to the end, of typed URLs if no host is found without them. All major Web browsers will also prefix "http://www." and append ".com" to the address bar contents if the Control and Enter keys are pressed simultaneously. For example, entering "example" in the address bar and then pressing either just Enter or Control+Enter will usually resolve to "http://www.example.com", depending on the exact browser version and its settings.
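The address-bar behavior described above can be approximated as follows. This is only a sketch of the general idea: the function names and the order in which candidates are tried are assumptions, and actual browsers differ by version and settings.

```python
def fallback_candidates(typed):
    """List the URLs a browser might try, in order, for a typed host name."""
    host = typed.strip().lower()
    hosts = [host]
    if not host.startswith("www."):
        hosts.append("www." + host)
    if "." not in host:
        hosts.append("www." + host + ".com")
    return ["http://" + h for h in hosts]


def control_enter(typed):
    """Expand the address bar contents as Control+Enter does."""
    return "http://www." + typed.strip() + ".com"
```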
Pronunciation of "www"
In English, "www" is pronounced "double-you double-you double-you". It is sometimes shortened to "dub, dub, dub".
The English writer Douglas Adams once quipped:
The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for.
– Douglas Adams, The Independent on Sunday, 1999
In Mandarin Chinese,
"World Wide Web" is commonly translated via a phono-semantic
matching to wàn wéi wǎng (万维网),
which satisfies "www" and literally means "myriad dimensional
net", a translation that very
appropriately reflects the design concept and proliferation
of the World Wide Web.