A3 H7 The Application Layer

7.1 DNS - Domain Name System

Programs rarely refer to hosts, mailboxes and other resources by their binary network addresses. Instead they use ASCII strings such as hera.cs.kun.nl, so some mechanism is required to map names to binary addresses. In the early days of the ARPANET there was simply a file, hosts.txt, that listed all the hosts and their IP addresses. Every night, all the hosts would fetch it from the site where it was maintained. This works well only up to a few hundred hosts.

DNS is a hierarchical, domain-based naming scheme and a distributed database system for implementing this naming scheme. To map a name to an IP address, an application program calls a library procedure called the resolver, passing it the name as a parameter. The resolver sends a UDP packet to a local DNS server, which looks up the name and returns the IP address.

7.1.1 The DNS Name Space

The Internet is divided into several top-level domains: generic ones (com, edu, gov, int, mil, net) and country domains (us, nl, etc., according to ISO 3166). Each domain is named by the path upward from it to the unnamed root, with the components separated by periods, e.g. cs.kun.nl. Domain names are case insensitive; each component can be up to 63 characters long and the total length may not exceed 255 characters.
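The naming rules above can be sketched in a few lines of Python. This is only an illustration of the limits as stated in these notes (63 characters per component, 255 in total, case insensitivity); the function names are made up for this example and are not part of any DNS library.

```python
def is_valid_domain_name(name):
    """Check the length limits quoted in the text: components <= 63, total <= 255."""
    if not name or len(name) > 255:
        return False
    labels = name.rstrip(".").split(".")  # a trailing dot denotes the unnamed root
    return all(1 <= len(label) <= 63 for label in labels)


def same_domain(a, b):
    """Domain names are case insensitive, so compare them in lower case."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()


def path_to_root(name):
    """List the domains on the path upward from a name to its top-level domain."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


print(path_to_root("cs.kun.nl"))
```

For cs.kun.nl the path runs through cs.kun.nl, kun.nl and nl, in that order.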

Naming follows organizational boundaries, not physical ones. If two departments are in the same building and share the same LAN, they can still have distinct domains. If a department is split over two buildings or has several LANs, all its hosts can be in the same domain.

7.1.2 Resource Records

Every domain, whether it is a single host or a top-level domain, can have a set of resource records associated with it.

Component     Meaning
Domain-name   domain to which this record applies
Time-to-live  indication of how stable the record is; indicates how long it can be cached by others
Type          tells what kind of record it is
Class         for the Internet always IN
Value         number, domain name or ASCII string; the semantics depend on the type

Type    Meaning             Value
SOA     Start of authority  parameters for this zone
A       IP address of a host  32-bit integer
MX      Mail exchange       name of the domain prepared to accept email, priority
NS      Name server         name of a name server
CNAME   Canonical name      domain name (works as a macro)
PTR     Pointer             alias for an IP address
HINFO   Host description    CPU, OS in ASCII
TXT     Text                uninterpreted ASCII text

7.1.3 Name servers

The DNS name space is divided up into nonoverlapping zones. Each zone contains part of the tree and name servers holding the authoritative information about that zone. Normally there is one primary name server, which gets its information from a file on disk, and one or more secondary name servers, which get their information from the primary one.

Suppose the resolver on flits.cs.vu.nl wants to know the IP address of the host linda.cs.yale.edu. It sends a UDP query to the local name server, cs.vu.nl. If that server has no (cached) information about the name, it sends a packet to the top-level name server for edu, edu-server.net, which it has in its database. This server forwards the request to yale.edu, as it knows the addresses of all its children. There the request is forwarded to cs.yale.edu, which knows the address of linda.cs.yale.edu. The resulting resource record then travels back along the chain. The local name server cs.vu.nl caches the result for a certain amount of time. However, this cached information is not authoritative, since changes made at cs.yale.edu will not be propagated to all the caches in the world.
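The lookup chain and the caching behaviour can be mimicked with a toy simulation. This is a sketch only: the zone table, the class name LocalNameServer, the fixed one-hour lifetime and the address 192.0.2.1 (taken from a range reserved for documentation) are all assumptions made for illustration. No real DNS traffic is involved, and in real DNS the cache lifetime comes from the Time-to-live field of the record.

```python
import time

# Hypothetical zone data: each server knows its children or the final address.
ZONES = {
    "edu-server.net": {"yale.edu": "yale.edu"},
    "yale.edu": {"cs.yale.edu": "cs.yale.edu"},
    "cs.yale.edu": {"linda.cs.yale.edu": "192.0.2.1"},  # made-up address
}
TLD_SERVERS = {"edu": "edu-server.net"}


def resolve_via_chain(name):
    """Follow the chain of name servers; return (address, servers visited)."""
    server = TLD_SERVERS[name.rsplit(".", 1)[-1]]
    hops = [server]
    while True:
        for child, target in ZONES[server].items():
            if name == child:
                return target, hops          # this server knows the address
            if name.endswith("." + child):
                server = target              # forward the request to the child
                hops.append(server)
                break
        else:
            raise KeyError(name)


class LocalNameServer:
    """Local name server that caches answers for a limited time."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.cache = {}  # name -> (address, expiry time)

    def lookup(self, name, now=None):
        now = time.time() if now is None else now
        hit = self.cache.get(name)
        if hit and hit[1] > now:
            return hit[0], "cached (not authoritative)"
        address, _hops = resolve_via_chain(name)
        self.cache[name] = (address, now + self.ttl)
        return address, "authoritative"
```

A first call to lookup walks the whole chain; a second call within the lifetime is answered from the cache and is therefore marked as not authoritative.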


7.2 Electronic mail

Email systems consist of two subsystems: the user agents, which allow people to send and read email, and the message transfer agents, which move the messages from the source to the destination.

Typically five basic functions are supported. Composition refers to the process of creating messages and answers. Transfer refers to moving messages from the originator to the recipient. Reporting has to do with telling the originator what happened to the message. Displaying incoming messages is needed so people can read their email; sometimes conversion is required or a special viewer must be invoked, for example if the message is a PostScript file or digitized voice. Disposition concerns what the recipient does with the message after receiving it.

Most systems allow users to create mailboxes to store incoming mail. Commands are needed to create and destroy mailboxes, inspect their contents, insert and delete messages, and so on.

A key idea in all modern email systems is the distinction between the envelope and its contents. The envelope encapsulates the message and contains all the information needed for transporting it. The message consists of two parts: the header, containing control information for the user agents, and the body, which is entirely for the human recipient.

The original email message format was defined in RFC 822. Messages consist of a primitive envelope (defined in RFC 821), some number of header fields, a blank line, and then the message body. Each header field logically consists of a single line of ASCII text containing the field name, a colon and, for most fields, a value (e.g. To: ths@cs.kun.nl). The fields are related to message transport (e.g. Cc: ) or are for use by the user agents (e.g. Reply-To:) or human recipients (e.g. Subject:). The body contained 7-bit ASCII characters only, suitable for English plain text.
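As an illustration of this layout (header fields, a blank line, then the body), the sketch below parses a small message with Python's standard email package. The message text and the sender address theo@example.org are invented for this example.

```python
from email import message_from_string

# A made-up RFC 822 style message: header fields, a blank line, then the body.
raw = (
    "From: theo@example.org\n"
    "To: ths@cs.kun.nl\n"
    "Subject: Lecture notes\n"
    "\n"
    "The notes for chapter 7 are ready.\n"
)

msg = message_from_string(raw)
headers = dict(msg.items())   # field name -> value
body = msg.get_payload()      # everything after the blank line

print(headers["To"])
print(body)
```

Header lookups such as msg["subject"] ignore the case of the field name.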

Problems arose when people wanted to send and receive messages in languages with accents (e.g. French), in non-Latin alphabets (e.g. Russian), in languages without alphabets (e.g. Japanese), or messages containing no text at all, but audio or video. The solution, MIME (Multipurpose Internet Mail Extensions), is now widely used. Its basic idea is to continue to use the RFC 822 format, but to add structure to the message body and define encoding rules for non-ASCII messages.

Header                      Meaning
MIME-Version:               if absent, the old format (English plain text) is assumed
Content-Description:        an ASCII string telling what is in the message (e.g. Photo of Theo), allowing the user to determine whether decoding and reading is worth the effort
Content-Id:                 unique identifier for referencing the message later
Content-Transfer-Encoding:  how to wrap the message into 7- or 8-bit ASCII
Content-Type:               nature of the message that follows, e.g. video/mpeg
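The effect of these headers can be seen by building a small message with Python's standard email package. This is a sketch: the subject, the description and the French sentence are invented, and quoted-printable is chosen here only as one possible way of wrapping non-ASCII text in 7-bit ASCII.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "ths@cs.kun.nl"
msg["Subject"] = "Bonjour"
msg["Content-Description"] = "A short note in French"   # hypothetical description

# The body contains a non-ASCII character (a with grave accent).
msg.set_content("Voil\u00e0, une photo.", cte="quoted-printable")

print(msg["MIME-Version"])                # added automatically by set_content
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.as_string())                    # the accent travels as an escape sequence
```

Calling msg.get_content() on the receiving side reverses the encoding and yields the original text.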

Within the Internet, messages are delivered by establishing a TCP connection to port 25 of the destination machine. Listening there is an email daemon that speaks SMTP (Simple Mail Transfer Protocol). It accepts incoming connections and copies the messages arriving on them into the appropriate mailboxes. SMTP is a simple ASCII protocol; the initial message sent by the client is HELO (all commands are 4 characters). To get around some problems with SMTP (including infinite mailstorms on rare occasions), an extended version (ESMTP) has been defined. A client starts with an initial EHLO message and falls back to SMTP if the server rejects it.


Email using SMTP works best when both the sender and the receiver are on the Internet and can support TCP. A solution for machines that are not on the Internet (all the time) or are not capable of sending and receiving email themselves is to use a remote mailbox on an email server. POP3 (Post Office Protocol version 3) is a simple protocol for fetching email from a remote mailbox and storing it on the user's local machine, to be read and handled later.

Users who use multiple machines (a workstation at work, a PC at home and a laptop on the road) can use IMAP (Internet Message Access Protocol). With IMAP, email is left on the email server in a central repository that can be accessed from any machine.

Another possibility is Webmail. Some ISPs provide users access to their mailboxes on the ISP servers using Web forms.


7.3 The World Wide Web

The WWW (World Wide Web) is an architectural framework for accessing linked documents spread out over thousands of machines all over the Internet.

It began in 1989 at CERN, the European center for nuclear research, in Geneva. CERN has several accelerators at which large teams of scientists, often hundreds from dozens of research institutes or universities spread over the world, carry out research in particle physics. Most experiments are highly complex, requiring years of advance planning and equipment construction. The Web grew out of the need to have these dispersed researchers collaborate using a constantly changing collection of reports, blueprints, drawings, photos and other documents.

The initial proposal for a web of linked documents came from Tim Berners-Lee in March 1989. The first, text-based prototype was operational 18 months later. NCSA (National Center for Supercomputing Applications) developed Mosaic, the first graphical browser, released in February 1993. A year later, its developer Marc Andreessen left NCSA to form Netscape Communications Corp., whose goal was to develop clients, servers and other Web software. When Netscape went public in 1995, investors paid 1.5 billion dollars for the stock, even though the company had only one product, was operating deeply in the red and had announced that it did not expect to make a profit for the foreseeable future. For the next 3 years, Netscape Navigator and Microsoft's Internet Explorer engaged in a "browser war". In 1998 America Online bought Netscape for 4.2 billion dollars.

In 1994 the World Wide Web Consortium (W3C) was founded, devoted to further developing the Web, standardizing protocols and encouraging interoperability between sites.

7.3.1 Architectural Overview

Every Web site has a server process listening on TCP port 80 for incoming connections from clients (normally browsers). After the client has established a connection, it sends one request and the server sends one reply; then the connection is released. The protocol used for this is HTTP, a simple ASCII-based protocol. The browser then displays the received page.

The Client Side

Suppose a user clicks on some text that points to the page whose name (in URL format) is http://www.cs.kun.nl/~ths/index.html. The steps that occur then are:

  1. The browser determines the URL (by seeing what was selected)
  2. The browser asks DNS for the IP address of www.cs.kun.nl
  3. DNS answers with the IP number
  4. The browser makes a TCP connection to that number on port 80
  5. It then sends a GET /~ths/index.html command
  6. The www.cs.kun.nl server sends the file index.html
  7. The TCP connection is released
  8. The browser displays all the text in index.html
  9. The browser fetches all images indicated in index.html, by establishing a TCP connection for each of them, and displays them.
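The first steps of this list can be sketched with the Python standard library. This is a minimal illustration, not how a real browser is built: plan_fetch and fetch are names invented here, the HTTP/1.0 version string in the request line is an assumption, and the example host may no longer exist, so only the offline part is exercised below.

```python
import socket
from urllib.parse import urlsplit


def plan_fetch(url):
    """Step 1: take the URL apart into host, port and the GET request to send."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80            # HTTP servers listen on port 80 by default
    path = parts.path or "/"
    request = f"GET {path} HTTP/1.0\r\n\r\n"
    return host, port, request


def fetch(url):
    """Steps 2-7: look up the host, connect, send GET, read the reply, disconnect."""
    host, port, request = plan_fetch(url)
    # create_connection performs the DNS lookup and sets up the TCP connection.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:               # the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(plan_fetch("http://www.cs.kun.nl/~ths/index.html"))
```

Displaying the page and fetching the images it refers to (steps 8 and 9) would repeat the same procedure for every image URL found in the HTML.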

A web page may contain HTML code, images in GIF or JPEG format, sound in MP3 format, video in MPEG format, documents in PDF, MS Word or other formats, or information in many other formats. Some formats are handled directly by the browser. Others are handled by a plug-in, a code module that the browser fetches from disk and installs as an extension to itself. For yet others the browser starts up another program, a helper application, as a separate process.


The Server Side

The server performs the following steps in its main loop:

  1. Accept a TCP connection from a client.
  2. Resolve the name of the page requested. Sometimes a default name has to be taken, like index.html. Modern browsers specify the user's default language, so the name is changed accordingly.
  3. Authenticate the client, needed for pages that are not available to the general public like your bank account.
  4. Perform access control on the client: can the requested page be sent, given the client's identity and location?
  5. Perform access control on the web page: some pages may only be sent to clients in particular domains, e.g. inside the company.
  6. Check the cache if the page is there, otherwise get it from disk.
  7. Determine the MIME type and include it in the header of the reply.
  8. Other possible tasks, like building a user profile, gathering statistics or making an entry in a logfile.
  9. Return a reply, either the requested file or error information
  10. Release the TCP connection
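A few of these steps (name resolution, access control on the page, lookup, MIME type determination and the reply) can be sketched as a single function. Everything here is hypothetical: the page table, the rule that addresses starting with "10." count as internal, and the function name handle_request are invented for illustration, and the network handling of steps 1 and 10 is left out.

```python
import mimetypes

# Hypothetical in-memory "disk": path -> page contents.
PAGES = {
    "/index.html": b"<html>hello</html>",
    "/intranet/plan.html": b"<html>internal plan</html>",
}
INTERNAL_PREFIX = "10."   # assumed address prefix of hosts inside the company


def handle_request(path, client_ip):
    """Return (status code, headers, body) for one request."""
    if path.endswith("/"):
        path += "index.html"                        # step 2: default page name
    if path.startswith("/intranet/") and not client_ip.startswith(INTERNAL_PREFIX):
        return 403, {}, b"Forbidden"                # step 5: page-based access control
    body = PAGES.get(path)                          # step 6: cache or disk lookup
    if body is None:
        return 404, {}, b"Not Found"                # step 9: error information
    mime_type, _encoding = mimetypes.guess_type(path)   # step 7: MIME type
    headers = {"Content-Type": mime_type or "application/octet-stream"}
    return 200, headers, body                       # step 9: the requested file


print(handle_request("/", "192.0.2.7"))
```

A real server would run this logic once per accepted TCP connection, usually in several threads or processes at the same time.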

To increase the number of requests per second that can be handled, techniques such as efficient caching, multithreading, parallel systems, server farms, efficient disk systems and TCP handoff are used.


URLs - Uniform Resource Locators

Name     Used for       Example
http     Hypertext      http://www.cs.kun.nl/~ths
ftp      FTP            ftp://ftp.cs.vu.nl/pub
file     Local file     file:///usr/theo/prog.c
news     News group     news:comp.os.minix
news     News article   news:AA0134223112@cs.utah.edu
gopher   Gopher         gopher://gopher.tc.umn.edu/11/Libraries
mailto   Sending email  mailto:ths@cs.kun.nl
telnet   Remote login   telnet://hera.cs.kun.nl

A URL consists of 3 parts: a protocol, the DNS name of the host, and the file name, with certain punctuation separating the pieces. For file names, certain shortcuts can be built in, e.g. ~ths is expanded to eita/staff/ths/index.html. Besides the absolute URLs shown in the table, there are also relative URLs. The difference is analogous to that between the absolute file name /usr/ast/foobar and just foobar in Unix, when the context is unambiguously defined.
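The three parts, and the way a relative URL is resolved against its context, can be shown with Python's urllib.parse. The relative name foobar.html is an invented example.

```python
from urllib.parse import urljoin, urlsplit

parts = urlsplit("http://www.cs.kun.nl/~ths/index.html")
print(parts.scheme)   # the protocol
print(parts.netloc)   # the DNS name of the host
print(parts.path)     # the file name

# A relative URL is interpreted relative to the page that contains it.
base = "http://www.cs.kun.nl/~ths/index.html"
print(urljoin(base, "foobar.html"))

# Not every scheme has a host part.
mail = urlsplit("mailto:ths@cs.kun.nl")
print(mail.scheme, mail.path)
```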

A URL has an inherent weakness: it points to one specific host. It does not provide any way to reference a page without telling where it is, although the user is often not interested in the location. For pages that are heavily referenced, it is desirable to have multiple copies far apart (e.g. in Europe and the USA) to reduce network traffic. A system of URNs (Uniform Resource Names) is being worked on.


Statelessness and Cookies

The web is basically stateless: there is no concept of a login session. The browser sends a request to a server and gets an answer back. Then the server forgets that it has ever seen that particular client. This is fine for retrieving publicly available documents, which is what the web was designed for, but not for other kinds of use such as e-banking, e-commerce and customized web portals. For those uses, the server has to know more about the user requesting a page. IP addresses are not suitable for that, because of the use of dynamic IP addresses and NAT and the fact that there may be more than one user on a computer.

To provide this information, cookies are used. The name derives from ancient programmer slang in which a program calls a procedure and gets something back that it may need to present later to get some work done. UNIX file descriptors and Windows object handles are examples of this. When a client requests a page, the server may send a cookie in the reply header: a small text string of at most 4 KB. Browsers may accept it and store it on disk. When the browser later sends a request, it checks whether it has cookies for the domain the request is for. If so, it includes them in the request so the server can use them.

A cookie contains up to five fields. The domain tells where the cookie came from, and the path tells which part of the server's directory structure may use the cookie. The content has the form name=value. Further, there is an expiration date, and the secure field indicates that the cookie may only be returned to a secure server.
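The five fields can be illustrated with Python's standard http.cookies module. The cookie name customer, its value, the domain shop.example.com and the expiration date are all invented for this sketch.

```python
from http.cookies import SimpleCookie

# Server side: build the cookie that goes into the reply header.
cookie = SimpleCookie()
cookie["customer"] = "4627"                                   # content: name=value
cookie["customer"]["domain"] = "shop.example.com"             # where it came from
cookie["customer"]["path"] = "/"                              # which paths may use it
cookie["customer"]["expires"] = "Fri, 31 Dec 2038 23:59:59 GMT"
cookie["customer"]["secure"] = True                           # only to a secure server
header = cookie.output()
print(header)

# Later, the browser sends the content back and the server parses it again.
returned = SimpleCookie("customer=4627")
print(returned["customer"].value)
```

Only the name=value content travels back to the server; the other fields tell the browser when it is allowed to return the cookie.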

Cookies have a bad name. Hackers have exploited browser bugs to capture cookies not intended for them, containing for instance credit card numbers. Cookies have also been used to secretly collect information about users' Web browsing habits.


7.3.2 Static Web documents

HTML - HyperText Markup Language

Web pages are written in a language called HTML (HyperText Markup Language). It allows one to produce web pages that include text, graphics and pointers to other web pages. HTML is an application of the ISO standard SGML (Standard Generalized Markup Language), but specialized to hypertext and adapted to the web. HTML is a markup language, containing explicit commands (called tags) for formatting the text in the text itself (like TeX and troff). For example, in HTML, <B> means start boldface mode, and </B> means leave boldface mode. Usually tags come in pairs, <something> to denote the beginning of something and </SOMETHING> to mark its end; note that tags are not case sensitive.

Documents written in a markup language can be contrasted with documents produced with a WYSIWYG word processor like MS Word or WordPerfect. These may store their files with hidden embedded markup or keep the markup in separate data structures, as happens on the Macintosh. Nowadays these word processors often offer the option of saving documents in HTML, with loss of certain proprietary markup.

By embedding the markup commands within each HTML file and standardizing them, it becomes possible for any web browser to read and reformat any web page. This is crucial because a web page may have been produced full screen on a 1024 x 768 display with 24-bit color but may have to be displayed in a small window on a 640 x 480 screen with 8-bit color. How things are displayed is up to the browser. For example, headings in text are indicated with <Hn>, with n a digit in the range 1 to 6. Typically <H1> headings are displayed in a large boldface font with at least a blank line before and after, but the browser may also choose to use color. The designer of a web page thus has little control over how the page is displayed, in contrast to PostScript and Adobe Acrobat.

Like HTTP, HTML is constantly changing. Version 1.0 was the de facto standard used in the Mosaic browser. When new browsers came along, there was a need for a formal Internet standard, which became version 2.0. Version 3.0 was initially created as a research effort to add many new features, including tables, toolbars and cascading style sheets. The latter give page designers more control over the appearance of pages in browsers. They can also be included in pages (like the C #define) to give all pages the same appearance.

There is a provision for indicating the font of pieces of text, e.g. to use Greek and mathematical symbols, but no way to include font definitions in a page file. If the browser cannot find the indicated font, it has to choose another one. Thus indicating a font works well only when the system on which the page was developed is of the same type as the one on which it is displayed, and both systems have the indicated font available. For this reason I had to include mathematical symbols as images for this course.

Forms

Forms allow the user to fill in information in boxes or make choices using buttons, and send that information back to the page's owner. The information is packed in a long string, for example:
http://www.altavista.com/cgi-bin/query?pg=q&kl=nl&q=%2Bonzin+niet&search=Search
Note that a ? is used to separate the real URL from the filled in information, the + indicates a space, the %2B indicates a typed in +, and & is used to separate fields. Each field is in the format: name=value.
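The encoding rules just described can be checked with Python's urllib.parse, which both decodes the string from the example and rebuilds it from the field values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://www.altavista.com/cgi-bin/query?pg=q&kl=nl&q=%2Bonzin+niet&search=Search"

parts = urlsplit(url)
print(parts.path)        # the real URL part: the script to run
print(parts.query)       # everything after the ?

fields = parse_qs(parts.query)   # splits on &, decodes + and %2B
print(fields["q"])               # the text the user typed

# Going the other way: encode the field values as a browser would.
encoded = urlencode({"pg": "q", "kl": "nl", "q": "+onzin niet", "search": "Search"})
print(encoded == parts.query)
```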

On the server side, CGI (Common Gateway Interface) is used to process the information. The server knows that the files in the cgi-bin directory are scripts or programs (e.g. in Perl); thus in the above case the script 'query' is started with the string after the ? as its parameter. The script does its work, e.g. searching a database, and returns its result as an HTML page.

XML and XSL

There is an increasing need for structuring Web pages and separating the content from the formatting. For example, a program that searches the Web for the best price for some CD needs to analyze many Web pages, looking for the item's title and price. With Web pages in HTML it is very difficult for a program to find out where the title is and where the price is.

XML (eXtensible Markup Language) describes Web content (or any other content) in a structured way. The XML file on the left defines a structure called book_list, a list of books, each having 3 fields. The structure could be more complicated and have repeated fields (e.g. multiple authors), optional fields (e.g. the title of an included CD-ROM) and alternative fields (the URL of a bookstore if the book is in print, or the URL of an auction site if it is out of print). Fields can also be subdivided, e.g. into the first name and last name of the authors.
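A small book_list can be written out and read back with Python's xml.etree.ElementTree. The field names title, author and year and the two sample entries are assumptions made for this sketch, since only the name book_list and the number of fields are given above.

```python
import xml.etree.ElementTree as ET

# A hypothetical book_list with three fields per book.
xml_text = """
<book_list>
  <book>
    <title>Computer Networks</title>
    <author>Andrew S. Tanenbaum</author>
    <year>2003</year>
  </book>
  <book>
    <title>Modern Operating Systems</title>
    <author>Andrew S. Tanenbaum</author>
    <year>2001</year>
  </book>
</book_list>
"""

root = ET.fromstring(xml_text)
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)

# XML parsers are strict: a missing closing tag is an error, not a warning.
try:
    ET.fromstring("<book><title>Unclosed</book>")
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```

Because every field is explicitly tagged, a price-comparison program could find the title of each book without guessing from the layout.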


How the XML page is to be formatted and displayed on a screen is determined by an XSL (eXtensible Style Language) file; an example is shown on the left. It looks like HTML but has stricter syntax requirements: a browser should reject it if, for instance, a closing tag like </th> is missing.

XSL commands are given with an xsl tag, like <xsl:xxxx>. The for-each command iterates over the given structure, here the list of books.

The next step after HTML 4 will be called XHTML (X from eXtended) and is essentially HTML 4 reformulated in XML. It needs an XSL file to give display meaning to its tags. Strict conformance to the syntax is required: closing tags, tags and attributes in lower case, attribute values in quotation marks, and proper nesting of tags.



7.3.3 Dynamic Web Documents

In recent years more and more content has become dynamic, that is, generated on demand rather than stored on a disk. Content generation can take place either on the server side or on the client side.

Server-Side Dynamic Web Page Generation


We have already seen how a form is processed. The client sends a page request containing the information typed into the form and the name of the program that is to process it, encoded in the file name part of the URL. The server uses CGI, a standardized interface for calling programs and scripts with parameters. These generate a web page in HTML, which is then returned to the client. Usually scripts in Perl or Python are used, because they can easily handle text strings, but a program written in C could be used as well.


Another way to generate dynamic content is to embed little scripts inside HTML pages and have them executed by the server itself to generate the page. A popular language for this is PHP (PHP: Hypertext Preprocessor). To use it the server has to understand PHP; pages containing PHP usually have the file extension 'php' rather than 'html' or 'htm'.

The PHP commands are enclosed in the tag <?php ... ?>. The image shows at the top a form with 2 entry fields. Below it is the 'action.php' file with the PHP commands. They have access to the information filled in on the form through the names of the fields, e.g. $age. They produce a text string which is included in the output sent to the client.

PHP is actually a powerful programming language oriented towards interfacing between the Web and a server database. It is open source and freely available, and was specifically designed to work well with Apache, which is also open source and is the world's most widely used Web server.

JSP (JavaServer Pages) is similar to PHP, except that the dynamic part is written in the Java programming language instead of in PHP. ASP (Active Server Pages) is Microsoft's version, using Visual Basic Script for generating the dynamic content.



Client-Side Dynamic Web Page Generation


Here a program contained in a web page is executed by the browser and the result is displayed. No information is sent to the server. JavaScript can be used for this, a scripting language very loosely inspired by some ideas from Java. It is a full-blown programming language, with variables, strings, arrays, objects, functions, and all the usual control structures.


It has the ability to manage windows and frames, set and get cookies, deal with forms and handle hyperlinks. As these things are rather internal to browsers, and often different for different browsers and versions, it is difficult to write JavaScript programs which work correctly for all browsers, versions and platforms.

JavaScript is embedded in an HTML page using the 'script' tag or inline at certain locations. It can also track mouse movements and actions, as in the example on the left: when the mouse is over a link, a window with a certain image is displayed.

Another popular way to make web pages highly interactive is through the use of applets. These are small Java programs embedded with the 'applet' tag and executed by a Java Virtual Machine. As they are interpreted, the interpreter can prevent them from doing Bad Things. At least in theory: in practice, applet writers have found a nearly endless stream of bugs in the Java I/O libraries to exploit.

Microsoft's answer to Sun's applets was to allow web pages to hold ActiveX controls. They are faster than applets, but run only on Windows machines.


7.3.4 HTTP - HyperText Transfer Protocol

Method    Description
GET       Read a web page. Can be followed by an If-Modified-Since header, for caching purposes.
HEAD      Read the header of a web page. Can be used to get the date of the last modification, to collect information for indexing purposes or just to check a URL for validity.
PUT       Store a web page. Makes it possible to build a collection of web pages on a remote server.
POST      Append to a named resource. Can be used to post a message to a news group or to add a file to a bulletin board.
DELETE    Remove a web page.
TRACE     Echo the incoming request.
OPTIONS   Query certain options.

HTTP specifies what messages clients may send and what responses they get back from the servers. Each interaction consists of one ASCII request, followed by one MIME-like response. It has been intentionally made more general than necessary with an eye to future object-oriented applications. For this reason, operations (called methods) other than just requesting a Web page are supported.

A method is followed by a resource name, e.g. a web page, and for newer versions of HTTP by the protocol version. In that case MIME-like headers can be sent and received. Every request gets a response consisting of a status line and possibly other information.


The status code response groups and some examples.
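The first digit of the three-digit status code determines the group. The sketch below lists the five groups and looks up a few standard reason phrases with Python's http.HTTPStatus; the group labels and the helper names describe and parse_status_line are wording chosen for this illustration.

```python
from http import HTTPStatus

# The five response groups, keyed by the first digit of the status code.
GROUPS = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return (group, standard reason phrase) for a status code."""
    return GROUPS[code // 100], HTTPStatus(code).phrase


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


for code in (100, 200, 304, 404, 500):
    print(code, describe(code))
```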



The request line may be followed by request headers and the responses may contain response headers. The Host header names the server and is taken from the URL. It is mandatory because some IP addresses may now serve multiple DNS names and the server needs some way to tell which host to hand the request to.

The Accept headers tell the server what the client is willing to accept, in case it has a limited repertoire of what it can handle. They also allow the server to send back a page in a certain language, if it has a choice.


7.3.5 Performance Enhancements

Besides the architecture of the servers, caching at various places also plays an important role in keeping WWW from standing for World Wide Wait. The goal of caching is to place Web pages "closer" to the place where they are used, closer in the sense of requiring less time and fewer other resources such as bandwidth. There are two questions related to caching: who should do it, and how long should pages be cached?

The usual procedure is for some process, called a proxy, to maintain a cache. The browser requests a page from the proxy. If the page is not in the cache, the proxy passes the request on to another proxy or to the server and stores the returned page in the cache before giving it to the browser. Individual PCs often run proxies (usually embedded in the browser), so they can quickly look up pages previously visited. Proxies can also be installed on the company LAN, and many ISPs run proxies as well, both to serve their users more quickly and to save on communication costs.

Some pages should not be cached at all, such as a page with stock prices that change every second, unless of course the stock exchange has closed for the day. Thus the cacheability of a page may vary wildly over time.

The key issue is how much staleness users are willing to accept. The fresher a user wants the pages to be, the longer he has to wait. A common heuristic is to base the holding time on the Last-Modified header: the longer ago the page was last modified, the longer it is held in the cache before checking for an updated version. The check is done with an If-Modified-Since request header: the server returns the page if it has changed, and otherwise a short Not Modified message.
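The Last-Modified heuristic can be sketched as follows. This is an assumption-laden illustration: the rule "hold a page for one tenth of its age" and the names holding_time, is_fresh and conditional_get_header are invented here, and real browsers and proxies may use different factors and limits. Times are in seconds since the Unix epoch.

```python
from email.utils import formatdate


def holding_time(fetched_at, last_modified, divisor=10):
    """Hold a page for a fraction (assumed: one tenth) of its age when fetched."""
    age = max(0, fetched_at - last_modified)
    return age / divisor


def is_fresh(now, fetched_at, last_modified):
    """True while the cached copy may be used without asking the server."""
    return now - fetched_at < holding_time(fetched_at, last_modified)


def conditional_get_header(last_modified):
    """Header line sent with the GET request once the holding time has passed."""
    return "If-Modified-Since: " + formatdate(last_modified, usegmt=True)


# A page last modified 1000 s before it was fetched is held for 100 s.
print(holding_time(1000, 0))
print(is_fresh(1050, 1000, 0), is_fresh(1200, 1000, 0))
print(conditional_get_header(0))
```

A page that has not changed for years is thus rechecked rarely, while a page modified minutes ago is rechecked almost immediately.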

A browser usually has settings that give the user control over the caching strategy. For example, in Explorer the user can choose the automatic mode described above, always or never check for updates, or check each page once after starting up Explorer. The user can also clear the cache and force an update check for the page currently displayed.

The server can also instruct all proxies to not cache the current page or to not use it again without checking for freshness. This is used for dynamically generated pages (e.g. by a PHP script) or for any page expected to change quickly.


7.3.6 The Wireless Web

WAP - The Wireless Application Protocol

WAP provides mobile phones or PDAs that have a built-in screen with wireless access to email and Web pages. It is essentially a protocol stack for accessing the Web, optimized for low-bandwidth connections using wireless devices with a slow CPU, little memory and a small screen. The lowest layer provides a data rate of 9600 bps. WDP (Wireless Datagram Protocol) is in essence UDP. WTLS (Wireless Transport Layer Security) is a subset of Netscape's SSL (Secure Sockets Layer), discussed in chapter 8. WTP (Wireless Transaction Protocol) replaces TCP, which is not used over the air link for efficiency reasons.


WSP (Wireless Session Protocol) is similar to HTTP/1.1 but with some restrictions and extensions for optimization purposes. WAE (Wireless Application Environment) is a microbrowser. It does not use HTML but WML (Wireless Markup Language), which is an application of XML. In principle, a WAP device can only access pages that have been converted to WML. An on-the-fly filter from HTML to WML is used to increase the set of pages available to the user.

WAP 1.0 was probably a little ahead of its time. It was not a success, partly because of its high cost.


I-mode

I-mode is a success in Japan; read Tanenbaum to see why, and why it will not be easily transferable to Europe or the US. It is based on a new transmission network, a new handset and a new language for Web page design. The handset looks like a mobile phone with a small screen added, from 72x94 up to 120x160 pixels with 8-bit color. This is not enough for photographs but is adequate for drawings and simple cartoons. There is no mouse; navigation is done with arrow keys.

For voice, the existing circuit-switched network is used and billing is per minute of connect time. For data, a new packet-switched network is used that was specially constructed for i-mode; it is always on and billing is based on the number of packets. It is based on CDMA and transmits 128-byte packets at 9600 bps.

When i-mode is switched on, the user is presented with a list of categories approved by NTT DoCoMo, the owner of i-mode. There are about 20 categories and over 1000 services, each one run by an independent company. The most popular service is email, which allows 500-byte messages, seen as a big improvement over SMS with its 160-byte messages. Games are also popular services. There are also over 40,000 i-mode Web sites, but they have to be accessed by typing in their URL.

I-mode uses cHTML (c for compact), approximately HTML 1.0 with a few omissions and some extensions, such as for dialing a telephone number or selecting hyperlinks using the keyboard. JavaScript, frames, style sheets, and background colors or images are not supported. JPEG images are not supported either, as decompressing them would take too much time. Although the Japanese language has tens of thousands of kanji, 166 new ones, called emoji, were added.


Second-generation Wireless Web

Read this section in Tanenbaum to get a glimpse of future developments.

7.4 Multimedia

Literally, multimedia is just two or more media. In principle, books combining text and graphics could count. Generally, however, the term means the combination of two or more continuous media, that is, media that have to be played during some well-defined time interval, usually with some user interaction. In practice, the two media are normally audio and video.

7.4.1 Introduction to digital Audio

An audio wave is a one-dimensional acoustic (pressure) wave. The frequency range of the human ear runs from 20 Hz to 20 kHz. The ear hears logarithmically, so the ratio of two sounds with power A and B is expressed in dB (decibels):
$$\mathrm{dB} = 10 \log_{10}(A/B)$$
The lower limit of audibility of a 1-kHz wave is a pressure of about $3 \times 10^{-5}$ Pa (= N/m$^2$). If that is defined as 0 dB, then an ordinary conversation is about 50 dB and the pain threshold is about 120 dB, a dynamic range of a factor of 1 million in pressure (a factor of $10^{12}$ in power). There are many dB scales, depending on the (filtered) frequency range measured and the pressure that is defined as 0 dB.
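The decibel formula is easy to check numerically. The sketch below converts between dB values and ratios; it relies on the fact that sound power is proportional to the square of the pressure, so a pressure ratio corresponds to 20 rather than 10 times the logarithm. The function names are chosen for this illustration.

```python
import math


def power_ratio_to_db(a, b):
    """dB = 10 log10(A/B) for two sounds with power A and B."""
    return 10 * math.log10(a / b)


def db_to_power_ratio(db):
    """Inverse of the formula above."""
    return 10 ** (db / 10)


def db_to_pressure_ratio(db):
    """Power goes with pressure squared, hence the divisor 20."""
    return 10 ** (db / 20)


print(power_ratio_to_db(100, 1))     # a 100-fold power ratio
print(db_to_pressure_ratio(120))     # pain threshold versus audibility limit
print(db_to_power_ratio(120))
```

The pain threshold of 120 dB thus corresponds to a million-fold ratio in pressure and a 10^12-fold ratio in power.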

In a microphone, pressure changes are converted into an electrical signal. To make it digital, the signal is sampled at regular time intervals by an ADC (Analog-to-Digital Converter). The Nyquist theorem states that sampling at a frequency of 2f is sufficient if the highest frequency in the signal is f. Each sample is quantized into one of a number of levels (9 in the figure), usually expressed as a number of bits. The telephone system uses 8000 samples per second (thus frequencies up to 4 kHz) with 8 bits, thus 256 levels (in North America and Japan this is 7 bits). Audio CDs use 44,100 samples/sec (up to 22,050 Hz) with 16 bits, thus 65,536 levels. This gives a rate of 1.411 Mbps for stereo sound.
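
The arithmetic behind these numbers can be reproduced with a few lines. This is only a sketch; the function names are made up for the example, and the rates are for uncompressed samples.

```python
def pcm_bit_rate(samples_per_second: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed sampled audio in bits per second."""
    return samples_per_second * bits_per_sample * channels

def highest_frequency(samples_per_second: int) -> float:
    """Highest signal frequency that the sampling rate can capture (Nyquist)."""
    return samples_per_second / 2

telephone = pcm_bit_rate(8000, 8)          # 64,000 bps
cd_stereo = pcm_bit_rate(44100, 16, 2)     # 1,411,200 bps, i.e. 1.411 Mbps
print(telephone, cd_stereo, highest_frequency(44100), 2 ** 16)
```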


7.4.2 Audio Compression

Compression methods often use perceptual coding to achieve a high compression ratio. Perceptual coding exploits certain characteristics of the human ear: frequency masking, meaning that a loud sound at a certain frequency makes softer sounds at nearby frequencies inaudible (shown in the figure), and temporal masking, meaning that this effect continues for a while after the loud sound has ended.


MP3 (MPEG audio layer 3) samples the waveform at 32, 44.1 or 48 kHz, and the output rate can be chosen, e.g. 96 kbps for rock 'n' roll or 128 kbps for a piano concert. Frequency bands with the most unmasked (audible) spectral power are encoded in more bits than bands with less power. Various other techniques are used for noise reduction, antialiasing and exploiting the interchannel redundancy in stereo.

7.4.3 Streaming Audio

A browser can get an audio file from an HTTP server, store it on disk and start a media player (e.g. RealOne Player, Windows Media Player, Winamp, etc.). In that case the song is fully downloaded before it starts to play. Starting to play during the download is often more desirable.


The server is often a specialized media server that speaks a protocol like RTSP (Real Time Streaming Protocol), which has suitable commands such as PLAY or PAUSE (see figure 7-62). These commands are used to keep sufficient data in the player's buffer, which serves to eliminate jitter and to compensate for small interruptions in the transfer. The actual data transfer is done with a protocol like RTP, a real-time protocol on top of UDP. Lost packets are compensated for as much as possible by interpolating from neighboring data.


By sending alternating packets with the even and the odd time samples, the effect of a lost packet can be reduced: a lost packet then reduces the temporal resolution rather than creating a gap in time. As described here this only works with uncompressed samples, but there is also a scheme that works with compressed audio.
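
A minimal sketch of the even/odd scheme for uncompressed samples follows. The helper names and the simple averaging of neighbors are assumptions made for illustration; they are not taken from RTP or from any particular media player. The sketch assumes that at most one of the two packets is lost.

```python
def interleave(samples):
    """Split the samples into two packets: even-indexed and odd-indexed."""
    return samples[0::2], samples[1::2]

def reconstruct(even, odd, total):
    """Rebuild the sequence; a lost packet is passed as None and its
    samples are interpolated from the neighbors that did arrive."""
    out = [None] * total
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    for i in range(total):
        if out[i] is None:
            neighbors = [out[j] for j in (i - 1, i + 1)
                         if 0 <= j < total and out[j] is not None]
            out[i] = sum(neighbors) / len(neighbors)
    return out

samples = [0, 10, 20, 30, 40, 50, 60, 70]
even_packet, odd_packet = interleave(samples)
# The odd packet is lost: no gap appears, only the resolution drops.
print(reconstruct(even_packet, None, len(samples)))
```

Because the missing samples always sit between samples that arrived, the receiver can fill them in without leaving a silent gap.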


7.4.4 Internet Radio

Internet radio could use RTP/RTSP with multicasting, but few ISPs support multicasting, so this is not often done. Another problem is that UDP packets may be filtered out by company firewalls, so people at work cannot receive them. Therefore individual TCP connections are often used, which creates more timing problems than UDP.


7.4.7 Video Compression

JPEG

JPEG is a standard for compressing single images; we concentrate here on the compression of 24-bit RGB images. It is a lossy compression, meaning that the original image cannot be fully reconstructed from the compressed data.


In block preparation, each RGB value is transformed into another triple of values, one representing the luminance and two the chrominance (the color components). As the human eye has a greater resolution for intensity than for color, the two chrominance components are reduced in resolution by taking the average of each block of 4 pixels. This gives a data reduction by a factor of 2. Further, the values are centered around 0 by subtracting 128, and each matrix is divided up into blocks of 8 by 8 pixels.
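
The factor of 2 can be checked with a small sketch. The function names are hypothetical, the conversion from RGB to luminance and chrominance is left out, and the image size is just an example with even width and height.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a chrominance plane (even dimensions assumed)."""
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]

def values_after_preparation(width, height):
    """Full-resolution luminance plus two quarter-size chrominance planes."""
    return width * height + 2 * (width // 2) * (height // 2)

width, height = 640, 480
before = 3 * width * height                 # three values per pixel in RGB
after = values_after_preparation(width, height)
print(before / after)                       # 2.0
print(subsample_2x2([[1, 3], [5, 7]]))      # [[4.0]]
```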


A discrete cosine transformation (DCT) is applied to each block, describing the block as a sum of cosine waves of different frequencies. The lowest frequency (0,0) represents the average value of the block; it is called the DC (direct current) component. The eye is less sensitive to higher frequencies, so they can be represented in fewer bits.
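
The transformation can be sketched directly from its definition. The version below is a slow, straightforward implementation that uses the scaling commonly used with JPEG; with that scaling the DC component equals 8 times the average of the block. It is meant as an illustration, not as an efficient encoder.

```python
import math

N = 8  # JPEG works on blocks of 8 by 8 values

def dct_2d(block):
    """Two-dimensional DCT of an N x N block (direct, unoptimized evaluation)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

# A completely flat block has only a DC component; all other terms vanish.
flat = [[10] * N for _ in range(N)]
coefficients = dct_2d(flat)
print(round(coefficients[0][0], 6))
```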


This is done by dividing each of the 64 DCT coefficients by the corresponding entry in a quantization table, in which the entries for the higher frequencies are the largest, so those coefficients are reduced the most. The amount of compression can be adjusted per image by changing the quantization table, which must therefore be stored with each compressed image. The DC component is compressed further by replacing it with the difference from the DC component of the previous block.
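
A sketch of this step is given below. The quantization table produced by make_example_table is made up for illustration and is not the table from the JPEG standard; the other names are hypothetical as well.

```python
def make_example_table(n=8, step=4):
    """Made-up table (NOT the standard JPEG table): larger divisors
    for higher frequencies, i.e. for larger u + v."""
    return [[1 + step * (u + v) for v in range(n)] for u in range(n)]

def quantize(coefficients, table):
    """Divide each coefficient by its table entry and round to an integer."""
    n = len(table)
    return [[round(coefficients[u][v] / table[u][v]) for v in range(n)]
            for u in range(n)]

def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC
    value (the first block is taken relative to 0)."""
    previous, differences = 0, []
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences

table = make_example_table()
quantized = quantize([[20.0] * 8 for _ in range(8)], table)
print(quantized[0][0], quantized[0][1], quantized[7][7])
print(dc_differences([80, 82, 79]))
```

With equal input coefficients, the low-frequency entries survive while the high-frequency ones become 0, which is what makes the later run-length step effective.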


The 64 quantized coefficients are then linearized using a zig-zag pattern. This generally produces a lot of consecutive 0s, which can be reduced to a single count saying how many 0s are present. This technique is known as run-length encoding. Finally, a Huffman code is applied, which assigns shorter output codes to common input numbers than to uncommon ones.
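
The linearization and the run-length step can be sketched as follows. The token format (a tuple for each run of zeros) is a simplification chosen for readability and is not the format used in real JPEG files; the Huffman step is not shown.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n x n block in zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def linearize(block):
    """Read the block along the zig-zag path into one flat list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length_encode_zeros(values):
    """Replace every run of zeros by a single ('zeros', count) token."""
    encoded, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            if run:
                encoded.append(("zeros", run))
                run = 0
            encoded.append(value)
    if run:
        encoded.append(("zeros", run))
    return encoded

# Example block: only three low-frequency coefficients are non-zero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 20, 4, -3
print(run_length_encode_zeros(linearize(block)))
```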

Decompressing a JPEG image requires running the algorithm backwards. The time to compress an image is roughly equal to the time needed to decompress it.


MPEG

MPEG is a group of standards for compressing video images with sound, each with different applications in mind. It provides a clock running at 90 kHz for synchronizing the sound and image streams. One of the possible audio compression methods has already been discussed.

A first compression is achieved by compressing each frame using JPEG. Further compression is achieved by exploiting temporal redundancy: a frame often resembles the previous one. A new frame is described in macroblocks of 16 by 16 pixels in luminance space and 8 by 8 blocks in chrominance space. The previous frame is searched for a macroblock, shifted in x and y, that gives a good match. The difference between the two is then compressed using the JPEG method. For the shift in position, fractional pixels can be used, which requires interpolation of pixel values. The amount of compression achieved depends on how long and how thoroughly one searches for matching macroblocks. This can take a long time, which is acceptable for a one-time encoding of a film library but not for real-time videoconferencing.
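
The search for a matching macroblock can be sketched as an exhaustive search over whole-pixel shifts. The MPEG standards do not prescribe how an encoder searches, so the matching criterion used here (sum of absolute differences), the search range and all names are assumptions for illustration. Fractional-pixel shifts are not covered, and a small block size is used to keep the example short.

```python
def block_difference(current, cx, cy, previous, px, py, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(current[cy + j][cx + i] - previous[py + j][px + i])
               for j in range(size) for i in range(size))

def best_match(current, previous, x, y, size=16, search=4):
    """Try every shift (dx, dy) within +/- search pixels and return the
    (dx, dy, cost) with the smallest difference."""
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = block_difference(current, x, y, previous, px, py, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best

# Test frames: the current frame is the previous one moved by 2 pixels
# horizontally and 1 pixel vertically.
W = H = 12
previous = [[r * W + c for c in range(W)] for r in range(H)]
current = [[previous[r + 1][c + 2] if r + 1 < H and c + 2 < W else 0
            for c in range(W)] for r in range(H)]
print(best_match(current, previous, 4, 4, size=4, search=3))  # (2, 1, 0)
```

The cost of the search grows with the square of the search range, which illustrates why a thorough search is acceptable for offline encoding but not for real-time use.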

Further compression can be achieved by allowing the reference macroblock to be in either the previous frame or in a succeeding frame (usually compressed with JPEG directly). Not all MPEG decoders support this mode, but this is changing.


Modified on 26 February 2003 by Theo Schouten.