HTML Data Types by RFC & IANA Documentation

An RFC (Request for Comments) is a memorandum that describes methods, behaviours, or research relating to the working of the Internet. IANA (Internet Assigned Numbers Authority) is the entity that oversees global IP address allocation, media types, and other Internet protocol related assignments. According to the RFC and IANA documentation, there are the following four basic data types:

  • Uniform Resource Identifier (URI)
  • Content type
  • Language code
  • Character set

Each of these four basic data types is discussed below, one by one:

HTML Uniform Resource Identifier (URI)

A URI is a string of characters used to identify or name a resource on the Internet. It is a simple and extensible means of identifying a resource, as shown in the following example:

http://richard:jones@<host>:80/over/there/index.dtb;type=anime1?name=ferret#nose

In the above example:

http is the scheme name
richard:jones is the user information (also known as userinfo)
<host> is a placeholder for the host name
80 is the port
richard:jones@<host>:80 is the authority
/over/there/index.dtb;type=anime1 is the path
index is the file name
dtb is the extension
type=anime1 is the parameter (it follows the semicolon)
name=ferret is the query (it follows the question mark)
nose is the fragment (it follows the hash sign)

The following table briefly explains these terms:

Term | Description
Scheme | Refers to the specification for assigning an identifier. Schemes used in URIs include HyperText Transfer Protocol (HTTP), File Transfer Protocol (FTP), mailto, Uniform Resource Name (URN), tel, Real Time Streaming Protocol (RTSP), and file.
User Information | Refers to the personal information, such as user name and password, which is used to access websites or resources.
Authority | Refers to the part that consists of optional user information terminated with @, a host name, and an optional port number preceded by a colon (:).
Host name | Identifies the host that holds the resource, either as a registered name (typically a Domain Name System (DNS) name) or as an IP address. Using DNS names reuses an existing registration, which saves the cost of deploying another one.
Port | Refers to the optional decimal number that follows the host after a colon (:). Each scheme may also define a default port number. For example, http has 80 as its default port number.
Path | Consists of a sequence of text segments that are separated by a forward slash (/).
File Name | Refers to the name given to the targeted file.
Extension | Refers to a short code, typically three or four characters, that follows the file name and a dot (.). It indicates the kind of information contained in the file. The .html extension signifies that the file contains an HTML document, and the .jpg extension signifies that the file is an image.
Query | Starts with a question mark (?) and is typically used when the URI requests a program to run rather than a file to be accessed. The query carries the parameters to be passed to the server-side program.
Fragment | Starts with a hash sign (#) and refers to a particular point within the accessed resource.
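The terms above can be checked with a short Python sketch built on the standard library's urllib.parse module. The host example.com is an assumption made only for this illustration, and describe_uri is a hypothetical helper name, not part of any library.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# "example.com" is a hypothetical host used only for illustration.
EXAMPLE = ("http://richard:jones@example.com:80"
           "/over/there/index.dtb;type=anime1?name=ferret#nose")


def describe_uri(uri):
    """Split a URI into the components named in the table above."""
    parts = urlparse(uri)
    # urlparse reports the parameter separately, so parts.path stops
    # before the semicolon.
    file_part = PurePosixPath(parts.path)
    return {
        "scheme": parts.scheme,
        # Everything before the last "@" in the authority, if present.
        "userinfo": parts.netloc.rpartition("@")[0],
        "host": parts.hostname,
        "port": parts.port,
        "authority": parts.netloc,
        "path": parts.path,
        "file name": file_part.stem,
        "extension": file_part.suffix.lstrip("."),
        "parameter": parts.params,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for name, value in describe_uri(EXAMPLE).items():
    print(f"{name:10} {value}")
```

Note that urlparse separates the parameter from the path, whereas the breakdown above lists the parameter as part of the path; both views describe the same characters.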

HTML Content Type

The content type (also known as media type or MIME type) represents the type of the content used in an embedded or linked resource. For instance, the content type can be plain text or a JPEG image. It is not case sensitive, although it is conventionally written in lowercase. Its syntax is divided into two parts: a top-level type and a subtype. The type is separated from the subtype by a slash (/) symbol. Following are some examples of content types:

  • text/plain - Represents plain text
  • image/jpeg - Represents a JPEG-compressed image file
  • audio/basic - Represents a basic audio file
  • video/mpeg - Represents an MPEG-compressed video file
  • application/octet-stream - Represents arbitrary binary data
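The two-part syntax and the case-insensitive comparison described above can be sketched in a few lines of Python. The helper names split_media_type and same_media_type are hypothetical, and the sketch deliberately ignores the finer validation rules of the relevant RFCs.

```python
def split_media_type(value):
    """Return (type, subtype) in lowercase, ignoring any parameters."""
    # Drop optional parameters such as "; charset=UTF-8".
    essence = value.split(";", 1)[0].strip()
    top, slash, sub = essence.partition("/")
    if not slash or not top or not sub:
        raise ValueError(f"not a valid content type: {value!r}")
    return top.lower(), sub.lower()


def same_media_type(a, b):
    """Compare two content types without regard to letter case."""
    return split_media_type(a) == split_media_type(b)


print(split_media_type("Text/plain"))               # ('text', 'plain')
print(same_media_type("IMAGE/JPEG", "image/jpeg"))  # True
```

Normalizing to lowercase before comparing is one simple way to honor the rule that content types are not case sensitive.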

HTML Language Code

The language code identifies the natural (human) language in which the content of an HTML document is written. It is not case sensitive and is specified with the lang attribute in the HTML document.

Example of Language Code

The implementation of the language code is shown in the following example:

<html lang="en">

HTML Language Code List

The following table lists some of the language codes:

Language | Language Code
English | en
Hindi | hi
Greek | el
German | de
Irish | ga
HTML Character Set

A character set is a collection of standard characters drawn from the languages and scripts of the world, each represented by a unique code point. A code point is the unique integer, paired with a unique name, that is assigned to a character for its identification.

Following are some examples of characters found in a character set:

  • dollar symbol
  • yen symbol
  • lower case letters
  • upper case letters
  • delta
  • omega
  • exclamation mark
  • quotation marks
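To make the idea of a code point concrete, the following Python sketch prints the integer and name that Unicode assigns to several of the characters listed above. It assumes Unicode as the character set, which the text does not name explicitly, and describe is a hypothetical helper.

```python
import unicodedata


def describe(ch):
    """Return the code point (in hexadecimal) and name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Escapes are used for the non-ASCII characters to avoid encoding issues.
for ch in ["$", "\u00a5", "a", "A", "\u0394", "\u03a9", "!", '"']:
    print(describe(ch))

# U+0024 DOLLAR SIGN
# U+00A5 YEN SIGN
# U+0061 LATIN SMALL LETTER A
# U+0041 LATIN CAPITAL LETTER A
# U+0394 GREEK CAPITAL LETTER DELTA
# U+03A9 GREEK CAPITAL LETTER OMEGA
# U+0021 EXCLAMATION MARK
# U+0022 QUOTATION MARK
```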
