What Is a Uniform Resource Locator?

A uniform resource locator (URL) is a compact string that specifies the location of a resource available on the Internet and the means of accessing it. It was originally devised by Tim Berners-Lee as the addressing scheme for the World Wide Web, and it was later published through the Internet Engineering Task Force (IETF) as the standards-track document RFC 1738.

Resources available on the Internet can be identified by simple strings, and RFC 1738 describes the syntax and semantics of those strings, which are called Uniform Resource Locators (URLs). The specification derives from concepts introduced by the World Wide Web global information initiative, whose use of such objects dates from 1990 and is described in RFC 1630, "Universal Resource Identifiers". The URL specification is designed to meet the requirements laid out in "Functional Requirements for Internet Resource Locators". It was produced by the URI Working Group of the Internet Engineering Task Force (IETF) [1].
Just as there are many different methods of accessing resources, there are several schemes for describing the location of those resources. The generic URL syntax provides a framework for establishing new schemes that use protocols other than those already defined in this document. A URL locates a resource by providing an abstract identification of the resource's location. Once a system has located a resource, it may perform a variety of operations on it, which can be characterized as access, update, replace, and find attributes. In general, only the access method needs to be specified for any given URL scheme.
The mappings for some existing standard and experimental protocols are outlined in the BNF syntax definition. The schemes covered are:
  • ftp File Transfer Protocol
  • http Hypertext Transfer Protocol
  • gopher The Gopher protocol
  • mailto Electronic mail address
  • news USENET news
  • nntp USENET news using NNTP access
  • telnet Reference to interactive sessions
  • wais Wide Area Information Servers
  • file Host-specific file names
  • prospero Prospero Directory Service
Other schemes may be defined by future specifications. Section 4 of RFC 1738 describes how new schemes may be registered and lists some scheme names that are under development.
A new scheme may be introduced by defining a mapping onto a conforming URL syntax, using a new prefix. Experimental schemes may be used by mutual agreement between parties, and scheme names starting with the characters "x-" are reserved for experimental purposes.
The Internet Assigned Numbers Authority (IANA) maintains the registry of URL schemes. Any submission of a new URL scheme must include a definition of an algorithm for accessing resources within that scheme, together with the syntax for representing the scheme. A URL scheme must also have demonstrable utility and operability. One way to provide such a demonstration is through a gateway that serves objects in the new scheme to clients using an existing protocol. If the new scheme does not locate resources that are data objects, the properties of names in the new space must be clearly defined.
New schemes should follow the same syntactic conventions as existing schemes where appropriate. Likewise, where a protocol allows retrieval by URL, it is recommended that client software be configurable to use specific gateway locators for indirect access through new naming schemes.
The following schemes have been proposed at various times, but this document does not define their syntax. It is suggested that IANA reserve their scheme names for future definition:
  • afs Andrew File System global file names
  • mid Message identifiers for electronic mail
  • cid Content identifiers for MIME body parts
  • nfs Network File System file names
  • tn3270 Interactive 3270 emulation sessions
  • mailserver Access to data available from mail servers
  • z39.50 Access to ANSI Z39.50 services
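The registration rules above can be illustrated with a small sketch that sorts a scheme name into one of the groups the text mentions. The function name `classify_scheme` and the category labels are hypothetical, chosen only for this illustration; the two sets simply copy the scheme names listed in this article.

```python
# Scheme names with a syntax defined in the specification (listed earlier).
PREDEFINED = {"ftp", "http", "gopher", "mailto", "news", "nntp",
              "telnet", "wais", "file", "prospero"}
# Scheme names proposed at various times and reserved for future definition.
RESERVED = {"afs", "mid", "cid", "nfs", "tn3270", "mailserver", "z39.50"}


def classify_scheme(url: str) -> str:
    """Return a label for the scheme of `url` (labels are illustrative)."""
    scheme, sep, _ = url.partition(":")
    if not sep or not scheme:
        raise ValueError("a URL must start with a scheme followed by ':'")
    scheme = scheme.lower()  # interpreters should ignore case in scheme names
    if scheme.startswith("x-"):
        return "experimental"
    if scheme in PREDEFINED:
        return "predefined"
    if scheme in RESERVED:
        return "reserved"
    return "unregistered"
```

For example, `classify_scheme("x-demo:payload")` returns `"experimental"`, while `classify_scheme("nfs:server/path")` returns `"reserved"`.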
This is a BNF-like description of the Uniform Resource Locator syntax. It uses the conventions of RFC 822, except that "|" designates alternatives and square brackets [] enclose optional or repeated elements. Briefly, literals are enclosed in quotes "", optional elements are enclosed in [brackets], and an element may be preceded by <n>* to indicate n or more repetitions of that element; n defaults to 0.
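One way to get a feel for this notation is to translate a rule into a regular expression. The sketch below does so for a rule of the form `1*[lowalpha | digit | "+" | "-" | "."]`, which is how the grammar defines scheme names. The helper `repeat` and the constant names are hypothetical, and the translation is only an approximation for illustration.

```python
import re


def repeat(alternatives: str, minimum: int = 0) -> str:
    """Translate '<n>*[ ... ]' into a regex quantifier; n defaults to 0."""
    if minimum == 0:
        return f"(?:{alternatives})*"
    return f"(?:{alternatives}){{{minimum},}}"


LOWALPHA = "[a-z]"
DIGIT = "[0-9]"
# Alternatives separated by "|" in the grammar become a regex alternation;
# quoted literals are escaped so they match themselves.
SCHEME_CHAR = "|".join(
    [LOWALPHA, DIGIT, re.escape("+"), re.escape("-"), re.escape(".")]
)
# scheme = 1*[lowalpha | digit | "+" | "-" | "."]
SCHEME = re.compile(repeat(SCHEME_CHAR, 1))
```

With these definitions, `SCHEME.fullmatch("z39.50")` succeeds, while `SCHEME.fullmatch("")` returns `None` because the rule requires at least one character.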
The general form of a URL is as follows:
genericurl = scheme ":" schemepart
; Specific predefined schemes are defined here; new schemes can be registered with IANA
url = httpurl | ftpurl | newsurl |
nntpurl | telneturl | gopherurl |
waisurl | mailtourl | fileurl |
prosperourl | otherurl
; New schemes follow the general syntax
otherurl = genericurl
; The scheme is in lowercase; interpreters should ignore case
scheme = 1*[lowalpha | digit | "+" | "-" | "."]
schemepart = *xchar | ip-schemepart
; URL scheme parts for IP-based protocols:
ip-schemepart = "//" login ["/" urlpath]
login = [user [":" password] "@"] hostport
hostport = host [":" port]
host = hostname | hostnumber
hostname = *[domainlabel "."] toplabel
domainlabel = alphadigit | alphadigit *[alphadigit | "-"] alphadigit
toplabel = alpha | alpha *[alphadigit | "-"] alphadigit
alphadigit = alpha | digit
hostnumber = digits "." digits "." digits "." digits
port = digits
user = *[uchar | ";" | "?" | "&" | "="]
password = *[uchar | ";" | "?" | "&" | "="]
urlpath = *xchar ; depends on the protocol; see Section 3.1 of RFC 1738
; The predefined schemes:
; FTP (File Transfer Protocol, see RFC 959)
ftpurl = "ftp://" login ["/" fpath [";type=" ftptype]]
fpath = fsegment *["/" fsegment]
fsegment = *[uchar | "?" | ":" | "@" | "&" | "="]
ftptype = "A" | "I" | "D" | "a" | "i" | "d"
; FILE (file)
fileurl = "file://" [host | "localhost"] "/" fpath
; HTTP (Hypertext Transfer Protocol)
httpurl = "http://" hostport ["/" hpath ["?" search]]
hpath = hsegment *["/" hsegment]
hsegment = *[uchar | ";" | ":" | "@" | "&" | "="]
search = *[uchar | ";" | ":" | "@" | "&" | "="]
; GOPHER (see RFC 1436)
gopherurl = "gopher://" hostport ["/" [gtype [selector
["%09" search ["%09" gopher+_string]]]]]
gtype = xchar
selector = *xchar
gopher+_string = *xchar
; MAILTO (see RFC 822)
mailtourl = "mailto:" encoded822addr
encoded822addr = 1*xchar ; further defined in RFC 822
; NEWS (see RFC 1036)
newsurl = "news:" grouppart
grouppart = "*" | group | article
group = alpha *[alpha | digit | "-" | "." | "+" | "_"]
article = 1*[uchar | ";" | "/" | "?" | ":" | "&" | "="] "@" host
; NNTP (Network News Transfer Protocol, see RFC 977)
nntpurl = "nntp://" hostport "/" group ["/" digits]
; TELNET (remote login protocol)
telneturl = "telnet://" login ["/"]
; WAIS (Wide Area Information Servers, see RFC 1625)
waisurl = waisdatabase | waisindex | waisdoc
waisdatabase = "wais://" hostport "/" database
waisindex = "wais://" hostport "/" database "?" search
waisdoc = "wais://" hostport "/" database "/" wtype "/" wpath
database = *uchar
wtype = *uchar
wpath = *uchar
; PROSPERO
prosperourl = "prospero://" hostport "/" ppath *[fieldspec]
ppath = psegment *["/" psegment]
psegment = *[uchar | "?" | ":" | "@" | "&" | "="]
fieldspec = ";" fieldname "=" fieldvalue
fieldname = *[uchar | "?" | ":" | "@" | "&"]
fieldvalue = *[uchar | "?" | ":" | "@" | "&"]
; Miscellaneous definitions
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
"i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
"q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
"y" | "z"
hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha = lowalpha | hialpha
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9"
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex
unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape
digits = 1*digit
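To show how the ip-schemepart, login and hostport rules fit together, here is a minimal sketch of a parser built from one regular expression. The names `IpUrl` and `parse_ip_url` are hypothetical, the host pattern is deliberately looser than the hostname and hostnumber rules, and the host names in the usage note are placeholders. It is an illustration of the grammar, not a replacement for a tested URL library.

```python
import re
from typing import NamedTuple, Optional


class IpUrl(NamedTuple):
    scheme: str
    user: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]
    urlpath: Optional[str]


# uchar = unreserved | escape
_UCHAR = r"(?:[A-Za-z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})"
# user and password = *[uchar | ";" | "?" | "&" | "="]
_LOGIN_CHAR = rf"(?:{_UCHAR}|[;?&=])"
# xchar = unreserved | reserved | escape
_XCHAR = rf"(?:{_UCHAR}|[;/?:@&=])"
# Assumption: a loose stand-in for "hostname | hostnumber".
_HOST = r"[A-Za-z0-9.\-]+"

# genericurl = scheme ":" "//" login ["/" urlpath]
_IP_URL = re.compile(
    r"(?P<scheme>[A-Za-z0-9+.\-]+):"
    rf"//(?:(?P<user>{_LOGIN_CHAR}*)(?::(?P<password>{_LOGIN_CHAR}*))?@)?"
    rf"(?P<host>{_HOST})(?::(?P<port>[0-9]+))?"
    rf"(?:/(?P<urlpath>{_XCHAR}*))?"
)


def parse_ip_url(url: str) -> Optional[IpUrl]:
    """Split a URL that uses the ip-schemepart form; None if it does not."""
    match = _IP_URL.fullmatch(url)
    if match is None:
        return None
    port = match.group("port")
    return IpUrl(
        scheme=match.group("scheme").lower(),  # scheme case is ignored
        user=match.group("user"),
        password=match.group("password"),
        host=match.group("host"),
        port=int(port) if port is not None else None,
        urlpath=match.group("urlpath"),
    )
```

Under these assumptions, `parse_ip_url("ftp://alice:secret@ftp.example.com:2121/pub/file.txt")` yields user `alice`, password `secret`, host `ftp.example.com`, port `2121` and urlpath `pub/file.txt`. A URL without the `//` prefix, such as a mailto address, returns `None`, because it uses the plain `*xchar` form of schemepart instead.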
The URL scheme does not in itself pose a security threat. Users should be aware, however, that a URL which points to a given object at one time is not guaranteed to keep doing so. Because objects may be moved on servers, the same URL may even point to a different object at some later time.
One URL-related security threat is that it is sometimes possible to construct a URL so that an attempt to perform a harmless, idempotent operation, such as retrieving an object, in fact causes a possibly damaging remote operation. Such an unsafe URL is typically constructed by specifying a port number other than the one reserved for the network protocol in question. The client then unwittingly contacts a server that is actually running a different protocol, and the content of the URL contains instructions that, when interpreted according to this other protocol, cause an unexpected operation. One example has been the use of a gopher URL to cause a rude message to be sent through an SMTP server. Caution should be used with any URL that specifies a port number other than the default for its protocol, especially when the number lies within the reserved space.
Care should also be taken when a URL contains embedded encoded delimiters for a given protocol (for example, the CR and LF characters for the telnet protocol) and these are not decoded before transmission. This would violate the protocol, but it could be used to simulate an extra operation or parameter, again causing an unexpected and possibly harmful remote operation to be performed.
Using a URL that contains a password that is supposed to be secret is clearly unwise.
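The warnings above lend themselves to a simple automated check. The sketch below flags the three risks the text describes, using Python's standard `urllib.parse.urlsplit`. The function name `audit_url` and the finding labels are hypothetical, the default ports are those RFC 1738 gives for each scheme, and treating ports below 1024 as the "reserved space" is an assumption made for this example.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Default ports given for each scheme in RFC 1738.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "gopher": 70, "nntp": 119,
                 "telnet": 23, "wais": 210, "prospero": 1525}
# Assumption: "reserved space" is taken to mean ports below 1024.
RESERVED_PORT_LIMIT = 1024


def audit_url(url: str) -> List[str]:
    """Return labels for the risks described above (labels are illustrative)."""
    findings: List[str] = []
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    default = DEFAULT_PORTS.get(parts.scheme)
    try:
        port = parts.port
    except ValueError:  # raised for a non-numeric or out-of-range port
        findings.append("invalid-port")
        port = None
    if port is not None and default is not None and port != default:
        findings.append("non-default-port")
        if port < RESERVED_PORT_LIMIT:
            findings.append("reserved-port")
    # %0D and %0A are the escaped forms of CR and LF.
    if re.search(r"%0[ad]", url, re.IGNORECASE):
        findings.append("encoded-crlf")
    if parts.password is not None:
        findings.append("embedded-password")
    return findings
```

A gopher URL aimed at port 25 with escaped line breaks, such as `gopher://mail.example.com:25/1%0D%0AHELO` (a made-up host), is reported with `non-default-port`, `reserved-port` and `encoded-crlf`, which mirrors the SMTP example in the text. An ordinary `http://example.com/index.html` produces no findings.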
