URL Encoding in HTML – HTML URL Encode Characters

Welcome to the DataFlair HTML Tutorial. In this article, we will learn about URL encoding in HTML: how the encoding process works and which characters are encoded.

HTML URL Encoding

A Uniform Resource Locator (URL) is the address of a document on the web. The host part of a URL can be a domain name made up of words, which the Domain Name System (DNS) resolves to an address, or it can be an IP address. For example, https://data-flair.training/blogs/ is a URL.

The general structure of a URL is as follows:
scheme://prefix.domain:port/path/filename

Here:

Scheme – Defines the type of internet service, commonly http or https.
Prefix – Defines the domain prefix, commonly www.
Domain – Defines the internet domain name, for example data-flair.training.
Port – Defines the port number on the host. The default is 80 for http and 443 for https.
Path – Defines the path to the resource on the server.
Filename – Defines the name of the file or document being requested.
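
The parts listed above can be pulled out of a URL programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the URL itself is a made-up example (it is not from this tutorial) chosen so that every part, including the port, is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only so that every component is present.
url = "https://www.example.com:8080/blogs/index.html"

parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com  (prefix + domain)
print(parts.port)      # 8080
print(parts.path)      # /blogs/index.html

# The filename is the last segment of the path.
filename = parts.path.rsplit("/", 1)[-1]
print(filename)        # index.html
```

When the URL does not state a port, parts.port is None and the client falls back to the default port for the scheme.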

HTML URL Schemes

Some common URL schemes are:

  • http (HyperText Transfer Protocol) – Used for ordinary web pages. The traffic is not encrypted.
  • https (HyperText Transfer Protocol Secure) – Used for secure web pages. The traffic is encrypted.
  • ftp (File Transfer Protocol) – Used for downloading and uploading files.

HTML URL Encode Characters

URL encoding (also called percent-encoding) converts characters into a form that can be safely transmitted in a URL and is accepted by all browsers. URLs can only be sent over the internet using the ASCII character set, so any character that is not allowed in a URL, including every non-ASCII character, is written as a percent sign (%) followed by two hexadecimal digits.

In other words, URL encoding replaces each character that cannot appear literally in a URL with ‘%’ followed by the hexadecimal value of each of its bytes. For example, a space in a URL is written as %20, and $ is written as %24.

http://www.example.com/new%20article.htm

This is interpreted as the file name ‘new article.htm’.
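
As a rough illustration of the two examples above, Python's standard urllib.parse module provides quote() for encoding and unquote() for decoding. This is only a sketch of the idea; browsers apply their own, slightly different rules to each part of a URL.

```python
from urllib.parse import quote, unquote

# Encode a file name that contains a space.
encoded = quote("new article.htm")
print(encoded)          # new%20article.htm

# The dollar sign is encoded as well.
print(quote("$"))       # %24

# Decoding reverses the process.
decoded = unquote("new%20article.htm")
print(decoded)          # new article.htm
```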

The browser encodes input according to the character set used in the document. The default character set in HTML5 is UTF-8.

The character groups that matter for URL encoding are:

a. HTML ASCII Control Characters

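These are characters used for output control rather than for display. They occupy the range 00-1F hexadecimal (0-31 in decimal), plus 7F (127 in decimal).
Encoding example (the encoded form of a character depends on the character set it is encoded from):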

Character   From Windows-1252   From UTF-8
€           %80                 %E2%82%AC
£           %A3                 %C2%A3
©           %A9                 %C2%A9
®           %AE                 %C2%AE
À           %C0                 %C3%80
Á           %C1                 %C3%81
Â           %C2                 %C3%82
Ã           %C3                 %C3%83
Ä           %C4                 %C3%84
Å           %C5                 %C3%85
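
The two columns of the table above can be reproduced with the encoding parameter of Python's urllib.parse.quote(). The helper name encode_both is a hypothetical one introduced here for illustration; cp1252 is Python's codec name for Windows-1252.

```python
from urllib.parse import quote


def encode_both(ch: str) -> tuple[str, str]:
    """Return the percent-encoding of ch from Windows-1252 and from UTF-8."""
    return quote(ch, encoding="cp1252"), quote(ch, encoding="utf-8")


for ch in "€£©À":
    from_cp1252, from_utf8 = encode_both(ch)
    print(ch, from_cp1252, from_utf8)
# € %80 %E2%82%AC
# £ %A3 %C2%A3
# © %A9 %C2%A9
# À %C0 %C3%80
```

The same character yields one encoded byte from Windows-1252 but two or three from UTF-8, which is why the receiving side must know which character set was used.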

For a complete URL encoding reference, see https://www.eso.org/~ndelmott/url_encode.html

b. HTML Non-ASCII Characters

These are the characters beyond the ASCII range, that is, with code values of 128 and above (ASCII itself covers only 0-127). The following table lists the URL-encoded form of each code value from 128 to 255:

CODE (DECIMAL)   ENCODED FORM
128              %80
129              %81
130              %82
131              %83
132              %84
133              %85
134              %86
135              %87
136              %88
137              %89
138              %8a
139              %8b
140              %8c
141              %8d
142              %8e
143              %8f
144              %90
145              %91
146              %92
147              %93
148              %94
149              %95
150              %96
151              %97
152              %98
153              %99
154              %9a
155              %9b
156              %9c
157              %9d
158              %9e
159              %9f
160              %a0
161              %a1
162              %a2
163              %a3
164              %a4
165              %a5
166              %a6
167              %a7
168              %a8
169              %a9
170              %aa
171              %ab
172              %ac
173              %ad
174              %ae
175              %af
176              %b0
177              %b1
178              %b2
179              %b3
180              %b4
181              %b5
182              %b6
183              %b7
184              %b8
185              %b9
186              %ba
187              %bb
188              %bc
189              %bd
190              %be
191              %bf
192              %c0
193              %c1
194              %c2
195              %c3
196              %c4
197              %c5
198              %c6
199              %c7
200              %c8
201              %c9
202              %ca
203              %cb
204              %cc
205              %cd
206              %ce
207              %cf
208              %d0
209              %d1
210              %d2
211              %d3
212              %d4
213              %d5
214              %d6
215              %d7
216              %d8
217              %d9
218              %da
219              %db
220              %dc
221              %dd
222              %de
223              %df
224              %e0
225              %e1
226              %e2
227              %e3
228              %e4
229              %e5
230              %e6
231              %e7
232              %e8
233              %e9
234              %ea
235              %eb
236              %ec
237              %ed
238              %ee
239              %ef
240              %f0
241              %f1
242              %f2
243              %f3
244              %f4
245              %f5
246              %f6
247              %f7
248              %f8
249              %f9
250              %fa
251              %fb
252              %fc
253              %fd
254              %fe
255              %ff
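
Every row of the table above follows one rule: the encoded form is ‘%’ followed by the code value written as two hexadecimal digits. A minimal sketch of that rule is shown below; the function name percent_encode_byte is hypothetical. It prints uppercase hexadecimal digits, which are equivalent to the lowercase digits used in the table.

```python
def percent_encode_byte(value: int) -> str:
    """Return '%' plus the two-digit hexadecimal form of a byte value."""
    if not 0 <= value <= 255:
        raise ValueError("a single byte must be in the range 0-255")
    return f"%{value:02X}"


for value in (128, 198, 255):
    print(value, percent_encode_byte(value))
# 128 %80
# 198 %C6
# 255 %FF
```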

c. HTML Reserved Encode Characters

These are special characters, such as the semicolon (;), dollar sign ($), and question mark (?), that have a special meaning in URLs. When one of them is used as ordinary data rather than for its special purpose, it must be encoded. For example, the ‘/’ character separates the path segments of a URL, so a literal ‘/’ inside data is encoded as %2F. The following is the list of reserved characters:

CHARACTER   ENCODED FORM
!           %21
*           %2A
'           %27
(           %28
)           %29
;           %3B
:           %3A
@           %40
&           %26
=           %3D
+           %2B
$           %24
,           %2C
/           %2F
?           %3F
#           %23
[           %5B
]           %5D
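
Whether a reserved character is encoded depends on whether it is acting as a delimiter or as data. The sketch below uses the safe parameter of Python's urllib.parse.quote() to show both cases; the search term and the example.com address are made up for illustration.

```python
from urllib.parse import quote

# Hypothetical base address and search term.
BASE = "http://www.example.com/search"
term = "cats/dogs & birds?"

# Default: '/' is treated as a path separator and left alone.
as_path = quote(term)
print(as_path)   # cats/dogs%20%26%20birds%3F

# safe="": every reserved character is data, so '/' is encoded too.
as_data = quote(term, safe="")
print(as_data)   # cats%2Fdogs%20%26%20birds%3F

url = f"{BASE}?q={as_data}"
print(url)       # http://www.example.com/search?q=cats%2Fdogs%20%26%20birds%3F
```

Encoding the term before placing it in the query string keeps its ‘&’ and ‘?’ from being read as URL delimiters.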

d. HTML Safe Encode Characters

Alphanumeric characters (0-9, a-z, A-Z), special characters such as $, -, _, ., +, !, *, ‘, (, ), and reserved characters used for their reserved purposes are not encoded. These are known as safe characters.

e. HTML Unsafe Encode Characters

These include the space, the less-than and greater-than signs, quotation marks, and so on. They tend to be misinterpreted in a URL and should therefore be encoded. The following is a list of some unsafe characters:

CHARACTER   ENCODED FORM
space       %20
"           %22
<           %3C
>           %3E
#           %23
%           %25
{           %7B
}           %7D
|           %7C
\           %5C
^           %5E
~           %7E
[           %5B
]           %5D
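
The table above can be cross-checked against Python's urllib.parse.quote(). This is a sketch, not a statement of what any browser does; the dictionary name UNSAFE is introduced here for illustration. The tilde is left out of the dictionary because Python 3.7 and later follow RFC 3986, which treats ‘~’ as an unreserved character, so quote() does not encode it.

```python
from urllib.parse import quote

# Unsafe characters and their expected encodings, tilde excluded.
UNSAFE = {
    " ": "%20", '"': "%22", "<": "%3C", ">": "%3E", "#": "%23",
    "%": "%25", "{": "%7B", "}": "%7D", "|": "%7C", "\\": "%5C",
    "^": "%5E", "[": "%5B", "]": "%5D",
}

# Collect any character whose encoding differs from the table.
mismatches = {
    ch: quote(ch, safe="")
    for ch, expected in UNSAFE.items()
    if quote(ch, safe="") != expected
}
print(mismatches)   # {}

# On Python 3.7+ the tilde is returned unchanged.
print(quote("~"))
```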

URL-Encoding for Control Characters

The ASCII characters in the range 00-1F hexadecimal (encoded as %00-%1F) were originally designed to control hardware devices.

ASCII Character   Description               URL-encoding
NUL               Null Character            %00
SOH               Start of Header           %01
STX               Start of Text             %02
ETX               End of Text               %03
EOT               End of Transmission       %04
ENQ               Enquiry                   %05
ACK               Acknowledge               %06
BEL               Bell (Ring)               %07
BS                Backspace                 %08
HT                Horizontal Tab            %09
LF                Line Feed                 %0A
VT                Vertical Tab              %0B
FF                Form Feed                 %0C
CR                Carriage Return           %0D
SO                Shift Out                 %0E
SI                Shift In                  %0F
DLE               Data Link Escape          %10
DC1               Device Control 1          %11
DC2               Device Control 2          %12
DC3               Device Control 3          %13
DC4               Device Control 4          %14
NAK               Negative Acknowledge      %15
SYN               Synchronize               %16
ETB               End Transmission Block    %17
CAN               Cancel                    %18
EM                End of Medium             %19
SUB               Substitute                %1A
ESC               Escape                    %1B
FS                File Separator            %1C
GS                Group Separator           %1D
RS                Record Separator          %1E
US                Unit Separator            %1F
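
In practice, the control characters most often seen in encoded form are the tab, line feed, and carriage return that appear in multi-line text. The following sketch uses Python's urllib.parse module to encode a few of them and to rebuild the encoding column of the table above; the sample strings are made up.

```python
from urllib.parse import quote, unquote

# A tab and a Windows-style line break inside text.
print(quote("\t"))               # %09
print(quote("line1\r\nline2"))   # line1%0D%0Aline2

# Rebuild the encoding column for code values 0 to 31.
table = [(code, quote(chr(code))) for code in range(0x20)]
print(table[0], table[-1])       # (0, '%00') (31, '%1F')

# Decoding turns the escape back into the control character.
print(repr(unquote("%0A")))      # '\n'
```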

Summary

In this article, we’ve discussed the Uniform Resource Locator, which defines the address of a document on the web. We’ve discussed how characters in a URL are encoded as a percent sign followed by hexadecimal digits, so that the URL contains only ASCII characters that every browser understands. We’ve also looked at some common URL-encoded characters in Windows-1252 and UTF-8, along with the URL encoding of reserved, unsafe, and control characters.
