URL Encoding in HTML – HTML URL Encode Characters
Welcome to the DataFlair HTML Tutorial. In this article, we will learn about URL encoding in HTML: how the encoding process works and which characters are encoded.
HTML URL Encoding
A Uniform Resource Locator (URL) is the address of a document on the web. The host part of a URL can be written as words, that is, a domain name resolved through the Domain Name System (DNS), or as an IP address. For example, https://data-flair.training/blogs/ is a URL.
The general structure of a URL is as follows:
scheme://prefix.domain:port/path/filename
Here:
Scheme – Defines the internet service type, commonly http or https.
Prefix – Defines the domain prefix, commonly www.
Domain – Defines the internet domain name, for example data-flair.training.
Port – Defines the host’s port number; 80 is the default port for http and 443 for https.
Path – Defines the path at the server.
Filename – It defines the name of the file or the document that is being displayed.
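As a rough illustration of these parts, the following Python sketch splits a URL using urlsplit from the standard urllib.parse module. The URL is a made-up example, and splitting the host into prefix and domain at the first dot is a simplifying assumption, not a general rule.

```python
from urllib.parse import urlsplit

# Hypothetical URL that follows scheme://prefix.domain:port/path/filename
url = "http://www.example.com:80/docs/page.htm"

parts = urlsplit(url)

# Naive split of "www.example.com" into prefix and domain (first dot only).
prefix, _, domain = parts.hostname.partition(".")
# Split "/docs/page.htm" into the directory path and the filename.
path, _, filename = parts.path.rpartition("/")

print(parts.scheme)  # http
print(prefix)        # www
print(domain)        # example.com
print(parts.port)    # 80
print(path)          # /docs
print(filename)      # page.htm
```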
HTML URL Schemes
Some common URL schemes are-
- http (HyperText Transfer Protocol) – Used for common web pages. Traffic is not encrypted.
- https (HyperText Transfer Protocol Secure) – Used for secure web pages. Traffic is encrypted.
- ftp (File Transfer Protocol) – Used for downloading and uploading files.
HTML URL Encode Characters
URL encoding is the practice of translating the characters in a URL into a restricted set of ASCII characters, so that the URL can be transmitted reliably and is accepted by browsers and servers everywhere on the internet. Characters that cannot appear literally, including all non-ASCII characters, are written as a percent sign (%) followed by two hexadecimal digits for each byte.
In other words, URL encoding replaces each such character with ‘%’ followed by the hexadecimal value of its byte or bytes. For example, a space in a URL is written as %20, and $ is written as %24.
http://www.example.com/new%20article.htm
This is interpreted as the file name ‘new article.htm’.
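The same round trip can be tried in Python. This is a minimal sketch using quote and unquote from the standard urllib.parse module; the file name is the one from the example above.

```python
from urllib.parse import quote, unquote

# Encode a file name that contains a space, then decode it again.
encoded = quote("new article.htm")
decoded = unquote("new%20article.htm")

print(encoded)     # new%20article.htm
print(decoded)     # new article.htm
print(quote("$"))  # %24
```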
The browser encodes the input according to the character set used in the document. The default character set in HTML5 is UTF-8.
The characters that are encoded are:
a. HTML ASCII Control Characters
These are the non-printable characters used for output control: hexadecimal 00-1F (0-31 in decimal) and 7F (127 in decimal).
Encoding example (the encoded form depends on the character set):
Character | From Windows-1252 | From UTF-8 |
€ | %80 | %E2%82%AC |
£ | %A3 | %C2%A3 |
© | %A9 | %C2%A9 |
® | %AE | %C2%AE |
À | %C0 | %C3%80 |
Á | %C1 | %C3%81 |
Â | %C2 | %C3%82 |
Ã | %C3 | %C3%83 |
Ä | %C4 | %C3%84 |
Å | %C5 | %C3%85 |
For a complete URL encoding table, please visit https://www.eso.org/~ndelmott/url_encode.html
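To see how the character set changes the result, here is a small Python sketch. It assumes the standard urllib.parse.quote function, whose encoding parameter selects the character set; cp1252 is Python's codec name for Windows-1252, and encode_both is a helper name invented for this example.

```python
from urllib.parse import quote

def encode_both(ch: str) -> tuple[str, str]:
    """Return (Windows-1252 form, UTF-8 form) of one character."""
    return quote(ch, encoding="cp1252"), quote(ch, encoding="utf-8")

for ch in ["€", "£", "©", "À"]:
    win1252, utf8 = encode_both(ch)
    print(ch, win1252, utf8)
# € %80 %E2%82%AC
# £ %A3 %C2%A3
# © %A9 %C2%A9
# À %C0 %C3%80
```

A character that takes one byte in Windows-1252 takes two or three bytes in UTF-8, which is why the UTF-8 column is longer.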
b. HTML Non-ASCII Characters
These are the characters beyond the ASCII range, i.e., byte values 128 to 255. The following list shows the URL encoding of each byte value, given in decimal:
DECIMAL VALUE | ENCODED FORM |
128 | %80 |
129 | %81 |
130 | %82 |
131 | %83 |
132 | %84 |
133 | %85 |
134 | %86 |
135 | %87 |
136 | %88 |
137 | %89 |
138 | %8a |
139 | %8b |
140 | %8c |
141 | %8d |
142 | %8e |
143 | %8f |
144 | %90 |
145 | %91 |
146 | %92 |
147 | %93 |
148 | %94 |
149 | %95 |
150 | %96 |
151 | %97 |
152 | %98 |
153 | %99 |
154 | %9a |
155 | %9b |
156 | %9c |
157 | %9d |
158 | %9e |
159 | %9f |
160 | %a0 |
161 | %a1 |
162 | %a2 |
163 | %a3 |
164 | %a4 |
165 | %a5 |
166 | %a6 |
167 | %a7 |
168 | %a8 |
169 | %a9 |
170 | %aa |
171 | %ab |
172 | %ac |
173 | %ad |
174 | %ae |
175 | %af |
176 | %b0 |
177 | %b1 |
178 | %b2 |
179 | %b3 |
180 | %b4 |
181 | %b5 |
182 | %b6 |
183 | %b7 |
184 | %b8 |
185 | %b9 |
186 | %ba |
187 | %bb |
188 | %bc |
189 | %bd |
190 | %be |
191 | %bf |
192 | %c0 |
193 | %c1 |
194 | %c2 |
195 | %c3 |
196 | %c4 |
197 | %c5 |
198 | %c6 |
199 | %c7 |
200 | %c8 |
201 | %c9 |
202 | %ca |
203 | %cb |
204 | %cc |
205 | %cd |
206 | %ce |
207 | %cf |
208 | %d0 |
209 | %d1 |
210 | %d2 |
211 | %d3 |
212 | %d4 |
213 | %d5 |
214 | %d6 |
215 | %d7 |
216 | %d8 |
217 | %d9 |
218 | %da |
219 | %db |
220 | %dc |
221 | %dd |
222 | %de |
223 | %df |
224 | %e0 |
225 | %e1 |
226 | %e2 |
227 | %e3 |
228 | %e4 |
229 | %e5 |
230 | %e6 |
231 | %e7 |
232 | %e8 |
233 | %e9 |
234 | %ea |
235 | %eb |
236 | %ec |
237 | %ed |
238 | %ee |
239 | %ef |
240 | %f0 |
241 | %f1 |
242 | %f2 |
243 | %f3 |
244 | %f4 |
245 | %f5 |
246 | %f6 |
247 | %f7 |
248 | %f8 |
249 | %f9 |
250 | %fa |
251 | %fb |
252 | %fc |
253 | %fd |
254 | %fe |
255 | %ff |
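The list above follows a simple rule: each byte value is written as ‘%’ plus its two-digit hexadecimal form. The Python sketch below reproduces that rule; encode_byte is a helper name invented for this example. Hexadecimal digits in a percent-encoding are case-insensitive, so %c6 and %C6 mean the same byte.

```python
from urllib.parse import quote_from_bytes

def encode_byte(value: int) -> str:
    """Return the percent-encoded form of one byte value (0-255)."""
    if not 0 <= value <= 255:
        raise ValueError("a byte value must be between 0 and 255")
    return f"%{value:02x}"

for value in (128, 198, 255):
    print(value, encode_byte(value))
# 128 %80
# 198 %c6
# 255 %ff

# Cross-check against the standard library, which emits uppercase hex.
print(quote_from_bytes(bytes([198])))  # %C6
```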
c. HTML Reserved Encode Characters
These include special characters such as the semicolon (;), dollar sign ($), and question mark (?). These characters have a special meaning in URLs, so they need to be encoded whenever they are used as ordinary data rather than for that special purpose. For example, the ‘/’ character is reserved because it separates the path segments of a URL; when it must appear as data, it is encoded as %2F. Following is the list of reserved characters.
CHARACTER | ENCODED FORM |
! | %21 |
* | %2A |
' | %27 |
( | %28 |
) | %29 |
; | %3B |
: | %3A |
@ | %40 |
& | %26 |
= | %3D |
+ | %2B |
$ | %24 |
, | %2C |
/ | %2F |
? | %3F |
# | %23 |
[ | %5B |
] | %5D |
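The sketch below shows why this matters when a reserved character is part of the data. It uses Python's urllib.parse.quote with safe="" so that every reserved character is encoded; the search URL and the value are made-up examples.

```python
from urllib.parse import quote

# A value that happens to contain the reserved characters &, = and ?
value = "tom&jerry=friends?"

# safe="" asks quote to encode every reserved character, including "/".
encoded = quote(value, safe="")
url = "http://www.example.com/search?q=" + encoded

print(url)
# http://www.example.com/search?q=tom%26jerry%3Dfriends%3F

print(quote("/"))           # /    (by default "/" is left alone)
print(quote("/", safe=""))  # %2F
```

Without the encoding, the server would read the & as the start of a second parameter instead of as part of the value.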
d. HTML Safe Encode Characters
Alphanumeric characters (0-9, a-z, A-Z), special characters such as $, -, _, ., +, !, *, ', (, ), and reserved characters that are used for their reserved purpose do not need to be encoded and are known as safe characters.
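Which characters a library leaves unencoded varies. As an illustration, Python's urllib.parse.quote (assuming Python 3.7 or later) never encodes letters, digits, and the characters - _ . ~, which is a narrower set than the list above; other characters can be kept as-is by passing them in the safe parameter.

```python
from urllib.parse import quote

# Letters, digits and - _ . ~ are never encoded by quote.
print(quote("AZaz09-_.~", safe=""))      # AZaz09-_.~

# The other special characters are encoded unless listed in safe.
print(quote("$+!*'()", safe=""))         # %24%2B%21%2A%27%28%29
print(quote("$+!*'()", safe="$+!*'()"))  # $+!*'()
```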
e. HTML Unsafe Encode Characters
These include space, greater than and less than signs, quotation marks, etc. They have a tendency to be misinterpreted in the URL and thus should be encoded properly. Following is the list of some unsafe characters-
CHARACTER | ENCODED FORM |
space | %20 |
" | %22 |
< | %3C |
> | %3E |
# | %23 |
% | %25 |
{ | %7B |
} | %7D |
| | %7C |
\ | %5C |
^ | %5E |
~ | %7E |
[ | %5B |
] | %5D |
URL-Encoding for Control Characters
The ASCII characters 00-1F (hexadecimal), URL-encoded as %00-%1F, were designed to control hardware devices.
ASCII Character | Description | URL-encoding |
NUL | Null Character | %00 |
SOH | Start of Header | %01 |
STX | Start of Text | %02 |
ETX | End of Text | %03 |
EOT | End of Transmission | %04 |
ENQ | Enquiry | %05 |
ACK | Acknowledge | %06 |
BEL | Bell (Ring) | %07 |
BS | Backspace | %08 |
HT | Horizontal Tab | %09 |
LF | Line Feed | %0A |
VT | Vertical Tab | %0B |
FF | Form Feed | %0C |
CR | Carriage Return | %0D |
SO | Shift Out | %0E |
SI | Shift In | %0F |
DLE | Data Link Escape | %10 |
DC1 | Device Control 1 | %11 |
DC2 | Device Control 2 | %12 |
DC3 | Device Control 3 | %13 |
DC4 | Device Control 4 | %14 |
NAK | Negative Acknowledge | %15 |
SYN | Synchronize | %16 |
ETB | End of Transmission Block | %17 |
CAN | Cancel | %18 |
EM | End of Medium | %19 |
SUB | Substitute | %1A |
ESC | Escape | %1B |
FS | File Separator | %1C |
GS | Group Separator | %1D |
RS | Record Separator | %1E |
US | Unit Separator | %1F |
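A few rows of this table can be checked with a short Python sketch, again assuming the standard urllib.parse module. The sample list and the two-line string are chosen only for illustration.

```python
from urllib.parse import quote, unquote

# (name, character) pairs taken from the table above.
samples = [("NUL", "\x00"), ("HT", "\t"), ("LF", "\n"), ("CR", "\r"), ("US", "\x1f")]

for name, ch in samples:
    print(name, quote(ch))
# NUL %00
# HT %09
# LF %0A
# CR %0D
# US %1F

# Decoding turns the encoded CR LF pair back into a real line break.
text = unquote("line1%0D%0Aline2")
print(text == "line1\r\nline2")  # True
```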
URL encoding keeps web addresses valid and portable across platforms and browsers. By translating characters that are not allowed in a URL, such as non-ASCII letters and other symbols, into an accepted ASCII form, it prevents those characters from being misinterpreted or corrupted in transit, which makes web requests more reliable.
Summary
In this article, we’ve discussed the Uniform Resource Locator, which defines the address of a document on the web. We’ve discussed how characters in a URL are encoded as ‘%’ followed by hexadecimal digits, a form that browsers and servers across the internet understand. We’ve also looked at some common URL-encoded characters in Windows-1252 and UTF-8, along with the URL encoding of control characters.