HTML URL Encoding

URL encoding, also known as percent-encoding, is a way of encoding the information in a URL so that it can be transmitted safely over the Internet.

What is URL Encoding

According to RFC 3986, the characters in a URL are limited to a defined set of reserved and unreserved US-ASCII characters. No other characters are allowed in a URL. However, URLs often need to carry characters outside the US-ASCII character set. Such characters must be converted to a valid US-ASCII format for worldwide interoperability.

To map the wide range of characters used worldwide, a two-step process is used:

  • First, the data is converted to bytes according to the UTF-8 character encoding.
  • Then, each byte that does not correspond to a character in the unreserved set is percent-encoded as %HH, where HH is the hexadecimal value of the byte.

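For example, the string François is encoded as Fran%C3%A7ois.

Here ç (c-cedilla), a Latin-script letter, is the only character outside the unreserved set. Its UTF-8 encoding is the two bytes C3 and A7, so it becomes %C3%A7.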
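
The two steps above can be sketched in a few lines of Python. This is only an illustration: the function name percent_encode and the constant UNRESERVED are made up for this example and are not part of any library.

```python
# Unreserved characters as listed in RFC 3986: letters, digits, "-", ".", "_", "~".
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of `text` that is not an unreserved character."""
    encoded = []
    # Step 1: convert the string to its UTF-8 byte sequence.
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            # Unreserved bytes are copied through unchanged.
            encoded.append(chr(byte))
        else:
            # Step 2: every other byte becomes %HH (uppercase hexadecimal).
            encoded.append(f"%{byte:02X}")
    return "".join(encoded)


print(percent_encode("François"))  # Fran%C3%A7ois
```

In practice you would normally rely on a library routine rather than writing this by hand. In Python, for instance, urllib.parse.quote(text, safe="") should give the same result on recent versions.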


Reserved Characters

Certain characters are "reserved" because they may (or may not) be defined as delimiters by the generic URL syntax or by a particular URL scheme. For example, the forward slash (/) is used to separate different parts of a URL.

If the data for a URL component contains a character that would conflict with a reserved character defined as a delimiter in the URL scheme, the conflicting character must be percent-encoded before the URL is formed. The reserved characters in a URL, with their percent-encoded forms, are:

Character:  !   #   $   &   '   (   )   *   +   ,   /   :   ;   =   ?   @   [   ]
Encoded:    %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
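
As a small illustration of why this matters, the sketch below puts a value containing "&" and "=" into a query string. It uses Python's standard urllib.parse.quote function. The value and the example.com address are made up for the example.

```python
from urllib.parse import quote

# Hypothetical query value that happens to contain the reserved characters "&" and "=".
value = "salt&pepper=1"

# safe="" asks quote() to percent-encode every reserved character, including "/".
encoded_value = quote(value, safe="")

# The "?" and "=" written here are real delimiters, so they stay as they are.
url = "https://example.com/search?q=" + encoded_value
print(url)  # https://example.com/search?q=salt%26pepper%3D1
```

Without the encoding step, a server would most likely read "&pepper=1" as a second query parameter instead of as part of the value of q.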

Unreserved Characters

Characters that are allowed in a URL but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde. The following table lists all the unreserved characters in a URL:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
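
A quick way to see the difference between the two groups is to run the unreserved characters through an encoder and check that nothing changes. The sketch below assumes Python 3.7 or later, where urllib.parse.quote treats the tilde as unreserved. The file name is only an example.

```python
import string
from urllib.parse import quote, unquote

# The unreserved set from the table above.
unreserved = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_.~"

# Unreserved characters come out of the encoder exactly as they went in.
print(quote(unreserved, safe="") == unreserved)  # True

# Anything else, such as a space, is replaced by its %HH form.
print(quote("my file~v1.txt", safe=""))  # my%20file~v1.txt

# Decoding reverses the substitution.
print(unquote("my%20file~v1.txt"))  # my file~v1.txt
```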
