URL Cryptology 101

To decode a URL, you need to understand the basic parts that make up the whole string. We’ll use an example URL from a previous post.

  • URL:
    Uniform Resource Locator. The URL defines the location of the site or page you are viewing.
  • Prefix:
    The Prefix, formally called the scheme, defines what Protocol is being called. A Protocol is a standardized means of communicating between computers across a network. http is HyperText Transfer Protocol. https is a secure, or encrypted, HyperText Transfer Protocol. As a general rule, you should avoid entering any confidential information on a site that is not using https, because the data you send to it will not be encrypted. Always look for the https in the address bar and the padlock icon in your browser. ftp is File Transfer Protocol. And, finally, news is used to view newsgroups in your browser.
    The Prefix is separated from the rest of the URL with a colon and two forward slashes (://).
  • Address:
    The address has several components. It is found in the URL between the "://" and the next "/". If you see WWW, that is simply a designator meaning World Wide Web. The use of www at the beginning of a web address is conventional, but not mandatory. WWW may be replaced by, or followed by, any.number.of.strings.separated.with.dots. In a normal server hierarchy this can actually tell you quite a bit. Look at this URL: http://notes.cc.sunysb.edu. This address tells me that I’m going to the server known as "notes", that this server is controlled by "cc", the Computing Center, which is part of Stony Brook (sunysb), and that Stony Brook is an educational institution (edu).
    This reading does not always hold, as many phishing emails show. The most important part of the address is the last few items before the first single slash "/", separated by dots (www.amazon.com, mypage.ebay.com, www.what.ever.I.feel.like.typing.domain.ext, www.amazon.com.ca). The address ends with the domain name (amazon, ebay, paypal, citibank, stonybrook) and the extension (com, net, org, edu, info, biz), formally called the top-level domain. Sites outside of the US often use a two-letter country code, either after an extension-like label (as in .com.au or .co.uk) or in place of the extension (as in .ca or .de).
    This is a legitimate URL for eBay: http://crafts.listings.ebay.com/. This is NOT legitimate: http://www.ebay.com.iam.a.scam.artist.com.ru/. Reading from the right, this URL ends in .ru, Russia's country code, and actually points to the domain "artist.com.ru". Everything to the left of that ("www.ebay.com.iam.a.scam") is just a string of labels the domain's owner can set to anything, and none of it has anything to do with eBay.
    Domains can also be referred to by the numeric equivalent of their address, so a URL may contain a number such as 192.0.2.1 where you would expect a name. Every computer that connects to the Internet has an address called an IP (Internet Protocol) address. Some machines have a fixed, or static, IP address, meaning that the number does not change. Others use DHCP (Dynamic Host Configuration Protocol), which assigns a temporary number to a computer on demand and can reuse numbers across a large collection of computers, though never for two machines at the same time.
  • Directories:
    A URL may contain any number of directories, or none at all. On conventionally configured sites you could expect to see "domain.ext/images/", "domain.ext/css/" or "domain.ext/scripts/". These would typically hold images, Cascading Style Sheets or scripts, respectively. To disguise a spoofed site, scammers may toss in several random directories to add to the confusion.
  • Files:
    The last section of a URL usually contains the name of the page you are looking at. This page, or file, name may or may not include a file type, as in "index.html". Frequently seen file types are .html, .shtml, .php, .asp, etc. WARNING! If you see ".exe", ".scr" or ".pif" at the end of a link sent to you in an email, do NOT ever click on that link! These are executable files that could very well infect your computer and/or turn it into a boat anchor.
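
The parts in the list above can be pulled apart with Python's standard urllib.parse.urlsplit. This is a minimal sketch, not a phishing detector: the helper names url_parts, registered_domain and is_numeric are hypothetical, the /images/index.html and 192.0.2.1 URLs are made up for illustration, and MULTI_LABEL_SUFFIXES is a tiny hand-picked stand-in for the full Public Suffix List, so the "domain" guess is only illustrative.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny stand-in for the Public Suffix List: country
# zones where the registered name sits three labels from the right
# (artist.com.ru rather than com.ru). A real check needs the full list.
MULTI_LABEL_SUFFIXES = {"com.ru", "co.uk", "com.au"}


def is_numeric(host: str) -> bool:
    """True when the address is a raw IP address instead of a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def registered_domain(host: str) -> str:
    """Guess the name someone actually registered by reading labels right to left."""
    labels = host.lower().split(".")
    if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def url_parts(url: str) -> dict:
    """Split a URL into the prefix, address, directories and file."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # the address, lowercased, minus any port
    segments = parts.path.split("/")
    numeric = is_numeric(host)
    return {
        "prefix": parts.scheme,          # http, https, ftp, news ...
        "address": host,
        "numeric_address": numeric,
        "domain": None if numeric else registered_domain(host),
        "directories": [s for s in segments[1:-1] if s],
        "file": segments[-1],            # empty when the URL ends in "/"
    }


for url in ("http://crafts.listings.ebay.com/",
            "http://www.ebay.com.iam.a.scam.artist.com.ru/",
            "http://notes.cc.sunysb.edu/images/index.html",
            "http://192.0.2.1/login.html"):
    print(url_parts(url))
```

For the two eBay-looking URLs, the "domain" field comes out as ebay.com and artist.com.ru respectively, which is the right-to-left reading described above.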

In the above example you will notice "/%20%20/". This can be another ploy used by the scammer to conceal the actual site. Every character that can be seen or typed on a computer leads a double life, if you will: characters have "Alter Egos", or other forms of representation. Computers store text as numbers, and one common way of writing those numbers is "hexadecimal" (base 16). The character we know as a space, " ", is number 32 in the ASCII and ISO-Latin character sets that we use. Converted to hexadecimal, 32 becomes 20 (2 × 16 + 0 = 32). And, in URL encoding, that hexadecimal 20 is written as "%20".
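
The space-to-%20 conversion just described can be checked in a few lines of Python using only the standard library (ord, format, and urllib.parse.quote/unquote). The variable names are arbitrary, chosen for illustration.

```python
from urllib.parse import quote, unquote

space = " "
code = ord(space)                 # the character's number: 32 in ASCII / ISO-Latin-1
hex_digits = format(code, "02X")  # the same number written in hexadecimal: "20"
encoded = "%" + hex_digits        # URL encoding puts a "%" in front: "%20"

print(code, hex_digits, encoded)  # 32 20 %20
print(quote(space))               # %20 (the standard library agrees)
print(repr(unquote("/%20%20/")))  # '/  /' (two spaces between slashes)
```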

 Let’s look at another example:

This: %70%68%69%73%68%2e%73%63%61%6d%2e%63%6f%6d

is the same as: phish.scam.com 

 Notice that "%2e" is repeated; those are periods ".". "%6d" is also repeated; these are the "m"s in "scam" and "com". So, if you had a URL that looked like: http://%70%68%69%73%68%2e%73%63%61%6d%2e%63%6f%6d it would take you to a site called http://phish.scam.com
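
Decoding the example can be done with Python's urllib.parse.unquote, or by hand: take the two hexadecimal digits after each "%", convert them from base 16 to a number, and look up that character. The by-hand version below is a sketch that only works because every character in this particular address is encoded.

```python
from urllib.parse import unquote

hidden = "http://%70%68%69%73%68%2e%73%63%61%6d%2e%63%6f%6d"

# unquote() turns every %XX back into the character it stands for.
revealed = unquote(hidden)
print(revealed)  # http://phish.scam.com

# The same thing by hand: split the address on "%", read each pair of
# hex digits as a base-16 number, and convert that number to a character.
address = hidden.split("://")[1]
by_hand = "".join(chr(int(pair, 16)) for pair in address.split("%")[1:])
print(by_hand)   # phish.scam.com
```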

Here is a much larger table of characters and their alter egos. If you look at the space, "SP" (first column in the fifth row), you will see "0x20". This is the hexadecimal equivalent, and if you replace the "0x" with "%" you have URL-encoded it. Think of it as a puzzle or secret language.
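
A small version of such a table can be generated in Python, applying the "replace 0x with %" rule literally. This is a sketch; alter_egos is a hypothetical helper, and the rule only holds for single-byte ASCII characters like those in the chart (characters outside ASCII are URL-encoded as several %XX bytes, one per UTF-8 byte).

```python
def alter_egos(chars: str) -> list[tuple[str, str, str]]:
    """For each character: itself, its chart form ("0x20"), its URL form ("%20")."""
    rows = []
    for ch in chars:
        chart_form = f"0x{ord(ch):02X}"           # hex as many charts show it
        url_form = chart_form.replace("0x", "%")  # swap "0x" for "%"
        rows.append((ch, chart_form, url_form))
    return rows


for ch, chart_form, url_form in alter_egos(" ./:@"):
    print(f"{ch!r:5} {chart_form:6} {url_form}")
```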

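You need to URL-encode characters other than [a-z A-Z 0-9] (plus a few safe symbols such as "-", ".", "_" and "~") in URLs because some characters are "reserved" or "special" and have a specific meaning or action, such as "/" separating directories or "?" starting a query. Encoding tells computers when such a character is meant literally, so they can decipher these codes into meaningful website addresses and actions. Scammers, however, often encode characters that never needed encoding, as in the phish.scam.com example, to hide what the URL is really doing.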
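
That last point suggests a simple check, sketched here in Python: find every %XX in a URL and flag the ones that decode to an "unreserved" character (letters, digits, "-", ".", "_", "~" in RFC 3986), since those never need encoding. needlessly_encoded is a hypothetical helper and a rough heuristic only, not a complete phishing filter.

```python
import re
import string
from urllib.parse import quote

# RFC 3986 "unreserved" characters never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def needlessly_encoded(url: str) -> list[str]:
    """Return the characters in url that were percent-encoded for no reason."""
    found = []
    for pair in re.findall(r"%([0-9A-Fa-f]{2})", url):
        ch = chr(int(pair, 16))
        if ch in UNRESERVED:
            found.append(ch)
    return found


# A reserved character such as "/" or "?" is encoded when it must be
# treated as plain data rather than as part of the URL's structure:
print(quote("a/b?c", safe=""))  # a%2Fb%3Fc

print(needlessly_encoded("http://%70%68%69%73%68%2e%73%63%61%6d%2e%63%6f%6d"))
print(needlessly_encoded("http://example.com/search?q=a%2Fb"))  # []
```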

The more confusing a URL appears, the less likely the average computer user is to see through the screen of deception!
