URL, URL, Little Do We Know Thee 
By Razvan Peteanu for SecurityPortal
 
About Schemes and Men
Recently, many smiled and Microsoft got angry at a spoof of its Knowledge Base 
  articles posted at a URL starting with "http://www.microsoft.com." 
  Emails went around and people clicked on the link, possibly before looking closer at it. Surprised by the content, they may have checked the URL again, noticed the other "www"-like string in it, figured out that it must have something to do with the real host, forwarded the email to friends and then returned to their work. 
Today we will look closer at URLs and the associated security implications. 
  "Interesting" ways of using them have been known by spammers for a 
  while, but now the KB spoof and the February issue of 
  Crypto-Gram have made the Internet community more aware of what URLs can do. 
 
Although most Internet users will associate URLs with WWW addresses, or perhaps 
  FTP, Uniform Resource Locators are more general in scope. URLs are standardized 
  in RFC1738, and in their most generic form, they are defined 
  as  
<scheme>:<scheme-specific-part> 
The best-known form is the Common Internet Scheme Syntax, in which the <scheme>
  is the name of a protocol and the <scheme-specific-part> is defined as: 
//<user>:<password>@<host>:<port>/<url-path> 
in which only the host part is mandatory. The ":" and "@" 
  characters have a special meaning, which lets the software handling the URL parse the entire string. 
  If a user and a password are provided, the host part only comes after the @ 
  character. In the KB spoof mentioned earlier, the link was  
http://www.microsoft.com&item=q209354@www.hwnd.net/pub/mskb/Q209354.asp 
Understandably, it is no longer available. (In case you find a copy elsewhere, 
  be aware that the page uses strong language and might trigger some content scanners 
  as well.) As you have guessed, the real host of the page was www.hwnd.net. The 
  string "www.microsoft.com" in this case is just a bogus username that is ignored 
  by the web server. 
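The split described above can be reproduced with a standard URL parser. The following is a minimal sketch using Python's urllib.parse; the helper name split_authority is only an illustration and is not part of the original spoof or of any tool mentioned here.

```python
from urllib.parse import urlsplit


def split_authority(url):
    """Return (username, password, hostname) as a URL parser sees them."""
    parts = urlsplit(url)
    return parts.username, parts.password, parts.hostname


# The link used in the KB spoof, as quoted in the article.
spoof = "http://www.microsoft.com&item=q209354@www.hwnd.net/pub/mskb/Q209354.asp"

user, password, host = split_authority(spoof)
print("user:    ", user)      # www.microsoft.com&item=q209354
print("password:", password)  # None
print("host:    ", host)      # www.hwnd.net
```

Everything before the "@" lands in the user field, and the host is whatever follows it.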
Although perfectly valid syntactically, the above usage can be considered as 
  having security relevance. While no technological resource is affected, the 
  attack is targeted at the other (and often ignored) half of the picture: ourselves. 
  At the end of most Internet nodes, beyond network cards, modems and computers, 
  there are human users who, consciously or not, make security decisions every 
  time they decide to trust what they see on the screen.  
Trust is a fundamental 
  security value. Crafting the URL as above exploits the trust we have in our 
  understanding of what a URL is like and in whoever provided us the link. It 
  also exploits the fact that our attention is focused on the content frame and 
  not on the location although they are equally important in a decision of trust. 
  In SSL-protected sites, the latter is in part taken care of by the browser, which 
  compares the domain with the information in the SSL certificate; otherwise mere 
  encryption would not provide much value if the destination is bogus. 
Concealment
The URL analyzed above hides its real destination only superficially. Let 
  us look at better ways of doing this. For some reason (probably related 
  to internal handling), some operating systems accept IP addresses not 
  only in the form we are used to, aaa.bbb.ccc.ddd, but also as the decimal equivalent.  
  The above generic address can also be written as the decimal value of aaa*256^3+bbb*256^2+ccc*256+ddd. 
  Thus, 3633633987 is 216.148.218.195 (belonging to www.redhat.com). You can copy 
  and paste 3633633987 into your browser, and you will find yourself browsing Red Hat's 
  main site. The above works with Internet Explorer 5.x and also with Lynx on 
  Linux, but I have not tested all operating systems, so your mileage may vary. 
  Some applications may complain of invalid URLs if they parse the domain name 
  for periods, but if you experiment with a few applications, including standard 
  utilities like ping, you should be able to figure out whether the OS itself 
  supports this usage. 
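The conversion between the two notations is plain arithmetic. Here is a small sketch in Python; the function names are made up for this example, and the standard ipaddress module is used only for the reverse direction.

```python
import ipaddress


def dotted_to_decimal(dotted):
    """Apply aaa*256^3 + bbb*256^2 + ccc*256 + ddd to a dotted address."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d


def decimal_to_dotted(value):
    """Turn the single decimal number back into aaa.bbb.ccc.ddd."""
    return str(ipaddress.IPv4Address(value))


print(dotted_to_decimal("216.148.218.195"))  # 3633633987
print(decimal_to_dotted(3633633987))         # 216.148.218.195
```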
Thus more obfuscation could be obtained by creating a URL such as http://www.toronto.com:ontario@3633633987 
  which still goes to Red Hat. Surfers are used to seeing strings of digits in 
  a URL because many sites store the HTTP SessionID in the URL instead of in a 
  cookie, so the above would not appear particularly suspicious. The password 
  can be absent, so we end up having http://www.toronto.com@3633633987, 
  "easy to read, easy to misunderstand" at a first glance. 
Now, for the final touch, we can use a bit of HTML knowledge: the anchor tag 
  allows the display text for a link to be different from the target itself, so 
  the above link can appear as http://www.toronto.com. 
  In IE 5.5, hovering with the mouse over it displays the number only in the status 
  bar, not very indicative of a wrong target, so only clicking on it would show 
  us the real target.  
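A mismatch between what a link shows and where it goes can be checked mechanically. The sketch below is one possible approach, not a tool referenced in this article: it assumes the display text itself looks like a URL, and the names LinkAuditor and audit_links are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect links whose visible text names a different host than href."""

    def __init__(self):
        super().__init__()
        self.findings = []   # (shown text, href, real host)
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            # Only compare when the visible text itself looks like a URL.
            shown_host = urlsplit(shown).hostname if "://" in shown else None
            real_host = urlsplit(self._href).hostname
            if shown_host and shown_host != real_host:
                self.findings.append((shown, self._href, real_host))
            self._href = None


def audit_links(html):
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.findings


page = '<a href="http://www.toronto.com@3633633987">http://www.toronto.com</a>'
for shown, href, real_host in audit_links(page):
    print(f"shows {shown!r} but goes to host {real_host!r}")
```

This only catches the case where the link text is a URL; a link labelled with ordinary words would need a different check.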
Yet another way of exploiting trust is by using the indirection provided by 
  genuine websites. A number of well-known sites track if their visitors follow 
  external links by first creating the links of the form http://www.thisisarespectablesite.com/outsidelinks/http://externalsite, 
  trapping the request at the server side and then redirecting the user to the 
  real destination.  
The problem with this approach is that anyone can use their 
  indirection, combined with URL obfuscation, in order to provide more legitimacy 
  to false URLs. What this can lead to depends both on the attacker and on the 
  victim. The HTTP REFERER field, limited as it is, can be of some value to reduce 
  abuses, but not all sites seem to use it. 
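As a rough illustration of the REFERER idea, a redirecting site could refuse to forward requests that do not claim to come from its own pages. The sketch below assumes a hypothetical helper, referer_is_own_site, and is not taken from any real site's code. Since the header can be missing or forged by the client, this reduces casual abuse at best.

```python
from urllib.parse import urlsplit


def referer_is_own_site(referer, own_host):
    """True only if the Referer header names a page on own_host."""
    if not referer:
        return False  # header absent: treat as not coming from our pages
    return urlsplit(referer).hostname == own_host.lower()


own = "www.thisisarespectablesite.com"
print(referer_is_own_site("http://www.thisisarespectablesite.com/news.html", own))
print(referer_is_own_site("http://www.thisisarespectablesite.com@3633633987/", own))
print(referer_is_own_site(None, own))
```

Comparing the parsed host, rather than searching the header for the site's name, keeps the check from being fooled by the same user@host trick described earlier.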
And if the above were not enough, the characters in the real destination can 
  themselves be obfuscated through URL and Unicode encoding, so that only 
  the hex codes are visible. URL encoding is required for many special characters, 
  but can be applied to regular alphanumeric characters as well.  
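To see how little of a host name survives this treatment, here is a short Python sketch. The helper encode_every_char is an assumption for illustration; unquote is the standard decoder from urllib.parse.

```python
from urllib.parse import unquote


def encode_every_char(text):
    """Percent-encode every byte, including ordinary letters and dots."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


hidden = encode_every_char("www.hwnd.net")
print(hidden)           # %77%77%77%2E%68%77%6E%64%2E%6E%65%74
print(unquote(hidden))  # www.hwnd.net
```

Decoding a suspicious link with unquote before reading it is a quick way to undo this layer of disguise.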
None of the above is new to knowledgeable spammers, but it will likely be quite 
  successful as an attack aimed at the average unsuspecting user. 
One-click Attacks
Let's explore the security implications of the URL even further. One of the "standard" 
  attacks would be to cause a buffer overflow. As far as the browsers go, however, 
  by now this would be a very beaten path; many a hacker has tried to crash IE 
  or Netscape. What about other protocols? Indeed, what other protocols 
  are recognized on a machine? 
To find out the answer for a Windows box, I looked in the registry. 
  The following keys contain such information: HKEY_LOCAL_MACHINE\SOFTWARE\Classes\PROTOCOLS\Handler 
  and those keys under HKEY_CLASSES_ROOT\Shell that have a subkey named "URL 
  Protocol." (You will have to do some searching for those in the latter 
  category, but it does not take long.)  
The search results proved interesting: 
  apart from the expected ftp://, http://, https://, mailto://, news://, pnm:// 
  and several others, I found some schemes I had never heard of before, such as 
  msee://. A quick experiment showed that it is the scheme used by Microsoft 
  Encarta, perhaps to refer to articles inside the encyclopedia. Whether Encarta 
  is safe from buffer overflows and, if not, whether they can be practically exploited, 
  well, this is something that would need investigation. 
 The story repeated with other URL schemes that were installed by various applications 
  (such as copernic:// owned by the Copernic 
  search tool). There have been other interesting discoveries, but have a look 
  for yourself. 
  Apart from the possibility of remote exploitation of applications that are 
  not otherwise remotely accessible, even more discomfort is caused by the absence 
  of any administrative interface allowing inspection of the associations between 
  a URL scheme and the application using it (apart from a very scope-limited dialog 
  in Internet Explorer under Tools/Options/Programs which only displays a handful 
  of standard protocols).  
It turns out that registering a new URL scheme in Windows is trivial and the 
  change takes place immediately. It is done by adding the necessary registry 
  entries as described in this MSDN documentation. 
  Unfortunately, this also means this can be done by scripted viruses such as KakWorm 
  (which are executed by simply viewing an email on a vulnerable system).  
Associating 
  a benign protocol with a dangerous command is, well, dangerous. Granted, this 
  is not a URL-specific attack. It can be done using file associations as well, 
  but the risk is still there, and the existence of other attack paths does not 
  mean this one will not be exploited. And, of course, nothing forces an attacker 
  to use only the techniques described here.  
Until there are more mechanisms to inform and protect us from such attacks, 
  the best defense is to be cautious and not to follow directions in emails you 
  cannot trust. Sometimes, you just feel something isn't right. 
Now, if you would only click this link for 
  some free advice :-)  ... Did you? 
 
References:
Bruce Schneier, Crypto-Gram, Feb 2001, "A Semantic Attack on URLs" 
  http://www.counterpane.com/crypto-gram-0102.html#7 
RFC1738 
  http://www.securityportal.com/rfc/rfc1738.txt 
MSDN, "Registering an Application to a URL Protocol" 
  http://msdn.microsoft.com/workshop/networking/pluggable/overview/appendix_a.asp 
 
 