From: Zag Zig [ZagZig@BIGFOOT.COM]
Sent: Friday, October 06, 2000 5:50 PM
To: BUGTRAQ@SECURITYFOCUS.COM
Subject: Cross site scripting: a long term fix

I recently came across this Bugtraq thread, sparked by the CERT warning about the cross site scripting vulnerability in dynamically generated HTML that echoes text entered by potentially malicious visitors to a site.

http://www.securityportal.com/list-archive/bugtraq/2000/Feb/0078.html

It appears to me that there is a relatively painless long term way to fix this problem that I have not seen discussed. If it has been mentioned elsewhere, I would like to know about it. Here is my idea about how this should be fixed in the long run. I hope somebody will try to shoot it down.

I distinguish between short term and long term fixes. Short term fixes do not require any changes in browser and server software, but they are relatively painful to use and will be used only by the most alert and diligent web designers. Long term fixes require changes in browser or server software, but they should be fairly painless for web content providers.

I start with a review of the relevant material I found on the web.

1.1. Press

An interesting commentary on this issue can be found in 'The Cross-Site Scripting Scam' by John C Dvorak.

http://www.zdnet.com/pcmag/stories/opinions/0,7802,2434175,00.html

This page has a list of links to comments entered by readers. It appears that one of the commenting readers successfully illustrated the problem on that page.

1.2. CERT

The CERT Advisory CA-2000-02 identifies the problem all right.

http://www.cert.org/advisories/CA-2000-02.html

It also proposes a short term fix that web designers can use right away without any changes to browser and server software.

http://www.cert.org/tech_tips/malicious_code_mitigation.html

This short term fix is complex and not likely to be widely used.
The report does not propose any other changes in the web architecture that could lead to a simpler, more secure, and more widely used solution. It does not properly characterize the problem, and it does not examine which features of the web architecture are responsible for its existence. It makes the problem look far too complex. This is a very simple problem, yet the report neither exposes that simplicity nor proposes a solution of matching simplicity.

1.3. W3C

W3C has the CERT report posted on their web site, but I could find no other information about this problem.

http://lists.w3.org/Archives/Public/w3c-wai-ig/2000JanMar/0302.html

1.4. Microsoft

Microsoft explains why this problem cannot be fixed in either web browser software or web server software. Designers of web pages with dynamic content must be aware of this problem and do something to avoid it. Although this is correct, it does not mean that browsers and servers could not give web designers better tools and procedures for avoiding this problem.

http://www.microsoft.com/technet/security/crsstFAQ.asp

Both CERT and Microsoft suggest that the only solution is to filter the dynamically generated portion of HTML on input and/or on output. This is probably the only solution given the current state of browser software and the current HTML standard. Microsoft suggests filtering the following special characters:

' < > ) ( & + % ; "

http://www.microsoft.com/technet/security/CSOverv.asp

I am all too familiar with this solution in web forums. How did I discover it? By posting plain text containing one of those innocent looking characters, then finding that the characters were missing from the formatted text, often together with other text that followed them. It happened to me often when posting a long URL that uses a percent sign followed by the numeric value of a character.

Applications that expect or require HTML input, such as web forums, should be aware of HTML security problems.
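The damage that character filtering does to ordinary text can be reproduced in a few lines. This is only a sketch: the character set is the one quoted from Microsoft above, while the function name strip_special, the Python language, and the example.com URL are my own assumptions for illustration and do not describe any particular forum's filter.

```python
# Characters listed in the Microsoft recommendation quoted above.
FILTERED = set("'<>)(&+%;\"")


def strip_special(text: str) -> str:
    """Drop every filtered character (a hypothetical naive input filter)."""
    return "".join(ch for ch in text if ch not in FILTERED)


# A harmless URL that uses a percent-encoded space and an ampersand.
url = "http://example.com/search?q=a%20b&lang=en"

# The percent sign and the ampersand vanish, so the link no longer works.
print(strip_special(url))  # http://example.com/search?q=a20blang=en
```

The input contained no markup at all, yet the filter silently changed its meaning.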
Even for such applications, character filtering is not a good solution. Most web programmers do not expect to find HTML or a script in simple text input fields, and they should not be asked to check for it. Trying to solve this problem by filtering 'special characters' on input or output is not the right way to do it. I do not see anything special about any of those characters. Filtering will make the web more complex, not more reliable.

1.5. Wrox

Another solution is presented in a Wrox article by James Brannan: Protecting Yourself, Your Site, and Your Clients from Cross-Site Scripting Attacks.

http://www.asptoday.com/articles/20000525.htm

This applies to Active Server Pages server side scripting. He suggests escaping dynamic HTML before sending it to a browser, using the HTMLEncode method on the server. This effectively quotes the HTML tags, so that the markup is displayed, not acted upon. He also writes about input and output character filtering, but I think this is not needed when HTMLEncode is used.

1.6. Proposal to add a safe quoting tag to HTML

The HTMLEncode solution above is better than filtering. I propose that a solution for quoting markup be built into the HTML specification and therefore made available to all servers, for use with both static and dynamically generated text.

The cross site scripting problem is difficult only as long as HTML writers do not have a simple and reliable tool to prevent it. That tool is missing from HTML, along with a basic concept: there is no way to safely quote text containing markup. Markup is interpreted even inside the
'pre formatted' text block. A simple solution for this problem is to add a new HTML tag which will process all characters literally, for example:
for static text containing HTML tags meant to be viewed, not interpreted.

To make it even more useful we could add attributes ON and OFF to list the tags that must or must not be interpreted within this block. Often links are the only reason I want to use HTML instead of plain text. This would give me plain text with anchors:

Perhaps this should be controlled from style sheets linking to the id or class attribute of this tag.
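Until a quoting tag of this kind exists, the server side escaping described in 1.5 is the closest tool available. Here is a minimal sketch of that idea. It assumes Python's standard html.escape as a stand-in for the ASP HTMLEncode method (the two are not the same function), and the name encode_for_html and the sample comment are hypothetical.

```python
import html


def encode_for_html(text: str) -> str:
    """Quote markup so a browser displays it instead of acting on it."""
    # html.escape rewrites &, <, > and, with quote=True, both quote characters.
    return html.escape(text, quote=True)


# Visitor input mixing a script tag with an ordinary percent-encoded URL.
comment = 'Try <script>alert("x")</script> at http://example.com/?q=a%20b&lang=en'

encoded = encode_for_html(comment)
print(encoded)

# Nothing is lost: decoding the entities gives back the original text.
print(html.unescape(encoded) == comment)
```

Unlike filtering, escaping leaves the percent sign alone and turns the ampersand into an entity that the browser shows as an ampersand, so the visitor's text is displayed exactly as typed.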