From: Zag Zig [ZagZig@BIGFOOT.COM]
Sent: Friday, October 06, 2000 5:50 PM
To: BUGTRAQ@SECURITYFOCUS.COM
Subject: Cross site scripting: a long term fix

Cross site scripting: a long term fix

I recently came across this Bugtraq thread sparked by the
CERT warning about the cross site scripting vulnerability
in dynamically generated HTML that echoes text entered by
potentially malicious visitors to the site.

http://www.securityportal.com/list-archive/bugtraq/2000/Feb/0078.html

It appears to me that there is a relatively painless long
term way to fix this problem that I have not seen
discussed. If it was mentioned elsewhere, I would like to
know about it. Here is my idea about how this should be
fixed in the long run. I hope somebody will try to shoot
it down.

I distinguish between short term and long term fixes. The
short term fixes are relatively painful to use and will be
used only by the most alert and diligent web designers.
These fixes do not require any changes in browser and
server software. The long term fixes require changes in
browser or server software. They should be fairly painless
for the web content providers.

I start with a review of the relevant material I found on
the web.

1.1. Press

An interesting commentary on this issue can be found in
'The Cross-Site Scripting Scam' by John C Dvorak.

http://www.zdnet.com/pcmag/stories/opinions/0,7802,2434175,00.html

This page has a list of links to comments entered by the
readers. It appears that one of the commenting readers
successfully illustrated the problem on that page.

1.2. CERT

The CERT Advisory CA-2000-02 identifies the problem all
right.

http://www.cert.org/advisories/CA-2000-02.html

It also proposes a short term fix that web designers can
use right away without any changes to browser and server
software.

http://www.cert.org/tech_tips/malicious_code_mitigation.html

This short term fix is complex and not likely to be widely
used.
The report does not propose any other changes in the web
architecture that could lead to a simpler, more secure,
and more widely used solution. It does not properly
characterize the problem. It does not examine which
features of the web architecture are responsible for the
existence of the problem. It makes the problem look way
too complex. This is a very simple problem. The report
does not expose the simplicity of the problem and does not
propose a solution of matching simplicity.

1.3. W3C

W3C has the CERT report posted on their web site, but I
could find no other information about this problem.

http://lists.w3.org/Archives/Public/w3c-wai-ig/2000JanMar/0302.html

1.4. Microsoft

Microsoft explains why this problem cannot be fixed in
either the web browser software or the web server
software. Designers of web pages with dynamic content must
be aware of this problem and do something to avoid it.
Although this is correct, it does not mean that browsers
and servers could not give web designers better tools and
procedures for avoiding this problem.

http://www.microsoft.com/technet/security/crsstFAQ.asp

Both sources suggest that the only solution is to filter
the dynamically generated portion of HTML on input and/or
on output. This is probably the only solution with the
current state of the browser software and the current HTML
standard. Microsoft suggests filtering the following
special characters:

' < > ) ( & + % ; "

http://www.microsoft.com/technet/security/CSOverv.asp

I am all too familiar with this solution in web forums.
How did I discover it? By posting plain text with one of
those innocent looking characters, then finding that those
characters were missing in the formatted text, often
together with other text that followed them. It happened
to me often when posting a long URL that uses a percent
sign followed by the numeric value of a character.

Applications that expect or require HTML input, such as
web forums, should be aware of HTML security problems.
Even for them, character filtering is not a good solution.
Most web programmers do not expect to find HTML or a
script in simple text input fields and they should not be
asked to check for it.

Trying to solve this problem by filtering of 'special
characters' on input or output is not the right way to do
it. I do not see anything special about any of those
characters. This will make the web more complex, not more
reliable.

1.5. Wrox

Another solution is presented in a Wrox article by James
Brannan: Protecting Yourself, Your Site, and Your Clients
from Cross-Site Scripting Attacks.

http://www.asptoday.com/articles/20000525.htm

This applies to Active Server Pages server side scripting.
He suggests escaping dynamic HTML before sending it to a
browser, using the HTMLEncode method on the server. This
effectively quotes the HTML tags, resulting in the markup
being displayed, not acted upon. He also writes about
input and output character filtering, but I think this is
not needed when HTMLEncode is used.

1.6. Proposal to add a safe quoting tag to HTML

The HTMLEncode solution above is better than filtering. I
propose that a solution for quoting markup should be built
into the HTML specification and therefore made available
to all servers for use with both static and dynamically
generated text.

The cross site scripting problem is difficult only as long
as HTML writers do not have a simple and reliable tool to
prevent it. That tool is missing in HTML along with a
basic concept. There is no way to safely quote text
containing markup. Markup is interpreted even inside the
'pre formatted' text block.

A simple solution for this problem is to add a new HTML
tag which will process all characters literally, for
example:

Then the server simply wraps the user input with this tag
and makes any scripts harmless. If you want to publish
HTML source as plain text, you can simply wrap it with
this tag.

This is a simplification. I will discuss the safety issues
and the required syntax for this tag later in section 2.

This tag should have been part of HTML from day one. I
take this back, make it day zero. This tag, when applied
to any text, returns that text unchanged. Zero, when added
to any number, returns that number unchanged. In spite of
this simplicity, it took a long time to discover or invent
the number zero. Solving the cross site scripting problem
with HTML lacking this zero tag is like multiplying with
Roman numerals.

Will adding this tag cause any problems? The possible
problem is that it may delay some sexier features: adding
smell, taste, touch, the sixth sense and the fourth
dimension to the web.

This is a no-op tag, it performs no operation. It should
not be too difficult to implement it. It would be
difficult to make incompatible implementations, but not
impossible.

1.7. Can HTML quoting be made safe?

If this was a talk, I would have expected someone to
interrupt me a few paragraphs earlier where I suggested
the simplified syntax. Surely you are joking. This can be
defeated easily with this input:

Together they give:

This would be valid HTML and would introduce a script.

2. Syntax required to make HTML quoting safe.

Making quoting safe is not difficult. To make quoting safe
we need to add some attributes to the quoting tag. Our
no-op program needs some parameters. It may even have to
do some work.

Recent programming languages have introduced many new ways
to quote text strings. The two that I would use here are
not so recent.

2.1. Adding an end marker to the opening and closing tag.

...

The opening and the closing tag must be identified by the
same string. A program sending this text to the browser
could use the current time to form this id. I would like
this syntax for hand coded HTML. Similar syntax for
quoting is available in many programming languages and
dates back to at least the original Unix shell. It is
known by a name that I am afraid to repeat here, for fear
of offending a grammar checker. The name is 'here
document'.

2.2. Adding the count of bytes in the text.

ABC ABC

This works even better when tags are generated by a
program. Counting bytes is a cheap operation. This type of
quoted string is older than Fortran. Fortran borrowed it
from punched cards, as the name 'Hollerith constant'
suggests.

What should the browser do if the number of bytes received
does not match the number of bytes sent? It should throw
away the string and replace it with a string of length
zero.

I also considered allowing the count to be deferred to the
closing tag for long strings.

...

This is easily defeated by the following input.

Together they generate the following HTML which is mostly
valid.

This will fail on the second empty text after successfully
introducing a script.

2.3. Additional functionality

So far I have described the basic functionality needed to
fix the cross site scripting problem.

This tag is also useful as an alternative to
for static text containing HTML tags meant to be viewed,
not interpreted. To make it even more useful we could
add attributes ON and OFF to list the tags that
must or must not be interpreted within this block.
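
The two quoting schemes from sections 2.1 and 2.2 can be
sketched in a few lines of Python. This is only an
illustration under stated assumptions: the tag name QUOTE,
the attribute names id and count, and every function name
below are hypothetical, since no such tag exists in HTML.
The id is drawn from a random token here rather than from
the current time, which is a substitution made for the
sketch. The read_ functions stand in for what a browser
would have to do.

```python
import re
import secrets

TAG = "QUOTE"  # hypothetical tag name, not part of any HTML standard
CLOSE = f"</{TAG}>"


def naive_wrap(text: str) -> str:
    """Simplified syntax from section 1.6: no id and no count."""
    return f"<{TAG}>{text}{CLOSE}"


def wrap_with_marker(text: str) -> str:
    """Section 2.1: opening and closing tag carry the same id string."""
    while True:
        marker = secrets.token_hex(8)  # assumption: random, not time based
        close = f"</{TAG} id={marker}>"
        if close not in text:  # retry in the unlikely case of a collision
            return f"<{TAG} id={marker}>{text}{close}"


def read_marked(html: str) -> str:
    """Receiver for 2.1: text up to the closing tag with the same id."""
    m = re.match(rf"<{TAG} id=([0-9a-f]+)>", html)
    if m is None:
        return ""
    close = f"</{TAG} id={m.group(1)}>"
    end = html.find(close, m.end())
    return html[m.end():end] if end != -1 else ""


def wrap_with_count(text: str) -> str:
    """Section 2.2: the opening tag states the byte length of the text."""
    return f"<{TAG} count={len(text.encode('utf-8'))}>{text}{CLOSE}"


def read_counted(html: str) -> str:
    """Receiver for 2.2: take exactly count bytes, or '' on a mismatch."""
    data = html.encode("utf-8")
    m = re.match(f"<{TAG} count=(\\d+)>".encode("ascii"), data)
    if m is None:
        return ""
    count = int(m.group(1))
    body = data[m.end():m.end() + count]
    rest = data[m.end() + count:]
    if len(body) != count or not rest.startswith(CLOSE.encode("ascii")):
        return ""  # throw the string away, as section 2.2 suggests
    return body.decode("utf-8", errors="replace")


if __name__ == "__main__":
    attack = f"{CLOSE}<script>alert(1)</script><{TAG}>"
    print(naive_wrap(attack))                      # script ends up outside
    print(read_marked(wrap_with_marker(attack)))   # attack text, still quoted
    print(read_counted(wrap_with_count(attack)))   # attack text, still quoted
```

With the naive wrapper the input closes the quoting tag
itself and the script lands outside it. With either the
marker or the count, the receiver recovers the input as
plain text, because the visitor can neither guess the id
nor change the count written by the server.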

Often links are the only reason I want to use HTML
instead of plain text. This would give me plain text
with anchors:




Perhaps this should be controlled from style sheets
linking to the id or class attribute of this tag.
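
For comparison, the escaping approach from section 1.5 is
available today without any new tag. Here is a minimal
sketch in Python, assuming that html.escape from the
standard library plays the role that HTMLEncode plays in
Active Server Pages. The function name render_comment and
the surrounding P tag are hypothetical.

```python
import html


def render_comment(user_text: str) -> str:
    """Escape markup characters so the browser displays them literally."""
    # html.escape replaces &, <, > and, with quote=True, both quote marks.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


if __name__ == "__main__":
    print(render_comment('<script>alert("x")</script>'))
    # prints: <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

The server has to remember to call this on every piece of
echoed text, which is the burden the proposed tag is meant
to remove.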




###