Mutant World

Thursday, September 20, 2007

JSP page encoding

I recently had a problem with a web application using JSP pages that was not handling non-ASCII characters correctly: submitted text containing characters such as ù or è resulted in garbage characters stored in the database and consequently garbage displayed by the browser.

The application was developed in Linux, using an IDE configured to save files using UTF-8 encoding, and deployed on a Linux box.
The database was configured to use UTF-8 encoding.
Every page contained the directive <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in the <head> section.

Given this setup, I thought nothing could go wrong with character encoding, but I was wrong.
The browser kept telling me that the page it was displaying had ISO-8859-1 encoding.

It turned out that the JSP specification says that if the page encoding of a JSP page is not explicitly declared, then ISO-8859-1 should be used (!).
The Jetty servlet container was following the specification and correctly setting the HTTP header to Content-Type: text/html; charset=ISO-8859-1. Since a charset declared in the HTTP header takes precedence over the one in the <meta> tag, the browser ignored the UTF-8 declaration in the page.
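
To illustrate the kind of garbage such a mismatch produces, here is a minimal Java sketch. The class and method names are made up for this example, and it only shows the general effect of reading UTF-8 bytes as ISO-8859-1; it is not necessarily the exact path the data took in my application.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the application described above.
class EncodingMismatch {

    // Encode the text as UTF-8, then decode those bytes with the wrong charset.
    public static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: recover the original bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for "ùè", so the source file encoding does not matter.
        String original = "\u00f9\u00e8";
        String garbled = misread(original);

        // Each character became two: the 2 characters are now 4.
        System.out.println(garbled.length());                 // prints 4
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

Each accented character occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, so ù turns into the two characters Ã¹.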

The fix is simple. Add this to web.xml (the <jsp-config> element requires a Servlet 2.4 or later deployment descriptor):

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>

It would be interesting to know why the JSP expert group did not pick UTF-8 as the default character encoding.