Mutant World

Thursday, September 20, 2007

JSP page encoding

I recently had a problem with a JSP-based web application that did not handle non-ASCII characters correctly: submitted text containing characters such as ù or è was stored in the database as garbage, and consequently the browser displayed garbage too.

The application was developed on Linux, using an IDE configured to save files in UTF-8, and deployed on a Linux box.
The database was configured to use UTF-8 encoding.
Every page contained the element <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in its <head> section.

Given these premises, I thought there was no way to run into character-encoding problems, but I was wrong.
The browser kept reporting that the page it was displaying was encoded as ISO-8859-1.

It turned out that the JSP specification says that if the page encoding of a JSP page is not explicitly declared, ISO-8859-1 is used by default (!).
Following the specification, the Jetty servlet container was correctly sending the HTTP header Content-Type: text/html; charset=ISO-8859-1. A charset declared in the HTTP header takes precedence over the one in the <meta> element, so the browser never used UTF-8.
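The garbage has a recognizable shape. In UTF-8 each accented letter such as ù is two bytes, and a decoder told the data is ISO-8859-1 turns each of those bytes into a separate character. The exact path by which the bytes got mangled in a given application may differ (for instance, a UTF-8 JSP source file read with the ISO-8859-1 default), so treat this as a minimal illustration; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {

    // Encode text as UTF-8 (as the IDE saved it), then decode those bytes
    // as ISO-8859-1 (what the charset label claimed): one char per byte.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte to exactly one char, so re-encoding the
    // garbled string recovers the original bytes, which decode cleanly as UTF-8.
    static String restore(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of javac's encoding.
        String text = "\u00f9 \u00e8";                       // "ù è"
        String garbled = garble(text);                        // "Ã¹ Ã¨"
        System.out.println(garbled.length());                 // 5: each accented letter became two chars
        System.out.println(restore(garbled).equals(text));    // true
    }
}
```

The round trip only works because no bytes were lost; once garbled text has been re-encoded and saved (for example into the database), cleaning it up is much harder than declaring the right encoding in the first place.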

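The fix is simple: just add this inside the <web-app> element of web.xml (the <jsp-config> element requires a Servlet 2.4 / JSP 2.0 or later deployment descriptor):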

<jsp-config>
  <jsp-property-group>
    <url-pattern>*.jsp</url-pattern>
    <page-encoding>UTF-8</page-encoding>
  </jsp-property-group>
</jsp-config>
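To confirm the fix, look at the charset parameter of the Content-Type header the container actually sends (for example with curl -I); with page-encoding set to UTF-8 one would expect it to read charset=UTF-8. The Java sketch below shows how a client reads that parameter. The class and the charsetOf helper are hypothetical, not part of any servlet API. When a text/* response carries no charset at all, it falls back to ISO-8859-1, the HTTP/1.1 default in RFC 2616 (RFC 7231 later dropped that default).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {

    // Extracts the charset from a Content-Type value such as
    // "text/html; charset=UTF-8". Returns empty for non-text types
    // that declare no charset.
    static Optional<Charset> charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Parameter values may be quoted: charset="UTF-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return Optional.of(Charset.forName(value));
        }
        // No charset parameter: RFC 2616 default for text/* media types.
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.startsWith("text/")
                ? Optional.of(StandardCharsets.ISO_8859_1)
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Header sent before the fix, as described above.
        System.out.println(charsetOf("text/html; charset=ISO-8859-1").get()); // ISO-8859-1
        // Header expected once page-encoding is set to UTF-8.
        System.out.println(charsetOf("text/html; charset=UTF-8").get());      // UTF-8
    }
}
```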

It would be interesting to know why the JSP expert group did not pick UTF-8 as the default character encoding.

Comments:

  • The problem with the JSP spec is that it is very Euro/America-centric. I noticed that even the latest specification still says ISO-8859-1 instead of UTF-8. The XML specification says UTF-8 and UTF-16 are the only two encodings every processor is required to support, and between them they cover the vast majority of character sets.

    It would be nice if the Servlet spec took a wider world view.

    By Blogger David Carver, at 10 December, 2009 04:53  

  • You star, thanks. You just saved me from going through all my JSPs and adding <%@page contentType="text/html; charset=UTF-8" %> to each one to fix my UTF-8 character encoding issue.

    By Anonymous, at 25 September, 2012 11:14  
