Wednesday, November 19, 2008

Flying Saucer, XHTML Rendering, and Local XML Entities

I have been using Flying Saucer's (FS) XHTML Renderer [1] for at least 5 months now. It's an excellent library for rendering XHTML for display in Java Swing (using FS's XHTMLPanel) and for converting XHTML content to PDF files (see Generating PDFs for Fun and Profit with Flying Saucer and iText [2]). In my latest project, I create a report in XHTML and use FS to create a PDF version as follows:

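import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;

String baseUrl = ...; // root URL for resources
InputStream xhtml = ...; // XHTML content
DocumentBuilder builder =
  DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(xhtml);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, baseUrl);
renderer.layout();
// try-with-resources closes the PDF file when done
try (OutputStream os = new FileOutputStream("Out.pdf")) {
  renderer.createPDF(os);
}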


One benefit of this approach is that it is much easier to lay out the report in XHTML than to build it directly with a PDF library such as iText. A second benefit is that, with very little extra work, I can produce the report in two formats (PDF and XHTML) and also display it in a GUI.

Everything worked fine until yesterday.

The problem was that FS (more precisely, the libraries it depends on) would intermittently fail with errors such as
java.net.SocketException: Connection reset
and
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

With help from the very responsive FS users group [3], I found that the culprit was my XML document builder: while parsing, it fetched the DTD named in the document's DOCTYPE, along with the entity definitions it pulls in, from www.w3.org, and those requests would sometimes fail even though I was connected to the Internet. I was able to fix the problem by adding one line of code that configures the builder's entity resolver to use local copies of these files from the class path (conveniently packaged in FS's core-renderer.jar).

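import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;
import org.xhtmlrenderer.resource.FSEntityResolver;

String baseUrl = ...; // root URL for resources
InputStream xhtml = ...; // XHTML content
DocumentBuilder builder =
  DocumentBuilderFactory.newInstance().newDocumentBuilder();
// Resolve DTDs and entities from FS's local copies on the
// class path so the parser does not have to hit the web.
builder.setEntityResolver(FSEntityResolver.instance());
Document doc = builder.parse(xhtml);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, baseUrl);
renderer.layout();
try (OutputStream os = new FileOutputStream("Out.pdf")) {
  renderer.createPDF(os);
}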
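
FSEntityResolver only knows about the DTDs bundled with FS. If other parts of an application parse XHTML with plain JAXP, the same idea can be sketched with the standard org.xml.sax.EntityResolver interface. The class below is a hypothetical minimal example, not FS's implementation: LocalEntityResolver, register and parse are names I made up. To stay self-contained it keeps DTD text in a map keyed by public ID, whereas a real resolver would more likely load the files from the class path with getResourceAsStream, much as FS serves its copies from core-renderer.jar.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver that serves DTDs from local copies keyed by public ID.
final class LocalEntityResolver implements EntityResolver {
    private final Map<String, String> dtdsByPublicId = new HashMap<>();

    // Registers the text of a local DTD copy; returns this for chaining.
    LocalEntityResolver register(String publicId, String dtdText) {
        dtdsByPublicId.put(publicId, dtdText);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = (publicId == null) ? null : dtdsByPublicId.get(publicId);
        if (dtd == null) {
            // null tells the parser to fall back to its default: fetch systemId.
            return null;
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        // Keep the original system ID so relative references inside the DTD
        // are resolved against it and passed back through this resolver.
        source.setSystemId(systemId);
        return source;
    }

    // Parses XML text with this resolver installed on a fresh DocumentBuilder.
    Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(this);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }
}
```

Because unknown public IDs return null, this resolver only keeps the parser off the network for documents whose DTDs have been registered; everything else still goes to the remote system ID.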


The moral of the story is that you should always give some thought to how your XML parsers resolve entities. In many cases it makes more sense to resolve them from a local store than from a remote default such as www.w3.org. This is especially important if the application doing the parsing may run on computers that are not connected to the Internet. The advice applies not only to FS but to any library that depends on parsing XML.
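
If an application must never touch the network while parsing, newer JDKs offer a blunter tool than an entity resolver. JAXP 1.5 (Java 7u40 and later, so newer than this post) added the XMLConstants.ACCESS_EXTERNAL_DTD property, and setting it to an empty string forbids fetching external DTDs over any protocol. The sketch below assumes the JDK's built-in parser; OfflineParsers, newOfflineFactory and parsesOffline are hypothetical names, and an older parser implementation on the class path may reject the attribute with an IllegalArgumentException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parsers built here may not fetch external DTDs at all.
final class OfflineParsers {
    private OfflineParsers() {}

    static DocumentBuilderFactory newOfflineFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP 1.5: an empty protocol list means no external DTD access.
        // Throws IllegalArgumentException on parsers that don't support it.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory;
    }

    // True if xml is well-formed and parses without fetching an external DTD.
    static boolean parsesOffline(String xml) {
        try {
            newOfflineFactory().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // Typically a SAXParseException reporting the refused DTD access.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this setting, a document whose DOCTYPE points at a remote DTD fails immediately with a parse error instead of depending on a remote server, so a missing local copy shows up during testing rather than on a user's offline machine.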

References

[1] The Flying Saucer Project, 2008
[2] Joshua Marinacci, Generating PDFs for Fun and Profit with Flying Saucer and iText, 2007
[3] Flying Saucer User Mailing List, 2008