Pretty Printing XML in Python

(Originally posted 2011-06-22.)

A need has arisen for pretty printing XML. This post includes some Python code to do it.

I’ve been working with the ODP presentation file format recently: I want to generate presentations from application code.

An ODP file is a structured zip file. Among other items it contains an XML file – content.xml – which describes the pages in the presentation. content.xml concatenates everything as one long line of text. It’s therefore very hard to read – without specialist XML parsing tools. And I’d like to – so I can teach my code to navigate and update it.

So the following is a short Python program to pretty print an XML file. It takes the XML file as standard input (stdin) and writes the formatted XML to standard output (stdout). If it can’t parse the XML it writes an appropriate error message to standard error (stderr).

While this isn’t a Python tutorial I’d like to outline how it works:

  • I’m using the built-in xml.dom.minidom package to parse the XML and to pretty print it.
  • I’m using the built in re regular expression package to remove trailing spaces on a line that minidom’s toprettyxml() seems to create.
  • I’m using standard Python replace() to do some other substitutions to further clean up the XML.
  • If the read-in XML isn’t valid minidom throws a ExpatError. In fact it uses xml.parsers.expat and this is what throws the error.

I’ll admit to a little disappointment that toprettyxml() doesn’t create quite as pretty an output file as I’d like. Hence the post-processing. Feel free to experiment with different values for the parameters to it. Also you might not feel the need for some of the post-processing.

Here’s the code. It’s more white space, comments and error handling than actual straight-through processing.

from xml.dom import minidom
from xml.parsers.expat import ExpatError
import sys,re

# Edit the following to control pretty printing
indent="  "

# Regular expression to find trailing spaces before a newline
trails=re.compile(' *\n')

  # Parse the XML - from stdin
  # First-pass Pretty Print of the XML
  # Further clean ups
  # Write XML to stdout

except ExpatError as (expatError):
  sys.stderr.write("Bad XML: line "+str(expatError.lineno)+" offset "+str(expatError.offset)+"\n")

One question: Can this be run on z/OS? I believe it can: minidom is a core part of Jython and Jython certainly runs under z/OS. In fact, being a java jar I’d expect it to run on a zAAP (or zIIP with zAAP-on-zIIP). If you try it and get it working let us know.

Published by Martin Packer

I'm a mainframe performance guy and have been for the past 35 years. But I play with lots of other technologies as well.

One thought on “Pretty Printing XML in Python

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: