XML::Easy::ProceduralWriter

In this post I want to take a further look at writing code that outputs XML with XML::Easy, and how the example from my previous blog entry can be improved upon by using XML::Easy::ProceduralWriter.

What's wrong with the previous example?

When we left things in my previous post we were looking at this snippet of code that outputs a simple XML webpage:
tie my %hash, "Tie::IxHash",
  "http://search.cpan.org/" => "Search CPAN",
  "http://blog.twoshortplanks.com" => "Blog",
  "http://www.schlockmercenary.com" => "Schlock",
;

my $root_element = xe("html",
  xe("head",
    xe("title", "My Links"),
  ),
  xe("body",
    xe("h1", "Links"),
    xe("ul",
      map { xe("li", xe("a", { href => $_ }, $hash{$_}) ) } keys %hash
    ),
  ),
);

print {$fh} xml10_write_document($root_element);

The above code produces exactly the output we want, but it doesn't necessarily go about producing it in the best way.

The first problem is that it's using Tie::IxHash to ensure that the keys of the hash (and thus the nodes in the resulting XML) come out in the order we wrote them, rather than in the effectively random order a normal hash gives. Tied data structures are much slower than normal data structures, and using one in this way is a big performance hit. However, in this case we have to tie, because it's hard to write readable logic inline in the map statement that processes a normal array two elements at a time.

Which brings us on to the second problem, also related to the map statement: it's somewhat unwieldy to write and hard to read (you have to scan to the end of the line to work out that it's iterating over the keys of %hash). This only gets worse when you have to produce more complex XML and you try to use further (possibly nested) map statements and ternary expressions to build up even more complex data structures, which is every bit as messy to do as it is to explain.

Both issues stem from trying to build the XML::Easy::Element tree all in one go, essentially in one statement as a single assignment. If we choose not to restrict ourselves in this way we can easily re-order the code to use a temporary variable and do away with both the tie and the map:
my @data = (
  "http://search.cpan.org/" => "Search CPAN",
  "http://blog.twoshortplanks.com" => "Blog",
  "http://www.schlockmercenary.com" => "Schlock",
);

my @links;
while (@data) {
  my $url = shift @data;
  my $text = shift @data;
  push @links, xe("li", xe("a", { href => $url }, $text) );
}

my $root_element = xe("html",
  xe("head",
    xe("title", "My Links"),
  ),
  xe("body",
    xe("h1", "Links"),
    xe("ul", @links),
  ),
);

print {$fh} xml10_write_document($root_element);

The problem with this solution is that we've now ended up with code that's backwards. We're creating the list elements first and only then creating the node that encloses them. Now we have to read the bottom of the code to work out that we're creating an HTML document at all!
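
As an aside on walking a flat list two elements at a time: newer versions of the core List::Util module (1.29 and later) provide pairmap, which takes some of the pain out of doing this inline. The following is only a minimal sketch under that assumption. To keep it self-contained it builds plain strings rather than XML::Easy elements; in the code above the block would call xe instead. It does nothing about the wider readability problem of building everything in one statement.

```perl
use strict;
use warnings;
use List::Util 1.29 qw(pairmap);    # pairmap is assumed available (List::Util 1.29+)

my @data = (
  "http://search.cpan.org/" => "Search CPAN",
  "http://blog.twoshortplanks.com" => "Blog",
  "http://www.schlockmercenary.com" => "Schlock",
);

# pairmap sets $a and $b to each successive pair, in list order, so
# neither a tie nor an explicit loop is needed. No XML escaping is
# done here; this is for illustration only.
my @links = pairmap { qq{<li><a href="$a">$b</a></li>} } @data;

print "$_\n" for @links;
```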

Introducing XML::Easy::ProceduralWriter

To solve this problem I wrote XML::Easy::ProceduralWriter, a module that allows you to write your code in a procedural fashion but without having to "code in reverse". Here's the above example re-written again, this time using XML::Easy::ProceduralWriter:
use XML::Easy::ProceduralWriter;

print {$fh} xml_bytes {

  element "html", contains {
  
    element "head", contains {
      element "title", contains {
        text "My Links";
      };
    };
    
    element "body", contains {
      element "h1", contains {
        text "Links";
      };
      element "ul", contains {
         my @data = (
           "http://search.cpan.org/" => "Search CPAN",
           "http://blog.twoshortplanks.com" => "Blog",
           "http://www.schlockmercenary.com" => "Schlock",
         );
         
         while (@data) {
           element "li", contains {
             element "a", href => shift @data, contains {
               text shift @data;
             };
           };
         }
      };
    };
  };

};

Using the module is straightforward. You start by calling either xml_element (which returns an XML::Easy::Element) or xml_bytes (which returns a string of bytes you can print out), and to these calls you pass some code that generates XML elements and text. Each element can 'contain' further code that produces the sub-elements and text that the element contains, and so on.

The key thing to notice is that, unlike the previous examples where you were passing data structures into the functions, here you're passing code to be executed. This means you can place arbitrary logic in what you pass in and you're not limited to single statements. For example, in the above code we declare variables in the middle of generating the XML.

The conceptual jump here is realising that neither what the blocks of code return nor what element and text return is important; the side effects of calling these two functions are. The simplest way to think about it is to imagine the string being built up as the element and text statements are encountered, in much the same way that output is printed straight away to the filehandle when you use print (even though technically this isn't the case here: a full XML::Easy::Element object tree is always built in the background).

The documentation for XML::Easy::ProceduralWriter contains a reasonable tutorial that explains its usage in more detail, but it should be pretty straightforward to jump straight in from just reading the above code.

And that's pretty much all I have to say about outputting XML with XML::Easy. In my next post we'll look instead at advanced parsing and how to cope with documents with XML Namespace declarations.
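
As a footnote to the point about side effects, here is a toy sketch of how functions shaped like element, text and contains could build a tree purely through side effects. It is not the module's actual implementation: the subs below (including build_tree, which stands in for xml_element) are hypothetical stand-ins written in core Perl, and they produce plain hashes rather than XML::Easy::Element objects.

```perl
use strict;
use warnings;

# Each entry collects the children of one currently open element.
my @stack;

# contains { ... } just hands its block back as a code reference.
sub contains(&) { return $_[0] }

# text appends a string to the innermost open element.
sub text {
    my ($string) = @_;
    push @{ $stack[-1] }, $string;
    return;
}

# element opens a new child list, runs the block for its side effects
# (the block's return value is ignored), then attaches the finished
# node to its parent.
sub element {
    my ($name, @rest) = @_;
    my $body  = pop @rest;    # last argument is the contains { ... } block
    my %attrs = @rest;        # anything in between is attribute pairs
    push @stack, [];
    $body->();
    my $children = pop @stack;
    push @{ $stack[-1] },
        { name => $name, attrs => \%attrs, children => $children };
    return;
}

# build_tree (hypothetical) runs the outermost block and returns the root node.
sub build_tree(&) {
    my ($body) = @_;
    push @stack, [];
    $body->();
    my $top = pop @stack;
    return $top->[0];
}

my @data = (
    "http://search.cpan.org/"        => "Search CPAN",
    "http://blog.twoshortplanks.com" => "Blog",
);

my $tree = build_tree {
    element "ul", contains {
        while (@data) {
            element "li", contains {
                element "a", href => shift @data, contains {
                    text shift @data;
                };
            };
        }
    };
};

print "$tree->{name} has ", scalar @{ $tree->{children} }, " children\n";
```

Because every call records its result on the shared stack, ordinary control flow such as the while loop decides what ends up in the tree, and nothing needs to be returned from any block.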
