NAME

XML::SAX::Writer - SAX2 XML Writer

VERSION

version 0.57

SYNOPSIS

use XML::SAX::Writer;
use XML::SAX::SomeDriver;

my $w = XML::SAX::Writer->new;
my $d = XML::SAX::SomeDriver->new(Handler => $w);

$d->parse('some options...');

DESCRIPTION

Why yet another XML Writer?

A new XML Writer was needed to match the SAX2 effort because quite naturally no existing writer understood SAX2. My first intention had been to start patching XML::Handler::YAWriter as it had previously been my favourite writer in the SAX1 world.

However, the more I patched it, the more I realised that what I thought was going to be a simple patch (mostly adding a few event handlers and changing the attribute syntax) was turning into a rewrite, due to various ideas I'd been collecting along the way. Besides, I couldn't find an elegant way to make it work with SAX2 without breaking the SAX1 compatibility that people are probably still relying on. There are of course ways to do that, but most require user interaction, which is something I wanted to avoid.

So in the end there was a new writer. I think it's in fact better this way as it helps keep SAX1 and SAX2 separated.

METHODS

THE CONSUMER INTERFACE

XML::SAX::Writer can receive pluggable consumer objects that will be in charge of writing out what is formatted by this module. Setting a Consumer is done by setting the Output option to the object of your choice instead of to an array, scalar, or file handle as is more commonly done (internally those in fact map to Consumer classes and are simply available as options for your convenience).

If you don't understand this, don't worry. You don't need it most of the time.

That object can be from any class, but must have two methods in its API. It is also strongly recommended that it inherits from XML::SAX::Writer::ConsumerInterface so that it will not break if that interface evolves over time. There are examples at the end of XML::SAX::Writer's code.

The two methods that it needs to implement are:

Here's an example of a custom consumer. Note the extra $ signs in front of $self; the base class is optimized for the overwhelmingly common case where only one data member is required and $self is a reference to that data member.

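package MyConsumer;

use strict;
use XML::SAX::Writer;    # loads the ConsumerInterface base class

our @ISA = qw( XML::SAX::Writer::ConsumerInterface );

sub new {
    # $output starts out undefined; the base class wraps it for us
    my $self = shift->SUPER::new( my $output );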

    $$self = '';      # Note the extra '$'

    return $self;
}

sub output {
    my $self = shift;
    $$self .= uc shift;
}

sub get_output {
    my $self = shift;
    return $$self;
}

And here is one way to use it:

my $c = MyConsumer->new;
my $w = XML::SAX::Writer->new( Output => $c );

## ... send events to $w ...

print $c->get_output;

If you need to store more than one data member, pass in an array or hash reference:

my $self = shift->SUPER::new( {} );

and access it like:

sub output {
    my $self = shift;
    $$self->{Output} .= uc shift;
}

THE ENCODER INTERFACE

Encoders can be plugged in to allow one to use one's favourite encoder object. Presently there are two encoders: Encode and NullEncoder. They need to implement two methods, and may inherit from XML::SAX::Writer::NullConverter if they wish to.

new FROM_ENCODING, TO_ENCODING

Creates a new Encoder. The arguments are the chosen encodings.

convert STRING

Converts that string and returns it.

Note that the return value of the convert method is not checked. Output may be truncated if a character couldn't be converted correctly. To avoid problems, the encoder should take care of encoding errors itself, for example by raising an exception.
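As a minimal sketch of the two-method interface described above, here is an encoder that raises an exception rather than returning damaged output. The package name StrictEncoder is hypothetical and not part of this distribution, and the sketch assumes that convert receives a byte string in FROM_ENCODING; it relies only on the core Encode module.

```perl
package StrictEncoder;    # hypothetical name, for illustration only

use strict;
use warnings;
use Encode ();

# new FROM_ENCODING, TO_ENCODING
sub new {
    my ( $class, $from, $to ) = @_;
    return bless { from => $from, to => $to }, $class;
}

# convert STRING: decode from the source encoding, encode to the target
# encoding, and die on any character that cannot be converted.
sub convert {
    my ( $self, $string ) = @_;
    my $chars = Encode::decode( $self->{from}, $string, Encode::FB_CROAK );
    return Encode::encode( $self->{to}, $chars, Encode::FB_CROAK );
}

package main;

my $enc = StrictEncoder->new( 'utf-8', 'iso-8859-1' );

# "cafe" with an acute accent, given as UTF-8 bytes: converts cleanly.
my $ok = $enc->convert("caf\xC3\xA9");

# The euro sign has no ISO-8859-1 equivalent, so convert dies.
my $bad = eval { $enc->convert("\xE2\x82\xAC") };
my $err = $@;
```

Because the Writer does not check what convert returns, dying inside convert is the only way for a caller to notice the failure here; the caller can wrap its parse call in eval to catch it.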

CUSTOM OUTPUT

This module is generally used to write XML -- which it does most of the time -- but just like the rest of SAX it can be used as a generic framework to output data, the opposite of a non-XML SAX parser.

Of course there's only so much that one can abstract, so depending on your format this may or may not be useful. If it is, you'll need to know the following API (and probably to have a look inside XML::SAX::Writer::XML, the default Writer).

init

Called before the writing starts, it's a chance for the subclass to do some initialisation if it needs it.

setConverter

This is used to set the proper converter for character encodings. The default implementation should suffice but you can override it. It must set $self->{Encoder} to an Encoder object. Subclasses *should* call it.

setConsumer

Same as above, except that it is for the Consumer object, and that it must set $self->{Consumer}.

setEscaperRegex

Will initialise the escaping regex $self->{EscaperRegex} based on what is needed.

escape STRING

Takes a string and escapes it properly.

setCommentEscaperRegex and escapeComment STRING

These work exactly the same as the two above, except that they are meant to operate on comment contents, which often have different escaping rules than those that apply to regular content.
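The escaping hooks above can be pictured as a lookup table plus a regex built from its keys. The following is a standalone sketch of that idea, not the module's actual code: the helper names build_escaper_regex and escape_with, and the contents of both tables, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical escape tables; the module's own tables may differ.
my %escape = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);
my %comment_escape = ( '--' => '&#45;&#45;' );    # "--" is not allowed in comments

# Build one alternation regex from the keys of an escape table.
sub build_escaper_regex {
    my ($table) = @_;
    my $alternation = join '|', map { quotemeta($_) } sort keys %$table;
    return qr/($alternation)/;
}

# Replace every match with the corresponding entry in the table.
sub escape_with {
    my ( $regex, $table, $string ) = @_;
    $string =~ s/$regex/$table->{$1}/g;
    return $string;
}

my $content_re = build_escaper_regex( \%escape );
my $comment_re = build_escaper_regex( \%comment_escape );

my $text    = escape_with( $content_re, \%escape,         'a < b && "c"' );
my $comment = escape_with( $comment_re, \%comment_escape, 'x -- y' );
```

Keeping the table and the regex separate means that a subclass targeting another output format only has to swap the table and rebuild the regex.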

TODO

- proper UTF-16 handling

- the formatting options need to be developed.

- test, test, test (and then some tests)

- doc, doc, doc (actually this part is in better shape)

- remove the xml_decl and replace it with intelligent logic, as
discussed on perl-xml

- make the Consumer selecting code available in the API, to avoid
duplicating it

- add an Apache output Consumer, triggered by passing $r as Output

CREDITS

Michael Koehne (XML::Handler::YAWriter) for much inspiration and Barrie Slaymaker for the Consumer pattern idea, the coderef output option and miscellaneous bugfixes and performance tweaks. Of course the usual suspects (Kip Hampton and Matt Sergeant) helped in the usual ways.

SEE ALSO

XML::SAX::*

AUTHORS

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Robin Berjon.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.