XMLAttrs Filter (script-based)

XML Attribute Processing

DocOrigin does not consider attributes in the XML it is given as data. That doesn't seem to be a hardship for anyone producing XML specifically for DocOrigin consumption. The idea is always to keep it simple, and ignoring attributes helps do that. However, sometimes you are faced with XML that is already being produced and that does have attributes. What to do then?

Well, you could drag out your XSLT knowledge and several aspirin and go for it. Yeah, right!

DocOrigin supports the concept of filters. You can specify a -filter filtername option on the Merge command line and Merge will run the filter on the fly, converting your data stream however your filter sees fit, before handing it off for form + data merging. Very handy. Filters can be written in any language and supplied as an executable, or written in DocOrigin JavaScript (.wjs) and executed that way.

You are not on your own. In fact, it is my fond hope that folks will begin to share the filters that they write. Eclipse Corporation is supplying an XmlAttrs.wjs filter that works for us. It's a simple concept. As it filters the input XML file full of attributes, it outputs a new XML file in which every attribute has been turned into a child element of the element that had the attribute. For example:

<Customer name="Acme Corp">
  …

becomes

<Customer>
  <name>Acme Corp</name>
  …

Hence, all of the attributes are now exposed to DocOrigin processing.
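
To make the idea concrete, here is a minimal sketch of the same attribute-to-child-element conversion, written in Python with the standard xml.etree.ElementTree module. It is not the XmlAttrs.wjs script and makes no claim about how that script is implemented. The function name attrs_to_children and the City child element are made up for illustration, and this sketch deliberately ignores elements that carry both attributes and text.

```python
import xml.etree.ElementTree as ET


def attrs_to_children(elem):
    """Turn every attribute of elem (and of its descendants) into a leading child element."""
    # Recurse into the existing children first, so that the child
    # elements created below are never visited a second time.
    for child in list(elem):
        attrs_to_children(child)
    # Insert one new child per attribute, in attribute order,
    # ahead of the children that were already there.
    for position, (name, value) in enumerate(list(elem.attrib.items())):
        new_child = ET.Element(name)
        new_child.text = value
        elem.insert(position, new_child)
    elem.attrib.clear()


# Hypothetical input: the City element is invented for this example.
root = ET.fromstring('<Customer name="Acme Corp"><City>Springfield</City></Customer>')
attrs_to_children(root)
result = ET.tostring(root, encoding="unicode")
print(result)
# <Customer><name>Acme Corp</name><City>Springfield</City></Customer>
```

Like the script described here, the sketch does not indent its output.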

All you have to do is specify -filter $S/XmlAttrs.wjs as a command line option to Merge (probably buried, in the usual way, in the .prm file for the job). That's it. Now your form has access to those attributes. [$S is defined in Default-Paths.prm as install dir/Merge/Scripts.]

At the moment the script does not prettily indent the output XML. Want to update it and share?

Mixed content in XML, that is, an element that has both child elements and text content, is a dumb idea. This script will throw a hissy fit, I imagine, if you supply XML that is not well-formed or XML that has mixed content. Don't do that!
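
If you want to check your data before handing it to the filter, something along these lines could do it. This is a sketch in Python using xml.etree.ElementTree, not part of DocOrigin; the function names has_mixed_content and find_mixed are my own. It treats whitespace between child elements (ordinary indentation) as harmless, which is an assumption on my part. ET.fromstring raises ParseError for XML that is not well-formed, so that case is caught at parse time.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem):
    """Return True if elem has child elements and also non-whitespace text of its own."""
    children = list(elem)
    if not children:
        return False
    # An element's own text is split across elem.text (before the first child)
    # and each child's tail (the text that follows that child).
    pieces = [elem.text] + [child.tail for child in children]
    return any(piece and piece.strip() for piece in pieces)


def find_mixed(root):
    """Return the tags of all elements, root included, that have mixed content."""
    return [e.tag for e in root.iter() if has_mixed_content(e)]


clean = ET.fromstring("<Customer>\n  <name>Acme Corp</name>\n</Customer>")
mixed = ET.fromstring("<Customer><sla>premium</sla>Acme Corp</Customer>")
print(find_mixed(clean))   # []
print(find_mixed(mixed))   # ['Customer']
```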

So what happens when you have:

<Customer sla="premium">Acme Corp</Customer>

Won't that create the abhorred 'mixed content'? As in:

<Customer>
  <sla>premium</sla>
  Acme Corp  <!-- Aack, mixed content -->
</Customer>

That would be unacceptable (imo). We have to invent a tag to enclose Acme Corp. Choices abound. For better or worse, I have chosen to enclose it with the original tag, so…

<Customer>
  <sla>premium</sla>
  <Customer>Acme Corp</Customer>
</Customer>

And so, yes, you do end up with Customer.Customer. And your choice would be …?

Well, you have your choice. You can supply a -contentTag=aTag command line option. For example, you might choose -contentTag=_content_. As another trick or treat, we consider the ! character to be special. It is interpreted to mean 'whatever' the original element name was. So -contentTag=_! would mean _Customer in the examples used above. The default -contentTag is !.
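
The content tag rule can be sketched as follows, again in Python with xml.etree.ElementTree and again as an illustration rather than the actual XmlAttrs.wjs logic. The names resolve_content_tag, convert and run are hypothetical. I am assuming that every ! in the option value is replaced by the element name, that surrounding whitespace in the text is trimmed, and that only elements with attributes are touched; the real script may differ on all three.

```python
import xml.etree.ElementTree as ET


def resolve_content_tag(template, element_name):
    """Replace each '!' in the template with the original element name (assumed rule)."""
    return template.replace("!", element_name)


def convert(elem, content_tag="!"):
    """Move attributes into child elements and wrap leftover text in a content tag."""
    # Handle the existing children first; new children are added afterwards.
    for child in list(elem):
        convert(child, content_tag)
    attrs = list(elem.attrib.items())
    if not attrs:
        return
    new_children = []
    for name, value in attrs:
        node = ET.Element(name)
        node.text = value
        new_children.append(node)
    # Text sitting next to the new children would be mixed content,
    # so it is moved into its own enclosing element.
    text = (elem.text or "").strip()
    if text:
        wrapper = ET.Element(resolve_content_tag(content_tag, elem.tag))
        wrapper.text = text
        new_children.append(wrapper)
        elem.text = None
    for position, node in enumerate(new_children):
        elem.insert(position, node)
    elem.attrib.clear()


def run(xml_text, content_tag="!"):
    root = ET.fromstring(xml_text)
    convert(root, content_tag)
    return ET.tostring(root, encoding="unicode")


sample = '<Customer sla="premium">Acme Corp</Customer>'
print(run(sample))               # wraps the text in <Customer>
print(run(sample, "_content_"))  # wraps the text in <_content_>
print(run(sample, "_!"))         # wraps the text in <_Customer>
```

With the default content tag, the first call reproduces the Customer.Customer shape shown above.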

BTW, the script recognizes and faithfully copies: XML comments, XML processing instructions, and CDATA.

Let one of your attributes be enjoyment (and another be content).