XmlInput Class (Read XML Files)

(As of 3.1.001.01)

XmlInput is a JavaScript class that can be used to open and read any XML file. It is meant for simple scripting use, not for building tree structures that have to be "recursed" to dig out what is in them.

The variable names oXI and o are just examples.
Multiple XmlInput objects can be alive and independently active at the same time.

Functions

`oXI=XmlInput(`fileName`)`	Create an `XmlInput` object.
`o=oXI.next()`	Get the next element from the XML file.
`o=oXI.next(`tag`)`	Get the next element with the provided tag name.
`o=oXI.next([`tag1`,` tag2`, ...])`	Get the next element with one of the provided tag names.
`o=oXI.get()`	Get the next leaf data tag.
`o=oXI.get(`tag`)`	Get the next leaf tag with the provided tag name.
`o=oXI.get([`tag1`,` tag2`, ...])`	Get the next leaf tag or end tag with one of the provided tag names.
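`oXI.close()`	Close the file, free resources.
`oXI.getFileName()`	Get the name of the file that was opened.
`o=oXI.reposition(`line`,` column`)`	Move to the element at, or first after, the given line and column.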

Create an XmlInput Object

You create an XmlInput object via:

var oXI = XmlInput(an xml file name);

If the XML file does not exist, a JavaScript exception is thrown, so you should wrap the call in a try-catch block. Otherwise, you get back a perfectly fine JavaScript object.

The bookend to this is: oXI.close();
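
If you want the close to happen no matter what goes wrong in between, one option is a small wrapper like the sketch below. The name withXmlInput and its openFn parameter are my own inventions for illustration, not part of XmlInput; openFn exists only so the helper can be tried with a stand-in object.

```javascript
// Hypothetical helper (not part of XmlInput): open, use, and always close.
// openFn is an assumed extra parameter; when it is omitted, the XmlInput
// function described above is used.
function withXmlInput(fileName, callback, openFn) {
	var open = openFn || XmlInput;
	var oXI = open(fileName);      // throws if the file does not exist
	try {
		return callback(oXI);      // do all of your reading in here
	} finally {
		oXI.close();               // runs on normal exit and on exceptions
	}
}
```

You would still put the call to withXmlInput inside your own try-catch, because the exception for a missing file is passed straight through to you.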

One Element at a Time

The simplest thing to do is to read the XML file sequentially and see what you get back. You can read the next element of the file with:

var o = oXI.next();

oXI.next() returns another JavaScript object with lots of useful properties. Which properties it has is dependent on what the next element of the XML file was. That could be a "Start Tag", an "End Tag", a piece of data, a comment, or a Processing Instruction (PI). If it comes back with null that means that you have hit end-of-file.

You might say, "Gee, what is this doing for me? Just simple sequential access, what good is that?" If you have ever tried to parse an XML file in script, you will (or should) appreciate how much nit-picky code is required to properly handle all of its many permitted syntax variations. The underlying C++ code uses the Xerces library to do that gnarly parsing, so you don't even have to think about it. Just accept the elements that come through to you without any fuss required. Entity references such as &lt;, &gt;, and &amp;, and even CDATA sections, are all handled for free.

The following is a really simple script that I urge you to use to understand what you get by using oXI.next().

try {
	var oXI = XmlInput("C:/temp/simple.xml");
	while (true) {
		var o = oXI.next();
		if (o == null) break;
		_printf("\n");
		for (var x in o)
			_printf("o has '%s' as '%s'\n", x, o[x]);
	}
	oXI.close();
} catch (e) {
	_printf("caught error '%s'\n", e);
	return 201;
}

The o object that is returned has some subset (as applicable) of the following properties:

Property	Meaning
`type`	The kind of element that was returned (see below for details).
`line`	The line number of the XML file that has this element.
`column`	The column in that line where the element ends.
`som`	The dotted schema object model notation for the returned element.
`startTag`	For start (aka open) tags, the tag name `<tag>`.
`endTag`	For end (aka close) tags, the tag name `</tag>`.
`leafData`	For a leaf node (a node with no children) the data it contains.
`attr`	For start tags that have attributes, this is an object that contains all the attributes and their values.
`comment`	The full comment text between `<!--` and `-->`.
`piName`	The 'target' of a Processing Instruction. E.g. in `<?DocOrigin ... ?>` it would be DocOrigin.
`piValue`	The full text after the target name in a Processing Instruction.

type, line, column, and som are always present. The others are present only when applicable.

The following types can occur:

Type	Meaning
1	This is a start tag; may include an '`attr`' object property.
2	This is leaf data.
4	This is an end tag.
8	This is a Processing Instruction; includes both `piName` and `piValue`.
16	This is a comment.
6	That is, 4 + 2, an end tag and the relevant leaf data.
7	That is, 4 + 2 + 1, a start tag, end tag, and leaf data. For `<tag/>` scenarios.

Note that 2 will never occur on its own; leaf data always arrives combined with its end tag, as type 6 or 7.
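
Since the type values are sums of powers of two, you can test them with the bitwise AND operator. The sketch below assumes nothing beyond the numbers in the table; the constant names and the describeType function are made up for illustration.

```javascript
// Bit values taken from the type table above; the names are hypothetical.
var XML_START = 1, XML_LEAF = 2, XML_END = 4, XML_PI = 8, XML_COMMENT = 16;

// Hypothetical helper: turn o.type into a list of readable names.
function describeType(type) {
	var names = [];
	if (type & XML_START)   names.push("start");
	if (type & XML_LEAF)    names.push("leaf");
	if (type & XML_END)     names.push("end");
	if (type & XML_PI)      names.push("pi");
	if (type & XML_COMMENT) names.push("comment");
	return names;
}
```

For instance, describeType(6) gives the two entries "leaf" and "end", and a test such as (o.type & XML_END) is true for types 4, 6, and 7.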

You may or may not use that type property. Instead, you might use:

if (typeof o.leafData != "undefined") {
 	// Yay, I have real data for tag  o.endTag
 	...
}

It's pretty easy, and simple, to race through the XML file, picking off items of interest. If you choose to act on that data immediately or choose to store away the data in arrays, or objects of your own design, for later reference, well, that is up to you. You are in control.
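
As one way of "storing the data away", the sketch below walks the whole file with oXI.next() and keeps every piece of leaf data in an array. The function name collectLeafData and the shape of the stored records are assumptions of mine; only next(), leafData, endTag, and line come from the description above.

```javascript
// Hypothetical helper: read to end-of-file with next() and keep only the data.
// oXI is an XmlInput object that has already been opened.
function collectLeafData(oXI) {
	var found = [];
	while (true) {
		var o = oXI.next();
		if (o == null) break;                            // end-of-file
		if (typeof o.leafData == "undefined") continue;  // structural tag, comment, or PI
		found.push({ tag: o.endTag, value: o.leafData, line: o.line });
	}
	return found;
}
```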

Just the Leaf Nodes

It could be that you do not need all the structural tags, but rather just tags with data in them, i.e. leaf tags. You can skip over all the other stuff and get only the data elements by using (in a loop):

o = oXI.get();

Remember that not only do you get the start tag, end tag and data, you also get the full SOM expression. You do know what structure surrounds this data if it is of interest to you. You don't need to worry about typeof .. != "undefined" because startTag, leafData, and endTag will always be defined after an oXI.get().
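
Because every leaf comes back with its SOM expression, a simple (hypothetical) use of the loop is to build a lookup keyed by that expression. The function name leavesBySom is my own, and the sketch makes no assumption about what the SOM text looks like beyond its being a string.

```javascript
// Hypothetical helper: race through the leaf nodes with get() and
// remember each value under its SOM expression.
function leavesBySom(oXI) {
	var values = {};
	while (true) {
		var o = oXI.get();
		if (o == null) break;          // end-of-file
		values[o.som] = o.leafData;    // a repeated SOM keeps only its last value
	}
	return values;
}
```

If the same SOM expression can occur more than once in your file, you would store arrays rather than single values.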

Just `that` Tag

It's quite probable that you know the structure of the XML file that you are reading. Perhaps you simply want to get the runDate control element. You can do that by using:

o = oXI.get("runDate");

(or whatever leaf tag name interests you). We only ever use leaf tag names, not dotted SOM expressions. The full dotted SOM expression is provided back to you, but must not be provided to the oXI.get() function. If you misspell the tag name, chances are that it will race along all the way to end-of-file and you will get null back.

CAUTION: The reading of the XML file proceeds in only one direction: forward. If you wish to access several specific tags, do so in the order in which they appear in the file.
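
To make the forward-only rule concrete, here is a minimal sketch that fetches several named leaf tags one after another. The helper name getInOrder is hypothetical; it relies only on oXI.get(tag) returning null at end-of-file as described above.

```javascript
// Hypothetical helper: fetch several leaf tags, which must be listed
// in the order in which they appear in the file.
function getInOrder(oXI, tagNames) {
	var result = {};
	for (var i = 0; i < tagNames.length; i++) {
		var o = oXI.get(tagNames[i]);
		if (o == null) break;    // end-of-file: tag misspelled, absent, or out of order
		result[tagNames[i]] = o.leafData;
	}
	return result;
}
```

If you list the tags out of order, the earlier tag has already gone by when you ask for it, so the result simply lacks that property.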

Get 'One of These' Tags

Maybe you are not absolutely certain of the tag order or simply don't want to bother to do that research. You can specify an array of tag names to oXI.get() and it will return with whichever one it finds first. It does not return an array of objects but rather returns the first one that it finds that matches your list.

Imagine an XML file that contains a snippet like this:

<Detail>
	<Item>23</Item>
	<Desc>A fine item, you just have to have</Desc>
	<Price>19.99</Price>
	<Qty>4</Qty>
</Detail>

You might choose to use:

o = oXI.get(["Item", "Qty", "Price", "Desc", "Detail"]);

It will return whichever of those it finds next. Strangely, I added the "Detail" tag name to my list, and it's not even a leaf tag! Why did I do that? Well, I didn't want to run off into the next Detail structure, so I put a backstop on my get request. In my script, I check whether the end tag "Detail" was returned. If it was, I know that I have exhausted that Detail structure. For that reason, this form of the oXI.get() function returns both leaf nodes and end tags (assuming they are in your interest list).

// Process a Detail structure
var tagsOfInterest = ["Item", "Qty", "Price", "Desc", "Detail"];
var detail = {};                        // My detail object
while (true) {
	o = oXI.get(tagsOfInterest);
	if (o == null) break;               // Ouch! missing </Detail>!!
	if (o.endTag == "Detail") break;    // Finished this Detail structure
	detail[o.endTag] = o.leafData;
}
// Great, my detail object is now filled out with .Item, .Qty, .Price, and .Desc
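
The same backstop idea extends to reading every Detail structure in the file. The sketch below is one possible arrangement, with readDetails being a name of my own choosing; it uses only oXI.get() with an array, as shown above.

```javascript
// Hypothetical helper: collect every Detail structure into an array.
function readDetails(oXI) {
	var tagsOfInterest = ["Item", "Qty", "Price", "Desc", "Detail"];
	var details = [];
	var detail = {};
	while (true) {
		var o = oXI.get(tagsOfInterest);
		if (o == null) break;            // end-of-file
		if (o.endTag == "Detail") {      // backstop: this Detail is complete
			details.push(detail);
			detail = {};
			continue;
		}
		detail[o.endTag] = o.leafData;
	}
	return details;                      // a Detail with no </Detail> is not included
}
```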

Zip to the Details Section

Reminder, you can just read the whole file sequentially. Perhaps you want to quickly skip over a bunch of header matter and get to the Details section. In that case, you are after a start tag of Details. You can do that with:

o = oXI.next("Details");

oXI.next() has the same variations as oXI.get(). That is, you might not supply a parameter and it would return the next tag that it finds. Or you might supply an array of tags that interest you and it will return the next one of those that it finds. In practice, you would probably use the variations of oXI.next() augmented by only the no-arguments version of oXI.get(): the latter to race through leaf data nodes, and the former to zip along to any named tag. oXI.get(parms) provides no advantages over oXI.next(parms).
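
Putting the two ideas together, a sketch of "zip to a section, then gather what is in it" might look like the following. The name readSection is hypothetical. The text above does not spell out whether oXI.next(tag) can also stop on a matching end tag, so the sketch checks startTag to be safe, and it recognizes the closing tag as an element with a matching endTag and no leafData.

```javascript
// Hypothetical helper: skip ahead to <sectionTag>, then gather its leaf data.
function readSection(oXI, sectionTag) {
	var o = oXI.next(sectionTag);
	while (o != null && o.startTag != sectionTag) {
		o = oXI.next(sectionTag);        // not the start tag, keep looking
	}
	if (o == null) return null;          // no such section before end-of-file
	var leaves = [];
	while (true) {
		o = oXI.next();
		if (o == null) break;            // end-of-file inside the section
		var hasData = typeof o.leafData != "undefined";
		if (!hasData && o.endTag == sectionTag) break;   // closing tag of the section
		if (hasData) leaves.push({ tag: o.endTag, value: o.leafData });
	}
	return leaves;
}
```

After it returns, the next call on oXI carries on with whatever follows the section.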

Attributes

Some people insist on putting attributes in their XML files. If they do, you will get (no choice in the matter) an attr property on the start tag element which has those attributes. Let's consider an example where your XML has the following:

<foo bar="high" length="long"> ...

When your use of oXI returns that start tag element it will have an o.attr property. In this example you could reference:

o.attr.bar and o.attr.length.

Or in the unlikely case of your not knowing what attributes to expect, you could use the usual:

for (var prop in o.attr) {
	// do something with prop and o.attr[prop]
}
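
Since attr is only present on start tags that actually have attributes, a small guard saves you from tripping over a missing object. The helper below, getAttr, is a hypothetical convenience of mine and not something XmlInput supplies.

```javascript
// Hypothetical helper: read one attribute, with a fallback when it is absent.
function getAttr(o, name, fallback) {
	if (o == null || typeof o.attr == "undefined") return fallback;   // no attributes at all
	if (typeof o.attr[name] == "undefined") return fallback;          // this one is missing
	return o.attr[name];
}
```

With the foo example above, getAttr(o, "bar", "low") gives "high", while asking for an attribute that is not there gives you back the fallback.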

Where Am I?

It's possible that you discover a data value that seems in error. You may wish to log a message about that. It's helpful to report the file name and the location of where you detected the issue. To that end you always have available:

o.line and o.column.

You can also use oXI.getFileName() just in case you forgot which file you opened.
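
For log messages, those pieces can be glued together once and reused. The function name describeLocation and the message layout are assumptions of mine; the sketch uses only oXI.getFileName(), o.line, o.column, and o.som.

```javascript
// Hypothetical helper: build a "where am I" prefix for log messages.
function describeLocation(oXI, o) {
	return oXI.getFileName() + " line " + o.line + ", column " + o.column +
		" (" + o.som + ")";
}
```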

Repositioning

While blocking from your mind any thought of performance, you can always call:

o = oXI.reposition(line, column)

If you leave column out it is deemed to be 0. If you leave both line and column out they are both deemed 0, and you effectively are asking to "rewind". If necessary (and it likely will be) the XML parsing operation will start back at the beginning of the file and race along, behind the scenes, until it gets a line and column that is equal to, or the first one greater than, the values that you supplied. It returns the element at that position. Subsequent oXI calls will carry on from there. It may be that you remembered some previously returned line and column values and want to return to the scene of the crime.
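
One use for this is a look-ahead: remember where you are, go find something further along, then come back. The sketch below assumes that repositioning to a previously returned element's line and column brings you back to that same element, as the description above suggests; the name lookAhead is hypothetical.

```javascript
// Hypothetical helper: look ahead for a tag, then return to where we were.
// current is the element most recently returned by next() or get().
function lookAhead(oXI, current, tag) {
	var found = oXI.next(tag);                      // null if the tag never appears
	oXI.reposition(current.line, current.column);   // back to the remembered element
	return found;
}
```

Keep the performance warning in mind: each call may re-parse the file from the top, so this is not something to do inside a tight loop.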

Document at a Time Access

For programmers and academics, there is a natural desire to skip all this sequential stuff and load a DOM, which then lets you (saddles you with having to) traverse a tree recursively to dig out information of interest. Well, hey, if you wish to create such constructs, go for it. Now that all the gnarly syntactical parsing (which is not very rewarding to the soul) is taken care of, bless you Xerces, you can easily create the ivory tower constructs of your dreams. I have a hunch that you will be more efficient in your forms development using the essentially sequential access methods supplied above. And I feel certain that you will have a far greater handle on what you code. (BTW: Do employ functions.)

We could still add document-at-a-time access someday, should practical feedback point that way.