From Patrick Gundlach

New XPath variables semantics

Categories: Development

Something that was a bit inconsistent for a while is variable assignments of element structures.

Take this snippet for example:

<SetVariable variable="myvar">
	<Element name="Foo">
		<Attribute name="attr" select="'foo1'"/>
	</Element>
	<Element name="Bar">
		<Attribute name="attr" select="'bar1'"/>
	</Element>
</SetVariable>

Now, what exactly are the contents of $myvar? With the old XPath parser, this used to be a table with two entries (the two elements). That was inconsistent, though: both count($myvar) and count($myvar/Foo) returned 2, which is wrong in both cases.

From now on, the new XPath parser tries to stay as close as possible to the XPath/XSLT standards: count($myvar) now returns 1 (the XML fragment), count($myvar/Foo) returns 1 (the element node Foo), and count($myvar/*) returns 2 (all element nodes in the fragment $myvar).

This would be the equivalent in XSLT:

<xsl:variable name="myvar">
  <Foo attr="foo1" />
  <Bar attr="bar1" />
</xsl:variable>

<xsl:message select="count($myvar)" />
<xsl:message select="count($myvar/Foo)" />
<xsl:message select="count($myvar/*)" />

which prints out the messages 1, 1 and 2, as expected.
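To make the counting rules concrete outside the Publisher, here is a rough Python model using the standard xml.etree.ElementTree module. It is a sketch under assumptions, not how the Publisher implements XPath: ElementTree has no separate fragment node, so a wrapper element named fragment (a hypothetical name) stands in for the unnamed XML fragment, and child_step is an illustrative helper that applies a path step to every item of a sequence.

```python
import xml.etree.ElementTree as ET


def child_step(sequence, name_test):
    """Apply an XPath child step such as /Foo or /* to every item of a sequence."""
    result = []
    for item in sequence:
        # findall("Foo") returns the direct children named Foo,
        # findall("*") returns all direct child elements.
        result.extend(item.findall(name_test))
    return result


# Hypothetical stand-in: ElementTree has no separate fragment node, so an
# element named "fragment" plays the role of the unnamed XML fragment.
fragment = ET.Element("fragment")
ET.SubElement(fragment, "Foo", attr="foo1")
ET.SubElement(fragment, "Bar", attr="bar1")

# The value of $myvar is a sequence holding exactly one item: the fragment.
myvar = [fragment]

count_myvar = len(myvar)                   # count($myvar)
count_foo = len(child_step(myvar, "Foo"))  # count($myvar/Foo)
count_all = len(child_step(myvar, "*"))    # count($myvar/*)

print(count_myvar, count_foo, count_all)   # prints: 1 1 2
```

The point of the model is that a path like $myvar/Foo always starts from the one fragment item and then steps down to its children, which is why the variable itself counts as 1.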

This may look like hair-splitting, but it has quite a few consequences, some of which break backwards compatibility.

Breaking backwards compatibility

Breaking backwards compatibility is something I really try to avoid, but in this case I cannot see a way around it. The good news is that you can always switch back to the old behavior by setting xpath=luxor in the configuration file. And new projects get a much more sensible setup.

Show me the pitfalls

Every time you access a variable that holds a data structure (constructed with the <Element> and <Attribute> commands), you need an explicit node selector.

All commands that use the attributes select and test are affected. See the index example in the GitHub repository. It now has

<Makeindex select="$indexentries/indexentry"  ...

instead of

<Makeindex select="$indexentries"  ...

Explanation: the variable $indexentries is constructed as follows (in a loop):

<SetVariable variable="indexentries">
	<Copy-of select="$indexentries" />
	<Element name="indexentry">
		<Attribute name="name" select="@word" />
		<Attribute name="page" select="@page" />
	</Element>
</SetVariable>

so it is an XML fragment containing several element nodes. To get all of these elements, you need the explicit selection $indexentries/indexentry.
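The loop that builds $indexentries can be modeled in Python with xml.etree.ElementTree to show why the explicit selection is needed. This is a sketch under assumptions: the word/page pairs are hypothetical sample data standing in for @word and @page, the variable is assumed to start as an empty fragment, and an element named fragment is a hypothetical stand-in for the unnamed fragment that <SetVariable> creates.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the @word and @page values
# that the real layout reads on each pass through its loop.
words = [("Apple", "3"), ("Banana", "5"), ("Cherry", "8")]

# Assumption: $indexentries starts out as an empty fragment.
indexentries = ET.Element("fragment")

for word, page in words:
    new_fragment = ET.Element("fragment")
    # <Copy-of select="$indexentries"/>: copying the fragment copies its children.
    for child in indexentries:
        new_fragment.append(copy.deepcopy(child))
    # <Element name="indexentry"> with the attributes name and page.
    ET.SubElement(new_fragment, "indexentry", name=word, page=page)
    indexentries = new_fragment

whole = [indexentries]                        # select="$indexentries"
entries = indexentries.findall("indexentry")  # select="$indexentries/indexentry"

print(len(whole), len(entries))               # prints: 1 3
print([e.get("name") for e in entries])       # prints: ['Apple', 'Banana', 'Cherry']
```

No matter how many passes the loop makes, $indexentries stays a single fragment; only the path step down to indexentry yields one item per entry.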

Likewise, <ProcessNode select="$index" /> has been changed to <ProcessNode select="$index/index" />, because the former selects only the (unnamed) XML fragment, while the latter selects a sequence of elements, each of which is processed individually.
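The difference between the two ProcessNode selections can be sketched as "how many items get handled, and under which name". The Python below is a rough model, not the Publisher's dispatch logic: process_node is a hypothetical helper that just reports the tag of each selected item, the $index contents (two <index> elements) are assumed, and the tag fragment is a hypothetical stand-in for the fragment, which in the real Publisher has no name at all.

```python
import xml.etree.ElementTree as ET


def process_node(sequence):
    """Rough model of <ProcessNode>: handle each selected item on its own
    and report the element name of every item that gets processed."""
    return [item.tag for item in sequence]


# Hypothetical $index fragment holding two <index> elements.
index = ET.Element("fragment")
ET.SubElement(index, "index", name="A")
ET.SubElement(index, "index", name="B")

old = process_node([index])                 # select="$index"
new = process_node(index.findall("index"))  # select="$index/index"

print(old)  # prints: ['fragment']
print(new)  # prints: ['index', 'index']
```

With select="$index" there is a single item and it is not an index element, so nothing tied to index elements applies; with select="$index/index" each index element is handled in turn.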

<Copy-of> may come as a bit of a surprise, although it is not a special case.

<SetVariable variable="myvar">
	<Element name="Foo">
		<Attribute name="attr" select="'foo1'"/>
	</Element>
	<Element name="Bar">
		<Attribute name="attr" select="'bar1'"/>
	</Element>
</SetVariable>

<SetVariable variable="myvar2">
	<Copy-of select="$myvar" />
	<Element name="Foo">
		<Attribute name="attr" select="'foo2'"/>
	</Element>
</SetVariable>

The variable $myvar2 now contains copies of the elements Foo and Bar from $myvar, followed by an additional element Foo. You could also select the desired nodes explicitly:

<SetVariable variable="myvar2">
	<Copy-of select="$myvar/*" />
	<Element name="Foo">
		<Attribute name="attr" select="'foo2'"/>
	</Element>
</SetVariable>

Both ways ($myvar and $myvar/*) work. The first expression selects the whole fragment and copies its contents (the two elements) to the result sequence; the second expression explicitly selects the two elements, which are then copied to the result sequence. Note that $myvar/Foo would copy only the element Foo and leave out Bar.
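The <Copy-of> behavior can be modeled in Python with xml.etree.ElementTree as well. This sketch rests on assumptions, not on the Publisher's source: copy_of and set_variable are hypothetical helpers, and the tag fragment marks the unnamed fragment. Copying a fragment contributes copies of its children, while copying an element contributes the element itself. The model compares select="$myvar", select="$myvar/*", and select="$myvar/Foo".

```python
import copy
import xml.etree.ElementTree as ET

FRAGMENT = "fragment"  # hypothetical tag standing in for the unnamed fragment


def copy_of(sequence):
    """Model <Copy-of>: a fragment contributes copies of its children,
    an element contributes a copy of itself."""
    result = []
    for item in sequence:
        if item.tag == FRAGMENT:
            result.extend(copy.deepcopy(child) for child in item)
        else:
            result.append(copy.deepcopy(item))
    return result


def set_variable(*parts):
    """Model <SetVariable>: wrap everything produced in a new fragment."""
    fragment = ET.Element(FRAGMENT)
    for part in parts:
        fragment.extend(part)
    return fragment


def new_foo():
    """The extra <Element name="Foo"> with attr="foo2"."""
    return [ET.Element("Foo", attr="foo2")]


myvar = set_variable([ET.Element("Foo", attr="foo1"), ET.Element("Bar", attr="bar1")])

via_fragment = set_variable(copy_of([myvar]), new_foo())               # select="$myvar"
via_children = set_variable(copy_of(myvar.findall("*")), new_foo())    # select="$myvar/*"
via_foo_only = set_variable(copy_of(myvar.findall("Foo")), new_foo())  # select="$myvar/Foo"

for var in (via_fragment, via_children, via_foo_only):
    print([(e.tag, e.get("attr")) for e in var])
```

The first two variants produce the same three elements; the last one shows that a name test such as /Foo filters the children, so Bar is not copied.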

Summary

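If you have an old layout, you can keep using it by selecting the old XPath parser (xpath=luxor). If you use the new XPath parser and your layout uses the <Element> and <Attribute> commands, update every access to such variables with an explicit node selector, for example $myvar/* or $indexentries/indexentry.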