Last updated May 05, 2012 17:57, by berhauz

How does the ReverseXSL parser work?

An XML document is, by construction, neatly divided into pieces organized as a tree of nodes: element nodes, text nodes, attribute nodes, namespace nodes. When you get a new XML document, the unknowns are the names and contents of the nodes, but you can trust the cuts, i.e. the tree. In other words, you first see the structure, and then the contents. These features have shaped XSLT, which is based on XPath expressions (to select nodes) and a hierarchical application of node-processing templates.

An arbitrary text document is just the reverse: you first see an undifferentiated mass of content, and then you have to discover the cuts to rebuild the structure. With XSLT, the tree is used to build the processing. With a text file, the processing is used to build the tree. That has shaped the ReverseXSL Transformer.

There is an amazing parallelism between XSLT and ReverseXSL

More precisely, we have an inverted parallelism that we can summarize as a Tree for (driving the) Processing (XSLT) and Processing for (making a) Tree (ReverseXSL).

XSLT and ReverseXSL side by side:

  • XSLT: XSLT is built over XPath expressions.
    ReverseXSL: ReverseXSL (RXSL) is built over regular expressions.

  • XSLT: XSLT adds the data processing and a control flow over XPath expressions, which mostly tell what to take from the input XML document.
    ReverseXSL: ReverseXSL adds the tree structure and organizes the regular expressions that do the processing, namely identify, cut, extract, and validate. In fact, the parsing of an input text file requires four different processing activities, noted (i), (c), (e), (v) below.

  • XSLT: An XML document starts as a tree of nodes → XPath selects the nodes to process → XSLT organizes the sequence of selections and the processing that is applied → the output is simply the sequential concatenation of the outcomes of the processing activities.
    ReverseXSL: A text file starts as one big block of content → regular expressions identify the structures (i), or cut content into smaller pieces (c), or extract values (e), or validate data (v) → ReverseXSL provides the tree structure under which the four activities (i)(c)(e)(v) are organized → the output document matches the ReverseXSL tree. Being a tree, it is most naturally rendered in XML.

  • XSLT: XSLT is part of the standard Java API for XML Processing (JAXP). Core classes are the TransformerFactory and the Transformer.
    ReverseXSL: The ReverseXSL software is supplied as an additional API library in Java archive (jar) form. Core classes are the TransformerFactory and the Transformer.

  • XSLT: XSLT maintains a context. It applies processing templates in a hierarchy, maintaining a context by reference to the source document tree, so that the same templates with relative XPath expressions are applied to sub-trees at different levels in the original tree.
    ReverseXSL: ReverseXSL also maintains a context, namely a segmentation context, where the sub-pieces produced by a cut or extraction at some level become in turn the new (sub)message to cut further down, until we reach the atoms of information.
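To make the four activities (i), (c), (e), (v) more concrete, here is a minimal sketch that uses only java.util.regex. It is not the ReverseXSL API and does not use DEF syntax: the pipe-delimited "ORD" message format, the class name FourActivitiesSketch and the element names are all assumptions made up for illustration. Each record produced by the cut becomes the new (sub)message on which extraction and validation operate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: a hypothetical message format, not ReverseXSL or DEF syntax. */
final class FourActivitiesSketch {

    // (i) identify: the message must start with the hypothetical "ORD|" header
    private static final Pattern IDENTIFY = Pattern.compile("^ORD\\|");
    // (c) cut: records are separated by line breaks
    private static final Pattern CUT = Pattern.compile("\\r?\\n");
    // (e) extract: an item record carries a code and a quantity
    private static final Pattern EXTRACT = Pattern.compile("^ITM\\|([^|]*)\\|([^|]*)$");
    // (v) validate: a quantity is made of one to four digits
    private static final Pattern VALIDATE = Pattern.compile("\\d{1,4}");

    static String parse(String message) {
        if (!IDENTIFY.matcher(message).find()) {
            throw new IllegalArgumentException("not an ORD message");
        }
        StringBuilder xml = new StringBuilder("<Order>");
        // Each record is the new sub-message handed to the next level.
        for (String record : CUT.split(message)) {
            Matcher fields = EXTRACT.matcher(record);
            if (!fields.matches()) {
                continue; // header and unknown records are skipped in this sketch
            }
            String quantity = fields.group(2);
            if (!VALIDATE.matcher(quantity).matches()) {
                throw new IllegalArgumentException("invalid quantity: " + quantity);
            }
            // No XML escaping here: values are assumed to be plain alphanumerics.
            xml.append("<Item><Code>").append(fields.group(1)).append("</Code>")
               .append("<Qty>").append(quantity).append("</Qty></Item>");
        }
        return xml.append("</Order>").toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("ORD|2012-05-05\nITM|A12|3\nITM|B07|10"));
        // prints <Order><Item><Code>A12</Code><Qty>3</Qty></Item><Item><Code>B07</Code><Qty>10</Qty></Item></Order>
    }
}
```

Note how the output tree follows the nesting of the processing (message, then records, then fields), which is the "Processing for Tree" idea in miniature.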

How does the ReverseXSL Transformer proceed?

The ReverseXSL Transformer performs three activities in turn:

  1. Identify the brand of message to process, and dynamically associate none, one or both of the next two steps. To do so, the ReverseXSL transformer loads message identification patterns and associated parameters from a mapping selection table in a simple text file.
  2. Parsing Step (in case the input message is non-XML): Parse the input message and transform it to an XML document. To do so, the ReverseXSL parser loads instructions from a so-called DEF file.
  3. XSLT Step (typically if we need to re-order XML elements): Invoke an XSL transformation. To do so, XSLT loads instructions from an XSL file.

Note that, unlike with plain XSLT, we do not need to indicate explicitly which transformation (DEF file and/or XSL stylesheet) to apply to the input. Of course, one can also impose a priori a precise DEF and/or XSL on every input message entering a given transformation thread, but the ability to delegate the selection of transformations brings significant operational advantages: we can make a system capable of handling variant or new messages by just loading metadata. No new channel, pipeline or dedicated process flow needs to be created.
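The delegated selection can be pictured as a lookup in a table of identification patterns. The sketch below is an assumption-laden illustration, not the actual mapping selection table format or the ReverseXSL API: the class MappingSelectionSketch, the patterns, the file names (order.def, order.xsl, and so on) and the "unidentified" fallback are all hypothetical. A null DEF or XSL entry stands for a step that is not applied.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustration only: a hypothetical in-memory stand-in for a mapping selection table. */
final class MappingSelectionSketch {

    /** One table row: an identification pattern plus the optional steps to apply. */
    static final class Mapping {
        final Pattern identification;
        final String defFile; // null means no parsing step
        final String xslFile; // null means no XSLT step

        Mapping(String regex, String defFile, String xslFile) {
            this.identification = Pattern.compile(regex);
            this.defFile = defFile;
            this.xslFile = xslFile;
        }
    }

    // Hypothetical rows; adding a row is all it takes to support a new message brand.
    static final List<Mapping> TABLE = Arrays.asList(
        new Mapping("^ORD\\|", "order.def", "order.xsl"),  // both steps
        new Mapping("^INV\\|", "invoice.def", null),       // parsing only
        new Mapping("^<\\?xml", null, "normalize.xsl"),    // XSLT only
        new Mapping("^<Ready", null, null));               // pass-through

    /** Returns a description of the steps chosen for the message; first match wins. */
    static String plan(String message) {
        for (Mapping m : TABLE) {
            if (m.identification.matcher(message).find()) {
                if (m.defFile == null && m.xslFile == null) return "pass-through";
                if (m.xslFile == null) return "parse(" + m.defFile + ")";
                if (m.defFile == null) return "xslt(" + m.xslFile + ")";
                return "parse(" + m.defFile + ") then xslt(" + m.xslFile + ")";
            }
        }
        return "unidentified"; // assumption: the source does not say what happens here
    }

    public static void main(String[] args) {
        System.out.println(plan("ORD|2012-05-05\nITM|A12|3")); // prints parse(order.def) then xslt(order.xsl)
        System.out.println(plan("<Ready/>"));                  // prints pass-through
    }
}
```

Because the table is data rather than code, supporting a variant message means adding a row, which mirrors the operational advantage described above.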

We must also note the option to associate none, one or both transformation steps (a ReverseXSL Parsing step, and an XSLT step) to each input message brand.

  • None clearly means pass-through. In other words, the input at stake (be it XML or not!) is good enough for direct processing by the target application.
  • One can consist of the parsing step alone. A non-XML data message becomes an XML document.
  • One can also mean XSLT alone. An XML document must be adjusted to comply with a schema required by the target application.
  • Two combines parsing with an XSL post-transformation, therefore extending the XML transformation capabilities of the ReverseXSL parser with the rich data-types and element processing functions in XSLT. As explained elsewhere, the ReverseXSL parser does not re-order elements from the input data message. This job is quite easily achieved with XSLT.
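As an illustration of the XSLT step taken on its own, the sketch below uses the standard JAXP classes (javax.xml.transform.TransformerFactory and Transformer) to re-order child elements. The class name XsltReorderSketch, the inline stylesheet and the Order/Item/Code/Qty element names are assumptions chosen for the example; in practice the stylesheet would be loaded from an XSL file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustration only: re-ordering elements with a plain JAXP XSL transformation. */
final class XsltReorderSketch {

    // Hypothetical stylesheet: emits Qty before Code inside every Item.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='Order'>"
      + "<Order><xsl:apply-templates select='Item'/></Order>"
      + "</xsl:template>"
      + "<xsl:template match='Item'>"
      + "<Item><xsl:copy-of select='Qty'/><xsl:copy-of select='Code'/></Item>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String reorder(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            // Leave out the <?xml ...?> declaration to keep the output short.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // In the result, Qty comes before Code inside Item.
        System.out.println(reorder("<Order><Item><Code>A12</Code><Qty>3</Qty></Item></Order>"));
    }
}
```

The input here has the same shape a parsing step could produce, so the two steps chain naturally: the parser builds the tree in message order, and XSLT then rearranges it.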

The ReverseXSL parser recursively identifies, cuts, extracts, and validates smaller and smaller segments of the original data

The ReverseXSL parser requires two things to perform its job:

  • the input data message, of course
  • a DEF file, standing for the message DEFinition

The DEF file contains metadata that describes the input message syntax. The structure of the file mimics that of the input message itself. At present, the format is proprietary, yet it is easily edited with any text editor, so you can use your favorite development workstation to automate test runs and develop your DEFinitions incrementally. Typically, you work on one chunk of the message at a time and focus on the relevant parsing details. You can immediately check the outcome, leaving the sections of the message not yet parsed as raw data.

The introduction to the documentation of Message DEFinition files explains further how the parsing proceeds.
