An XML document is, by construction, neatly divided into pieces organized as a tree of nodes: element nodes, text nodes, attribute nodes, namespace nodes. When you receive a new XML document, the unknowns are the names and contents of the nodes, but you can trust the cuts, i.e. the tree. In other words, you first see the structure, and then the contents. These features have shaped XSLT, which is based on XPath expressions (to select nodes) and a hierarchical application of node-processing templates.
An arbitrary text document is just the reverse: you first see an undivided mass of content, and then you have to discover the cuts to rebuild the structure. With XSLT, the tree is used to build the processing. With a text file, the processing is used to build the tree. That has shaped the ReverseXSL Transformer.
More precisely, we have an inverted parallelism that we can summarize as a Tree for (driving the) Processing (XSLT) and Processing for (making a) Tree (ReverseXSL).
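The inversion can be illustrated with a minimal sketch. It uses Python's standard library rather than XSLT or ReverseXSL, and the sample data, the `ORDER:` prefix and the element names are assumptions made up for the illustration. In the first half the tree is given and a path expression selects nodes; in the second half a regular expression has to find the cuts before any tree exists.

```python
import re
import xml.etree.ElementTree as ET

# Tree for processing: the structure is given, a path expression selects nodes.
xml_doc = "<order><item>bolt</item><item>nut</item></order>"
items_from_xml = [e.text for e in ET.fromstring(xml_doc).findall("./item")]

# Processing for a tree: the text has no structure until a regular
# expression finds the cuts; the tree is built from the pieces.
text_doc = "ORDER:bolt;nut"  # hypothetical flat text format
match = re.fullmatch(r"ORDER:(.*)", text_doc)
order = ET.Element("order")
for piece in match.group(1).split(";"):
    ET.SubElement(order, "item").text = piece
rebuilt = ET.tostring(order, encoding="unicode")
```

Both halves end up with the same two items, but only the second one had to discover where they begin and end.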
|XSLT is built over XPath expressions.||ReverseXSL (RXSL) is built over regular expressions.|
|XSLT adds the data processing and a control flow over XPath expressions, which mostly tell what to take from the input XML document.||ReverseXSL adds the tree structure and organizes the regular expressions that do the processing, namely identify, cut, extract, and validate. In fact, parsing an input text file requires four different processing activities, denoted (i), (c), (e), (v) below.|
|An XML document starts as a tree of nodes → XPath selects the nodes to process → XSLT organizes the sequence of selections and the processing that is applied → the output is simply the sequential concatenation of the outcomes of processing activities.||A text file starts as one big block of content → regular expressions identify the structures (i), or cut content into smaller pieces (c), or extract values (e), or validate data (v) → ReverseXSL provides the tree structure under which the four activities (i)(c)(e)(v) are organized → the output document matches the ReverseXSL tree. Being a tree, it is most naturally rendered in XML.|
|XSLT is part of the standard Java API for XML Processing (JAXP). The core classes are TransformerFactory and Transformer.||ReverseXSL software is supplied as an additional API library in Java archive (jar) form. The core classes are TransformerFactory and Transformer.|
|XSLT maintains a context: XSLT applies processing templates in a hierarchy, maintaining a context by reference to the source document tree, so that the same templates with relative XPath expressions are applied to sub-trees at different levels in the original tree.||ReverseXSL also maintains a context: ReverseXSL maintains a segmentation context, where the sub-pieces produced by a cut or extraction at some level become in turn the new (sub)message to cut further down, until we reach the atoms of information.|
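The four activities and the segmentation context from the right-hand column can be sketched in a few lines. This is a hand-written illustration in Python under stated assumptions, not the ReverseXSL API or its DEF syntax: the message layout (`INV/` and `LIN/` segments with `/`-separated fields), the element names and the function `parse_invoice` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input message: one segment per line, fields separated by "/".
MESSAGE = "INV/2024-03-01\nLIN/bolt/12\nLIN/nut/7"


def parse_invoice(message):
    root = ET.Element("invoice")
    # (c) cut: the whole message is cut into segments.
    for segment in re.split(r"\n", message):
        # (i) identify: decide which kind of segment this is.
        if re.match(r"INV/", segment):
            # (e) extract: pull the value out of the segment.
            date = re.match(r"INV/(.*)", segment).group(1)
            # (v) validate: check the extracted value.
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
                raise ValueError(f"bad date: {date!r}")
            ET.SubElement(root, "date").text = date
        elif re.match(r"LIN/", segment):
            line = ET.SubElement(root, "line")
            # Segmentation context: the segment is now the (sub)message,
            # and it is cut further down into atomic fields.
            fields = re.split(r"/", segment)[1:]
            if len(fields) != 2 or not re.fullmatch(r"\d+", fields[1]):
                raise ValueError(f"bad line segment: {segment!r}")
            ET.SubElement(line, "name").text = fields[0]
            ET.SubElement(line, "qty").text = fields[1]
    return ET.tostring(root, encoding="unicode")


result = parse_invoice(MESSAGE)
```

The output tree mirrors the nesting of the cuts: each `line` element exists because a segment was identified and then treated as a message of its own.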
The ReverseXSL Transformer performs three activities in turn:
We must understand that, unlike with XSLT, we do not need to explicitly indicate which XSL template (and, in our case, which DEF file) to apply to the input. Of course, one can also impose a precise DEF and/or XSL a priori on every input message entering a given transformation thread, but the ability to delegate the selection of transformations brings significant operational advantages: we can make a system capable of handling variant or new messages just by loading metadata. No new channel, pipeline, or dedicated process flow needs to be created.
We must also note the option to associate none, one, or both transformation steps (a ReverseXSL parsing step and an XSLT step) with each input message brand.
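The idea of delegating the selection to metadata can be sketched as a lookup table. The Python below is only an illustration of the principle; the table layout, the patterns, the file names and the function `select_steps` are assumptions, and the actual ReverseXSL mapping mechanism may look quite different.

```python
import re

# Hypothetical metadata: a pattern identifying the message brand, then the
# DEF file (parsing step) and XSL file (XSLT step) to apply; None = no step.
MAPPINGS = [
    (re.compile(r"^INV/"), "invoice.def", "invoice.xsl"),  # both steps
    (re.compile(r"^ACK/"), "ack.def", None),               # parsing step only
    (re.compile(r"^<\?xml"), None, "normalize.xsl"),       # XSLT step only
]


def select_steps(message):
    """Return (def_file, xsl_file) for the first mapping that matches."""
    for pattern, def_file, xsl_file in MAPPINGS:
        if pattern.search(message):
            return def_file, xsl_file
    return None, None  # unknown brand: no transformation step selected


both = select_steps("INV/2024-03-01")
parse_only = select_steps("ACK/0001")
xslt_only = select_steps("<?xml version='1.0'?><doc/>")
unknown = select_steps("something else")
```

Under this design, supporting a new or variant message amounts to adding one entry to `MAPPINGS`; the code that routes messages does not change.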
The ReverseXSL parser requires two things to perform its job:
The DEF file contains metadata that describes the input message syntax. The structure of the file mimics that of the input message itself. At present, the file format is proprietary but easily edited with any text editor, so you can use your favorite development workstation to automate test runs and develop your DEFinitions incrementally. Typically, you work on one chunk of the message at a time and focus on the relevant parsing details. You can immediately check the outcome, leaving the as yet unparsed sections of the message as raw data.
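The incremental workflow described above can be imitated with a small data-driven parser. This Python sketch does not reproduce the DEF file syntax, which is not shown here; the `DEFINITION` list, the segment prefixes (`HDR/`, `BDY/`, `TRL/`) and the element names are hypothetical. It shows a definition in which the header is already detailed, the body is recognized but not yet broken down, and the trailer is not described at all.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical definition: (element name, identifying pattern, field names).
# Field names set to None means "recognized, but not detailed yet".
DEFINITION = [
    ("header", r"HDR/(.*)", ["sender", "date"]),
    ("body", r"BDY/(.*)", None),
]


def parse(message, definition):
    root = ET.Element("message")
    for segment in message.split("\n"):
        for name, pattern, fields in definition:
            m = re.fullmatch(pattern, segment)
            if not m:
                continue
            node = ET.SubElement(root, name)
            if fields is None:
                node.text = m.group(1)  # content kept as raw data for now
            else:
                for field, value in zip(fields, m.group(1).split("/")):
                    ET.SubElement(node, field).text = value
            break
        else:
            # No definition entry matched: keep the whole segment as raw data.
            ET.SubElement(root, "raw").text = segment
    return ET.tostring(root, encoding="unicode")


output = parse("HDR/ACME/2024-03-01\nBDY/bolt/12/nut/7\nTRL/3", DEFINITION)
```

Refining the definition later (for example giving `body` its own field names, or adding a `TRL/` entry) changes only the data, so each step can be checked against the same test message.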
The introduction to the documentation of Message DEFinition files explains further how the parsing proceeds.