eXProc.org

Other Proposed Steps

This page collects proposed extension steps. Implementations are welcome, but the contents are subject to change at any time.

These steps are in the “proposed extension namespace”, http://exproc.org/proposed/steps, identified by the prefix “pxp”.

pxp:nvdl

A step for performing NVDL (Namespace-based Validation Dispatching Language) validation over mixed-namespace documents.

<p:declare-step type="pxp:nvdl">
     <p:input port="source" primary="true"/>
     <p:input port="nvdl"/>
     <p:input port="schemas" sequence="true"/>
     <p:output port="result"/>
     <p:option name="assert-valid" select="'true'"/>               <!-- boolean -->
</p:declare-step>

The source document is validated using the namespace dispatching rules contained in the document that appears on the nvdl port.

The dispatching rules may contain URI references that point to the actual schemas to be used. As long as these schemas are accessible, it is not necessary to pass anything on the schemas port. However, if one or more schemas are provided on the schemas port, then these schemas should be used in validation.

This requirement is expressed only as a “should” and not a “must” because XProc 1.0 does not require implementations to cache documents. Such caching would let a step that requests a URI automatically receive the result of another step whose base URI is identical to the requested URI.

However, it's not clear that the schemas port has any value if the implementation does not support this behavior.

The value of the assert-valid option must be a boolean. It is a dynamic error if the assert-valid option is true and the input document is not valid.

The output from this step is a copy of the input, possibly augmented by the application of schema processing. The output of this step may include PSVI annotations.
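The following is a minimal usage sketch. It assumes an XProc 1.0 processor that implements pxp:nvdl natively; how extension steps are made available is implementation-dependent.

The NVDL script does three things:

  • It dispatches XHTML content to a RELAX NG schema.
  • It dispatches SVG content to another RELAX NG schema.
  • It rejects elements in any other namespace.

The schema URIs are hypothetical. Because they are fetched by URI, nothing is passed on the schemas port. The port is still bound to p:empty because it has no default connection.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Fail the pipeline (dynamic error) if the source is not valid -->
  <pxp:nvdl assert-valid="true">
    <p:input port="nvdl">
      <p:inline>
        <rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
          <!-- Hypothetical schema locations -->
          <namespace ns="http://www.w3.org/1999/xhtml">
            <validate schema="http://example.org/schemas/xhtml.rng"/>
          </namespace>
          <namespace ns="http://www.w3.org/2000/svg">
            <validate schema="http://example.org/schemas/svg.rng"/>
          </namespace>
          <!-- Anything in another namespace is invalid -->
          <anyNamespace>
            <reject/>
          </anyNamespace>
        </rules>
      </p:inline>
    </p:input>
    <!-- Schemas are resolved by URI, so the sequence port is left empty -->
    <p:input port="schemas">
      <p:empty/>
    </p:input>
  </pxp:nvdl>
</p:declare-step>
```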

pxp:unzip

A step for extracting information out of ZIP archives.

<p:declare-step type="pxp:unzip">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="file"/>                                       <!-- string -->
     <p:option name="content-type"/>                               <!-- string -->
</p:declare-step>

The value of the href option must be an IRI. It is a dynamic error if the document so identified does not exist or cannot be read.

The value of the file option, if specified, must be the fully qualified path name of a document in the archive. It is a dynamic error if the value specified does not identify a file in the archive.

When the pxp:unzip step returns a table of contents (see below), that output must conform to the ziptoc.rnc schema.

If the file option is specified, the selected file in the archive is extracted and returned:

  • If the content-type option is not specified, or if it specifies an XML content type, the file is parsed as XML and returned. It is a dynamic error if the file is not well-formed XML.

  • If the content-type option specifies a content type that is not an XML content type, the file is base64 encoded and returned in a single c:data element.

If the file option is not specified, a table of contents for the archive is returned.

For example, the contents of the XML Calabash 0.8.5 distribution archive might be reported like this:

<c:zipfile xmlns:c="http://www.w3.org/ns/xproc-step"
           href="http://xmlcalabash.com/download/calabash-0.8.5.zip">
   <c:directory name="calabash-0.8.5/" date="2008-11-04T19:29:20.000-05:00"/>
   <c:directory name="calabash-0.8.5/docs/" date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="11942" size="36677" name="calabash-0.8.5/docs/CDDL+GPL.txt"
           date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="928" size="2110" name="calabash-0.8.5/docs/ChangeLog"
           date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="6817" size="17987" name="calabash-0.8.5/docs/GPL.txt"
           date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="494" size="830" name="calabash-0.8.5/docs/NOTICES"
           date="2008-11-04T19:29:20.000-05:00"/>
   <c:directory name="calabash-0.8.5/lib/" date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="389650" size="407421" name="calabash-0.8.5/lib/calabash.jar"
           date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="1237" size="2493" name="calabash-0.8.5/README"
           date="2008-11-04T19:29:20.000-05:00"/>
   <c:directory name="calabash-0.8.5/xpl/" date="2008-11-04T19:29:20.000-05:00"/>
   <c:file compressed-size="175" size="255" name="calabash-0.8.5/xpl/pipe.xpl"
           date="2008-11-04T19:29:20.000-05:00"/>
</c:zipfile>
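The sketch below shows the three ways pxp:unzip can be called, assuming a processor that supports the step. It reuses the archive from the example above.

  • Without file, it returns a c:zipfile table of contents.
  • With file and a non-XML content-type, it returns a base64 encoded c:data element.
  • With file alone, it parses the entry as XML.

The step names are arbitrary.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:output port="result" sequence="true"/>

  <!-- No file option: a c:zipfile table of contents -->
  <pxp:unzip name="toc"
             href="http://xmlcalabash.com/download/calabash-0.8.5.zip"/>

  <!-- Non-XML content type: the entry comes back base64 encoded in c:data -->
  <pxp:unzip name="readme"
             href="http://xmlcalabash.com/download/calabash-0.8.5.zip"
             file="calabash-0.8.5/README"
             content-type="text/plain"/>

  <!-- No content type: the entry is parsed as XML -->
  <pxp:unzip name="pipe"
             href="http://xmlcalabash.com/download/calabash-0.8.5.zip"
             file="calabash-0.8.5/xpl/pipe.xpl"/>

  <!-- Return all three results as a sequence -->
  <p:identity>
    <p:input port="source">
      <p:pipe step="toc" port="result"/>
      <p:pipe step="readme" port="result"/>
      <p:pipe step="pipe" port="result"/>
    </p:input>
  </p:identity>
</p:declare-step>
```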

pxp:zip

A step for creating ZIP archives.

<p:declare-step type="pxp:zip">
     <p:input port="source" sequence="true" primary="true"/>
     <p:input port="manifest"/>
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="compression-method"/>                         <!-- "stored" | "deflated" -->
     <p:option name="compression-level"/>                          <!-- "smallest" | "fastest" | "default" | "huffman" | "none" -->
     <p:option name="command" select="'update'"/>                  <!-- "update" | "freshen" | "create" | "delete" -->
</p:declare-step>

The ZIP archive is identified by the href option. The manifest (described below) lists the files to be processed in the archive. The command option indicates the nature of the processing: “update”, “freshen”, “create”, or “delete”.

If files are added to the archive, the compression-method option indicates how they should be added: “stored” or “deflated”. For deflated files, the compression-level option identifies the kind of compression: “smallest”, “fastest”, “default”, “huffman”, or “none”.

The entries identified by the manifest are processed. The manifest must conform to the following schema:

default namespace c="http://www.w3.org/ns/xproc-step"

start = zip-manifest

zip-manifest =
   element c:zip-manifest {
      entry*
   }

entry =
   element c:entry {
      attribute name { text }
    & attribute href { text }
    & attribute comment { text }?
    & attribute method { "deflated" | "stored" }?
    & attribute level { "smallest" | "fastest" | "huffman" | "default" | "none" }?
    & empty
   }

For example:

<zip-manifest xmlns="http://www.w3.org/ns/xproc-step">
  <entry name="file1.xml" href="http://example.org/file1.xml" comment="An example file"/>
  <entry name="path/to/file2.xml" href="http://example.org/file2.xml" method="stored"/>
</zip-manifest>

If the command is “delete”, then file1.xml and path/to/file2.xml will be deleted from the archive. Otherwise, two documents from the source port are stored in the archive:

  • The document with the base URI http://example.org/file1.xml is stored as file1.xml, using the default method and level.

  • The document with the base URI http://example.org/file2.xml is stored as path/to/file2.xml, without being compressed.

A c:zipfile description of the archive content is produced on the result port.
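Here is a minimal pipeline sketch, assuming processor support for pxp:zip. The target archive URI is hypothetical. The command option is left at its default, “update”.

The documents loaded with p:document carry the base URIs named in the manifest. This is how each source document is matched to its entry.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:output port="result"/>

  <!-- Hypothetical archive location -->
  <pxp:zip href="file:///tmp/example.zip">
    <!-- Each document's base URI is the href it was loaded from -->
    <p:input port="source">
      <p:document href="http://example.org/file1.xml"/>
      <p:document href="http://example.org/file2.xml"/>
    </p:input>
    <p:input port="manifest">
      <p:inline>
        <c:zip-manifest>
          <c:entry name="file1.xml" href="http://example.org/file1.xml"
                   comment="An example file"/>
          <c:entry name="path/to/file2.xml" href="http://example.org/file2.xml"
                   method="stored"/>
        </c:zip-manifest>
      </p:inline>
    </p:input>
  </pxp:zip>
  <!-- The result port carries a c:zipfile description of the archive -->
</p:declare-step>
```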

pxp:gunzip

Important

Deprecated: See pxp:uncompress.

A step for expanding gzipped data.

<p:declare-step type="pxp:gunzip">
     <p:input port="source"/>
     <p:output port="result"/>
</p:declare-step>

If the document that appears on the source port is base64 encoded, this step will attempt to decode and gunzip the data. As a convenience, if the data is not encoded, it is simply passed through, like the p:identity step.

It is a dynamic error if the resulting decoded and expanded data is not a well-formed XML document.
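Because the step is deprecated, this sketch is relevant only to existing pipelines. p:data reads a gzipped resource from a hypothetical URI. Assuming the processor treats that resource as binary, p:data wraps it, base64 encoded, in a c:data element. pxp:gunzip then decodes and expands it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:output port="result"/>

  <pxp:gunzip>
    <p:input port="source">
      <!-- Hypothetical gzipped XML resource, wrapped in c:data by p:data -->
      <p:data href="http://example.org/doc.xml.gz"/>
    </p:input>
  </pxp:gunzip>
</p:declare-step>
```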

pxp:gzip

Important

Deprecated: See pxp:compress.

A step for storing gzip-compressed data.

<p:declare-step type="pxp:gzip">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="href"/>                                       <!-- anyURI -->
     <p:option name="byte-order-mark"/>                            <!-- boolean -->
     <p:option name="cdata-section-elements" select="''"/>         <!-- ListOfQNames -->
     <p:option name="doctype-public"/>                             <!-- string -->
     <p:option name="doctype-system"/>                             <!-- anyURI -->
     <p:option name="encoding"/>                                   <!-- string -->
     <p:option name="escape-uri-attributes" select="'false'"/>     <!-- boolean -->
     <p:option name="include-content-type" select="'true'"/>       <!-- boolean -->
     <p:option name="indent" select="'false'"/>                    <!-- boolean -->
     <p:option name="media-type"/>                                 <!-- string -->
     <p:option name="method" select="'xml'"/>                      <!-- QName -->
     <p:option name="normalization-form" select="'none'"/>         <!-- NormalizationForm -->
     <p:option name="omit-xml-declaration" select="'true'"/>       <!-- boolean -->
     <p:option name="standalone" select="'omit'"/>                 <!-- "true" | "false" | "omit" -->
     <p:option name="undeclare-prefixes"/>                         <!-- boolean -->
     <p:option name="version" select="'1.0'"/>                     <!-- string -->
</p:declare-step>

The pxp:gzip step serializes the document that appears on its source port and compresses it with gzip. If the input document is base64 encoded, it is decoded and the corresponding bytes are compressed.

If the href option is specified, the step attempts to store the compressed data at the IRI specified. In this case, it produces a c:result element on its result port that contains the IRI where the data was stored.

If the href option is not specified, the step returns the compressed data, base64 encoded, in a c:data element with the content type “application/x-gzip”.
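This sketch shows the href form of the deprecated step; new pipelines would presumably use pxp:compress. The serialization options mirror those of p:store.

Here the document is indented before compression and written to a hypothetical file URI. The result port therefore carries a c:result element rather than the data.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Serialize with indentation, gzip, and store at a hypothetical IRI;
       the output is a c:result element holding that IRI -->
  <pxp:gzip href="file:///tmp/out.xml.gz" indent="true"/>
</p:declare-step>
```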

pxp:compress

A step for storing compressed data.

<p:declare-step type="pxp:compress">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="href"/>                                       <!-- anyURI -->
     <p:option name="compression-method"/>                         <!-- string -->
     <p:option name="byte-order-mark"/>                            <!-- boolean -->
     <p:option name="cdata-section-elements" select="''"/>         <!-- ListOfQNames -->
     <p:option name="doctype-public"/>                             <!-- string -->
     <p:option name="doctype-system"/>                             <!-- anyURI -->
     <p:option name="encoding"/>                                   <!-- string -->
     <p:option name="escape-uri-attributes" select="'false'"/>     <!-- boolean -->
     <p:option name="include-content-type" select="'true'"/>       <!-- boolean -->
     <p:option name="indent" select="'false'"/>                    <!-- boolean -->
     <p:option name="media-type"/>                                 <!-- string -->
     <p:option name="method" select="'xml'"/>                      <!-- QName -->
     <p:option name="normalization-form" select="'none'"/>         <!-- NormalizationForm -->
     <p:option name="omit-xml-declaration" select="'true'"/>       <!-- boolean -->
     <p:option name="standalone" select="'omit'"/>                 <!-- "true" | "false" | "omit" -->
     <p:option name="undeclare-prefixes"/>                         <!-- boolean -->
     <p:option name="version" select="'1.0'"/>                     <!-- string -->
</p:declare-step>

The pxp:compress step serializes the document that appears on its source port and compresses it. If the input document is base64 encoded, it is decoded and the corresponding bytes are compressed.

The compression-method option identifies the compression method to use. Suggested values are “bzip2”, “compress”, “gzip”, etc. If it is not specified, the default method is implementation-defined.

Note

Would it be better to specify a default? Perhaps gzip?

It is a dynamic error if the compression method is unrecognized.

If the href option is specified, the step attempts to store the compressed data at the IRI specified. In this case, it produces a c:result element on its result port that contains the IRI where the data was stored.

If the href option is not specified, the step returns the compressed data, base64 encoded, in a c:data element with a content type appropriate to the compression method.
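This sketch shows the in-memory form. It assumes the processor recognizes “bzip2”; an unrecognized method is a dynamic error. With no href, the result port carries the compressed bytes, base64 encoded, in a c:data element.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- No href: the compressed data is returned inline as base64 c:data.
       The XML declaration is kept in the serialized (compressed) form. -->
  <pxp:compress compression-method="bzip2" omit-xml-declaration="false"/>
</p:declare-step>
```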

pxp:uncompress

A step for expanding compressed data.

<p:declare-step type="pxp:uncompress">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="compression-method"/>                         <!-- string -->
</p:declare-step>

If the document that appears on the source port is base64 encoded, this step will decode and attempt to uncompress the data. As a convenience, if the data is not encoded, the XML document is simply passed through, like the p:identity step.

The compression-method option identifies the compression method that was used to compress the data. Suggested values are “bzip2”, “compress”, “gzip”, etc. If it is not specified, implementations are free to attempt to deduce the method from the data.

It is a dynamic error if:

  • the compression method is unrecognized, or

  • the resulting decoded and expanded data is not a well-formed XML document.
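This round-trip sketch assumes both steps are supported:

  • pxp:compress, with no href, produces a base64 encoded c:data element.
  • pxp:uncompress decodes and expands it.

The result should therefore be an XML document equivalent to the source. Naming the method explicitly avoids relying on the implementation's ability to deduce it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Serialize and gzip the source into an inline c:data element -->
  <pxp:compress compression-method="gzip"/>

  <!-- Decode and expand it again, yielding the XML document -->
  <pxp:uncompress compression-method="gzip"/>
</p:declare-step>
```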

pxp:set-base-uri

A step for changing the base URI of a document.

<p:declare-step type="pxp:set-base-uri">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="uri" required="true"/>                        <!-- string -->
</p:declare-step>

The document that appears on the source port is copied to the result port. The base URI of the copy is the URI specified in the uri option. If that URI is relative, it is made absolute with respect to the base URI of the element that specifies the option (for example, the p:with-option element).
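This sketch gives a document a new, hypothetical absolute base URI. To make the change visible, it then copies p:base-uri(/*) into an attribute with p:add-attribute. The attribute name recorded-base-uri is purely illustrative.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical new base URI -->
  <pxp:set-base-uri uri="http://example.org/reports/summary.xml"/>

  <!-- The select is evaluated against the output of pxp:set-base-uri
       (the default readable port), so it reports the new base URI -->
  <p:add-attribute match="/*" attribute-name="recorded-base-uri">
    <p:with-option name="attribute-value" select="p:base-uri(/*)"/>
  </p:add-attribute>
</p:declare-step>
```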