Abstract

EnMasse is a server-based product for shared document formatting. It accepts documents over the network and formats them by distributing the formatting tasks among multiple computers. Regular users publish customized documents in high volumes and in varied formats. Access to the server is through straightforward controls, and users can customize server operation to fit their workflow.

Competing programs on the same computer fight for resources and take longer to return results. More memory and newer processors help applications run faster, but a computer can hold only so much memory. Processors keep getting faster, but their speed is still limited; multiple-processor configurations are expensive, and performance does not rise in line with the price and the number of processors.

One solution is to distribute processing among multiple computers. If you have as many computers as there are programs, each program runs as fast as its computer allows and uses as much memory as you can install. With multiple computers, programs perform as well as current technology permits and achieve as much throughput as you can afford, and stock single-CPU PCs are cheap enough to provide the resources your business needs. If the computers are connected so that they work together continuously, the load is evenly distributed. Even when nodes fail, or are removed for upgrades, the system still processes all your requests and preserves your data, and the only noticeable change is a slight decrease in performance. This is a well-known problem but not an easy one, and you must solve it to really use the combined power of what is known as grid computing.

XEP creates elegantly formatted, print-ready views of documents. In many deployments, it produces hundreds of thousands of documents on demand. For example:
These are just a few examples. However fast and well optimized a program's code is, a business needs a way to scale up performance to handle a growing load. EnMasse solves this problem for you.

For a user or an application programmer, EnMasse is a single access point. Through a shared folder, a web form, or a network connection (on your local network or across the Internet), EnMasse accepts document processing requests and sends them to XEP formatting engines running on several computers in the local network. It then delivers the formatted documents back to the user. It monitors the performance of the formatting engines, notices when they go down and when they are restored, and readjusts the distribution of requests according to the workload. When a server fails in the middle of processing a request, EnMasse resubmits the request to a different server, so the only impact is a slightly longer response time for that particular document. This provides the reliability of service that businesses need today.

EnMasse is both opaque and transparent. On the one hand, it provides a single-point abstraction, so there is no need to worry about the number of engines running or about their load; EnMasse dispatches each request to the most appropriate node. On the other hand, both the access point and the processing engines run standard embedded HTTP servers, so the system administrator can instantly check the status of the grid, identify problems, and take appropriate action.

An EnMasse access point needs little memory and processing time, so it can be deployed on a loaded intranet server or even on a workstation. As long as the processing engines run on separate machines, it does not affect the speed or throughput of the grid.

For accounting and performance tuning, EnMasse provides a logging facility. You can adjust the level of detail of the logging or switch it off completely. The log files are easy for humans to read and for programs to process.
EnMasse runs on a wide range of hardware and operating systems, is easy to install, and requires little maintenance. It has run unattended for weeks on a mix of Linux, FreeBSD, and Windows nodes, with multiple access points over the same grid when needed. It has solved many performance problems, providing reliable service without high levels of support.

Internally, EnMasse distributes formatting jobs, logs activity, and monitors grid performance. Whatever the surrounding system is doing, the role of this core remains the same. For the user, EnMasse provides a choice of ways to submit tasks and receive responses. The three current interfaces are the active folder (Actinia), the network server (Toaster), and the SOAP server (Fairy).

Actinia appears to the user as an active folder. When the user drops an XSL-FO file into the folder, Actinia notices it, picks it up, and sends it for formatting to one of the servers in the grid. It then stores the formatted document in the output folder. The output folder can be the same as the input folder or a different one. This approach suits cases where the user who submits a document does not need the result back. Examples are when another party needs the formatted document, or when the document leaves the system in another medium (for example, it is printed and delivered in hard copy).

A typical use is a bank generating statements, bills, invoices, personalized mail, and so on. Different programs installed on many servers generate different kinds of documents, each with its own styling and its own data retrieved from the database. The documents are generated as XML and transformed into XSL-FO by application-specific stylesheets. All of them are then placed into the input folder of EnMasse Actinia. Actinia picks them up and places the generated PostScript files into the output folder. A separate program monitors the output folder and sends the final documents to a number of print devices according to labels embedded in the documents.
For the bank, EnMasse provides the service of a dedicated print room.

Toaster listens on a network connection and accepts styled source documents (XSL-FO). It sends the formatted documents back to the user over the same connection. Unlike the Actinia case, the client always receives the result of processing in electronic form, ready for local printing. This suits cases where the user requesting document processing is both the producer of the XML sources and the consumer of their formatted output.

For example, a university server provides a formatting facility for student projects. Students submit documents marked up in DocBook XML through a web interface and get them back as printable PDF. The web server connects to the EnMasse server over the intranet, sends the source, receives the formatted document, and forwards it to the student's browser. This saves each student the considerable time needed to learn and configure DocBook processing locally.

Fairy is a SOAP server that accepts styled source documents (XSL-FO) and sends the formatted documents back to the user over the same connection. It is easy to integrate with any application that supports SOAP, because SOAP clients are easy to write.

For example, a stylesheet that converts Microsoft Excel's XML output into XSL-FO is stored on an HTTP server. Users compose their Microsoft Excel spreadsheets and press the "Xls2Fo" button on a toolbar. A VBA program then converts the spreadsheet to XML and adds a processing instruction that specifies the XSL stylesheet. With the help of the Microsoft Office Web Services Toolkit, it sends the result for formatting to the Fairy SOAP web service.

EnMasse, in the Actinia, Toaster, and Fairy configurations, can apply an XSL transformation to input documents. EnMasse nodes recognize For example, an installation dedicated to formatting DocBook documents may provide access to DocBook XSL stylesheets stored on a local server. The nodes load and apply the stylesheets and then format the generated XSL-FO into PDF or PostScript.
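The Excel workflow above depends on a processing instruction that names the XSL stylesheet to apply. This text does not spell out which processing instructions EnMasse nodes recognize. The Python sketch below therefore assumes the standard W3C `xml-stylesheet` instruction. The function name is hypothetical. The sketch only shows where such an instruction goes: after the XML declaration, which must stay first in the document.

```python
import re
from xml.sax.saxutils import quoteattr

# Matches an XML declaration at the very start of the document, if any.
_XML_DECL = re.compile(r'^<\?xml\s[^?]*\?>')

def add_stylesheet_pi(xml_text: str, href: str) -> str:
    """Return xml_text with an xml-stylesheet processing instruction added.

    Assumption: EnMasse honours the standard W3C <?xml-stylesheet?> PI.
    The PI goes after the XML declaration when one is present, because
    nothing may precede the declaration in a well-formed document.
    """
    # quoteattr escapes &, < and > and wraps the value in quotes.
    pi = '<?xml-stylesheet type="text/xsl" href=%s?>' % quoteattr(href)
    match = _XML_DECL.match(xml_text)
    if match:
        end = match.end()
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text
```

For example, `add_stylesheet_pi('<?xml version="1.0"?>\n<book/>', "docbook-fo.xsl")` keeps the declaration on the first line and places the stylesheet instruction on the second.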
The EnMasse distribution contains:

The text files use Unix-style line separators. All XML files are encoded in UTF-8. The documentation is generated from DocBook XML sources using the DocBook XSL stylesheets and formatted with RenderX XEP.

Some filenames in the distribution are in mixed case, sometimes called CamelCase, for example ThisIsMixedCase.xml. Take care to use unpacking or unzipping tools that handle mixed-case filenames correctly, such as Info-ZIP or WinZip. Depending on your operating system, your tools, and security considerations, you may want to change the access permissions and ownership of the files in the distribution, but you must keep the case-sensitive filenames intact.

EnMasse runs on any operating system that has TCP sockets, a Java Virtual Machine, and Python 2.2 or newer (the current stable version at the time of writing is 2.3.4). Since you use XEP, you already have Java. If you don't yet have Python on your computer, I recommend reading "Installing Python" in Mark Pilgrim's book Dive Into Python.

An EnMasse installation has three types of installed content:

Programs and configuration files. You will seldom need to change them, and EnMasse never writes to these locations. In the distribution, these files are in On Unix, a natural place for these files would be under

mkdir /usr/local/EnMasse
tar cf - bin lib etc doc | (cd /usr/local/EnMasse; tar xf -)

Windows users might use From now on, the installation directory is referred to as ${instDir}.

Working directories. EnMasse needs one folder as an internal working directory; in addition, Actinia requires three folders for user files: On Unix, these folders are in

Program logs. EnMasse writes detailed logs that help you detect and resolve problems and tune performance. On Unix, A bash script

To run EnMasse, launch an access point on one of the servers and several XEP engines, usually on separate computers. ${instDir}/bin/enmasse is a shell script that launches the access point; it issues the following command:
python ${instDir}/lib/Python/enmasse.py etc/enmasse.conf
where ${instDir}/bin/engine launches an XEP engine by calling a Java program:
java com.renderx.xepx.cliser.Engine -DCONFIG=/path/to/xep.xml
(replace the path to
Both the access point and the engines run embedded HTTP servers that display their current state, which helps you monitor activity and tune performance. By default, the HTTP ports are 6590 for Actinia, 6595 for Toaster, 6597 for Fairy, and 6580 for XEP.

To run EnMasse, you must configure it. A configuration file determines both how EnMasse interacts with the outside world and how it manages XEP engines and distributes the load. The default values are satisfactory for many parameters, but some values must be set explicitly to describe your local environment (the network and the computer). The configuration file uses the following XML format, given in RELAX NG compact syntax:
start = config
config = actinia | toaster | fairy
actinia = element actinia {
actinia-folders & settings
}
toaster = element toaster {
toaster-folders & settings
}
fairy = element fairy {
fairy-folders & settings
}
settings = options & servers & cliser
actinia-folders = element folders {
attribute input {string},
attribute output {string},
attribute quarantine {string},
attribute temporary {string}
}
toaster-folders = element folders {
attribute temporary {string}
}
fairy-folders = element folders {
attribute temporary {string}
}
options = element option {
attribute name {token},
attribute value {string}
}*
servers = element servers {
element server {
attribute host {token}?,
attribute port {token}?
}+
}
cliser = element cliser {
attribute format {token}?,
options
}
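The schema above reduces to a few structural rules:

- The root element selects the mode.
- Every mode needs one folders element. Actinia requires four folder attributes; Toaster and Fairy require only temporary.
- The settings require one servers element with at least one server, plus one cliser element.
- Every option carries a name and a value.

The Python sketch below checks only those rules, using the standard library. It is not a full RELAX NG validator; a dedicated tool such as Jing does that job. The function name and the host and port in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Folder attributes required by each mode, as listed in the schema.
REQUIRED_FOLDERS = {
    "actinia": ("input", "output", "quarantine", "temporary"),
    "toaster": ("temporary",),
    "fairy": ("temporary",),
}

def check_config(xml_text):
    """Return a list of structural problems in an EnMasse configuration.

    An empty list means the document passed these checks. Data types
    and option names are not checked.
    """
    root = ET.fromstring(xml_text)
    if root.tag not in REQUIRED_FOLDERS:
        return ["root element must be actinia, toaster or fairy, not %r" % root.tag]

    problems = []
    folders = root.findall("folders")
    if len(folders) != 1:
        problems.append("expected exactly one <folders> element")
    else:
        for name in REQUIRED_FOLDERS[root.tag]:
            if name not in folders[0].attrib:
                problems.append("<folders> lacks the %r attribute" % name)

    servers = root.findall("servers")
    if len(servers) != 1:
        problems.append("expected exactly one <servers> element")
    elif not servers[0].findall("server"):
        problems.append("<servers> must contain at least one <server>")

    if len(root.findall("cliser")) != 1:
        problems.append("expected exactly one <cliser> element")

    # Every <option>, top-level or inside <cliser>, needs name and value.
    for option in root.iter("option"):
        for attr in ("name", "value"):
            if attr not in option.attrib:
                problems.append("<option> lacks the %r attribute" % attr)
    return problems

if __name__ == "__main__":
    # Hypothetical Toaster configuration; host and port are placeholders.
    sample = """<toaster>
  <folders temporary="/var/tmp/enmasse"/>
  <servers>
    <server host="xep1.example.com" port="7001"/>
  </servers>
  <cliser format="pdf">
    <option name="FRM:VALIDATE" value="'true'"/>
  </cliser>
</toaster>"""
    print(check_config(sample))  # prints []
```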
The EnMasse mode is set by the top-level element; it is Attributes of
CLISER, the RenderX XEP Client-Server protocol, is the underlying protocol layer. For example, the element

<cliser format="pdf">
  <option name="FRM:VALIDATE" value="'true'"/>
  <option name="GEN:pdf:COMPRESS" value="'false'"/>
</cliser>

sets the output format to PDF, enables validation, and switches off compression.

You can tune EnMasse's performance through a number of options. The default values are fine for most applications. By changing them, you can build exactly the configuration you want and fine-tune the load on the grid, the throughput, and the response time. Here is the list of all the available options, with their data types and default values in parentheses.
Toaster is one of the EnMasse interfaces that require you to write a program to use them. Toaster accepts requests over a network TCP socket and implements a simple protocol, so you implement the protocol in your language of choice and embed it in your client-side application, such as a web form or an authoring tool. An example CGI script that calls Toaster to format a document submitted through a web page is provided in The protocol involves one request and one response. The client sends the request, in the form RECEIVE followed by a zero byte ('\0' in C), and then by the data of RECEIVE followed by a zero byte and by the formatted document. If EnMasse cannot format the document, it sends ERROR followed by a zero byte and then by the error message. The message contains XEP's diagnostics and helps identify the problem.

Fairy also requires you to write a program to use it. Since Fairy accepts SOAP requests, you can use any SOAP toolkit to access it.
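Returning to Toaster: the response framing described above is a keyword, a zero byte, and then the payload. It is simple to handle in any language.

The Python sketch below relies on several assumptions that this text does not confirm:

- The request keyword is not given clearly here, so the caller passes it in.
- The client marks the end of its request by closing the sending half of the socket.
- The server closes the connection after replying.

The function names are hypothetical.

```python
import socket

def parse_toaster_response(data: bytes):
    """Split a Toaster reply into (status, payload).

    status is "RECEIVE" (payload is the formatted document) or
    "ERROR" (payload is the error message with XEP's diagnostics).
    """
    keyword, sep, payload = data.partition(b"\0")
    if not sep or keyword not in (b"RECEIVE", b"ERROR"):
        raise ValueError("unexpected Toaster response header: %r" % keyword[:20])
    return keyword.decode("ascii"), payload

def format_document(host, port, request_keyword, fo_bytes, timeout=300.0):
    """Send one XSL-FO document to Toaster and return (status, payload).

    Assumptions: request_keyword is the keyword Toaster expects; shutting
    down the write side marks the end of the request; the server closes
    the connection after its reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_keyword.encode("ascii") + b"\0" + fo_bytes)
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return parse_toaster_response(b"".join(chunks))
```

On RECEIVE, a CGI script like the university example would forward the payload to the browser as PDF. On ERROR, it would decode the payload and show the diagnostics to the user.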
As a SOAP service, Fairy provides two methods: to format and to stop. Unless otherwise specified in the configuration, the method names are respectively
Fairy provides a WSDL document that describes the service. For example, if Fairy is running at host
Fairy accepts raw data (preferably with the XML metasymbols replaced by entity references), but for greater compatibility there are two more options:
Here are examples of SOAP requests:
...
<format>
<systemId>SYSTEMID</systemId>
<xml>XML_DATA_METASYMBOLS_REPLACED</xml>
</format>
...
...
<format>
<systemId xsi:type="xsd:base64">SYSTEMID_BASE64_ENCODED</systemId>
<xml xsi:type="xsd:base64">XML_DATA_BASE64_ENCODED</xml>
</format>
...
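The two request forms above differ only in how the document travels: as escaped text, or base64-encoded with an `xsi:type` annotation. The Python sketch below builds just the `<format>` element shown in the examples.

Several things are left to your SOAP toolkit and configuration:

- The surrounding envelope, elided as "..." above.
- The namespace declarations, including the ones binding the `xsi` and `xsd` prefixes.
- The exact form the system identifier should take.

The function name is hypothetical. The sketch also assumes the document is UTF-8, as the distribution's own XML files are.

```python
import base64
from xml.sax.saxutils import escape

def format_body(system_id: str, xml_data: bytes, use_base64: bool = False) -> str:
    """Build the <format> element of a Fairy SOAP request.

    use_base64=False: send the document as text, with the XML
    metasymbols (&, <, >) replaced by entity references.
    use_base64=True: base64-encode both fields and annotate them with
    xsi:type="xsd:base64", mirroring the second example. The xsi and
    xsd prefixes must be declared in the enclosing SOAP envelope.
    """
    if use_base64:
        sid = base64.b64encode(system_id.encode("utf-8")).decode("ascii")
        data = base64.b64encode(xml_data).decode("ascii")
        return ('<format>'
                '<systemId xsi:type="xsd:base64">%s</systemId>'
                '<xml xsi:type="xsd:base64">%s</xml>'
                '</format>' % (sid, data))
    return ('<format>'
            '<systemId>%s</systemId>'
            '<xml>%s</xml>'
            '</format>' % (escape(system_id), escape(xml_data.decode("utf-8"))))
```

The escaped form keeps requests readable when debugging. The base64 form avoids any dependence on how a given SOAP toolkit handles markup inside string values.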
[1] "users" in the Unix sense, that is, owners of processes; users do not have to access these folders directly.
RenderX®, © 2005-2008