# Introduction

Welcome to the Frontier® SDK User's Guide & Tutorial. This document describes the Parabon® Frontier Software Development Kit (SDK), a collection of software tools and libraries that enables you to create distributed, compute-intensive applications that run on the Frontier grid platform.

The tutorial is designed to:

• Provide instructions on the installation and setup of the Frontier SDK.
• Review the basics of developing applications that run on the Frontier Internet computing platform.
• Illustrate useful techniques and "best practices" that you can use when you develop your own Frontier applications.

Before starting the tutorial you should first read the white paper The Frontier Platform Application Programming Interface, which explains many of the concepts the tutorial explores.

Frontier is the first commercial grid computing platform that aggregates the unused power of computers, connected exclusively within an enterprise or across the Internet, transforming idle computing resources into a general-purpose, high-performance computing environment. Frontier does this using three main components: the client application, the Frontier® server, and the Frontier® Compute Engine.

A Frontier client application starts by creating a job and sending smaller units of work, called tasks, within the job to the Frontier server. The server then forwards these tasks to providers (computer users who have the Frontier Compute Engine installed) for execution using their computers' spare power. While a task runs on a provider node, progress information and results are reported back to the Frontier server. A Frontier application can run a listener that queries the server for any results that have been collected and listens for new results from the Frontier server. The listener can be stopped and restarted at will, each time obtaining the latest results from the Frontier server. When the job is complete and all results have been received, the application removes the job from the server.

## Frontier SDK Directory Structure

The following is a brief overview of the Frontier SDK directory hierarchy.

| Subdirectory | Contents |
| --- | --- |
| bin | SDK tools and scripts |
| conf | Configuration files for the SDK tools and the client trusted certificate authorities (i.e., client.truststore) |
| demo | Sample applications |
| doc | Contains this document, the Frontier API white paper, and JavaDocs for the Frontier API |
| lib | Frontier SDK libraries |
| simulator | Directory where the Frontier Grid Simulator stores temporary files. This directory is automatically created when a demo application is launched in "simulator" mode. |

This User's Guide & Tutorial is divided into the following sections:

| Section | Description |
| --- | --- |
| Introduction | Introduces the Frontier SDK and Parabon's Frontier grid computing platform, and provides document information and Technical Support contact information. |
| Setup | Provides procedures for setting up the Frontier SDK for Remote Mode execution. It also includes instructions for building and running the sample applications included in the Frontier SDK package. |
| Tutorial | Contains lessons that provide a step-by-step review of the fundamentals of developing a Frontier application. |
| Examples | Complete sample applications that illustrate advanced Frontier programming techniques. These are a useful reference when developing your own Frontier applications. They are presented in a format similar to the lessons in the tutorial. |
| Appendix A: Using Client-Scope and Global Elements | Provides instructions for reusing task elements across multiple jobs. |
| Appendix B: Glossary | Contains terms specific to the Frontier computing environment. |

## Typographical Conventions

The following typographical conventions are used in this document:

| Typeface | Description | Example |
| --- | --- | --- |
| `FixedWidth` | Application and task code, program output, and file and class names. | To create a task we must first create and populate a `TaskSpec` object. |
| **Bold** | Exact commands and characters typed by the user. | The command **prime -listen** launches the "prime" sample application in "monitor" mode. |
| *Italic*, ***BoldItalic*** | Indicates a placeholder that should be replaced by the actual value. | To build the sample, enter the following commands, replacing *app_name* with the name of the lesson's sample application (e.g., "local" for Lesson 1): 1. cd *frontier-sdk-home*/demo/*app_name*/src 2. ant |

## Related Documents

The Frontier Platform Application Programming Interface

## Technical Support

For technical support, troubleshooting, and other questions concerning Frontier application development, please feel free to email or call the technical contacts provided by Parabon or fill out and submit our email form at http://www.parabon.com/MyFrontier/support.jsp.

# Setup

The Frontier SDK can run an application in one of three modes: local, simulator, or remote. This section describes how to configure an application to run in each mode.

## Local Mode

Local mode runs an application's tasks on the local machine and is used primarily for program testing and debugging. By default, all applications run in local mode unless explicitly configured to run in one of the other two modes.

## Simulator Mode

When running an application in simulator mode, the Frontier SDK will automatically launch the Frontier Grid Simulator thread to process the application's tasks. The simulator thread will emulate a small grid consisting of 3 compute engines, and will display the following window to show the application's progress. Like local mode, this mode is used for program testing and debugging, but it also gives the programmer a means to test the remote components in a controlled environment before submitting the job to the real Frontier Grid.

### Frontier Grid Simulator Configuration

The following system properties can be set to control the Frontier Grid Simulator. Default values are defined in the frontier-sdk-home/conf/frontier.properties file.

| Property | Description |
| --- | --- |
| com.parabon.gridSimulator.displayGui | If set to true, the Frontier Grid Simulator window will appear when an application is launched in simulator mode. If false, the Frontier Grid Simulator will run quietly in the background. Default: true. |
| com.parabon.gridSimulator.maxJavaHeap | The maximum Java Virtual Machine (JVM) heap size per task when running an application in simulator mode. Format is equivalent to Java's "-Xmx" parameter. Default: 256m. Examples: 128m => 128 megabytes, 128k => 128 kilobytes, 1g => 1 gigabyte. |
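
To make the size format concrete, the following is a minimal sketch of how a value such as 128m could be interpreted. The names HeapSizeExample and toBytes are hypothetical and used only for illustration; this is not the Frontier SDK's own parser. It assumes the binary multiples used by the JVM's -Xmx option (1k = 1024 bytes).

```java
import java.util.Locale;

final class HeapSizeExample {
    /** Converts a size such as "128m" to bytes; a bare number is taken as bytes. */
    static long toBytes(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default:  multiplier = 1L; break; // no suffix: plain bytes
        }
        // Strip the suffix character only when one was recognized.
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(toBytes("128k")); // prints 131072
        System.out.println(toBytes("128m")); // prints 134217728
        System.out.println(toBytes("1g"));   // prints 1073741824
    }
}
```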

## Remote Mode

In remote mode, the application is launched from the local machine, and tasks are sent to the Frontier server. The Frontier server sends the tasks to provider machines to execute the tasks. When the provider machines complete the tasks, the results are sent back through the server to the application. Remote mode lets your application take full advantage of the power of the Frontier grid computing platform.

### Client Registration

Before running an application in remote mode, you must first register for an account with the Frontier Grid Server. This will provide you with the credentials needed to connect to the server and exchange data with it securely. If you haven't already done so, visit the Frontier Grid Server web site at http://www.parabon.com/MyFrontier/ to register for an account.

## Runtime Configuration

The parameters the Frontier SDK uses to connect to the Frontier Grid Server are specified by a group of Java system properties. These properties are configured by the Frontier SDK installation, but may be overridden by your application by either defining the properties on the java command line, or coding your application to set the properties before it connects to Frontier.

To set system properties on the java command line, do:

        java -Dproperty=value -Dproperty=value... application-main-class


To set system properties from within your application use the System.setProperty() method as follows:

        System.setProperty(property,value);


These methods can be combined; for example, your application could set default values for properties not specified on the command line. Note that if your Frontier application is a web application running in a J2EE application server, the J2EE server may restrict the system properties your application can modify.
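
As a minimal sketch of combining the two approaches, the helper below sets a property only when it has not already been supplied (for example with -D on the java command line). PropertyDefaultsExample and setDefault are hypothetical names, not part of the Frontier SDK; the mode property and its values are the ones described in this section. The sketch assumes it runs before the application connects to Frontier.

```java
final class PropertyDefaultsExample {
    /** Sets a system property only if no value has been supplied already. */
    static void setDefault(String name, String value) {
        if (System.getProperty(name) == null) {
            System.setProperty(name, value);
        }
    }

    public static void main(String[] args) {
        // A value given as -Dmode=remote on the command line is left untouched;
        // otherwise the application falls back to local mode.
        setDefault("mode", "local");
        System.out.println("mode = " + System.getProperty("mode"));
    }
}
```

Run as `java PropertyDefaultsExample`, this reports the fallback value; run as `java -Dmode=remote PropertyDefaultsExample`, the command-line value is kept.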

The frontier-sdk-home/conf/frontier.properties file sets the default values for most of the system properties used by the Frontier SDK. This property file is loaded by the SDK Session Manager at initialization time.

 Note: All properties set via the java command line take precedence over properties set in the frontier-sdk-home/conf/frontier.properties file.

The system properties used by the Frontier SDK and their typical values are shown below.

| Property | Description |
| --- | --- |
| mode | A value of remote runs the application in remote mode; any other value runs in local mode. Default: local. |
| javax.net.ssl.trustStore | SSL certificate truststore. The Frontier SDK uses SSL for all communication between the application and the Frontier server. Normally: frontier-sdk-home/conf/certs/client.truststore. Note: This property should not be modified. |
| com.parabon.frontier.user.username, com.parabon.frontier.user.password | Your user name and password registered with the Frontier server (i.e., your Frontier account credentials; the user name is typically your email address). If either of these properties is not set (the default), you will be prompted for it when you run a Frontier application in remote mode. |
| com.parabon.io.ssl.proxyHost, com.parabon.io.ssl.proxyPort | Set to the host name and port number of your proxy server. Only required if you use a proxy to connect to the Internet. |
| com.parabon.io.ssl.proxyAuthRequired, com.parabon.io.ssl.proxyAuthUser, com.parabon.io.ssl.proxyAuthPassword | When proxyAuthRequired is set to true, the Frontier SDK will authenticate to the proxy server using the username and password specified by the proxyAuthUser and proxyAuthPassword properties. The default value of proxyAuthRequired is false. |

# Sample Applications

Included with the Frontier SDK are several sample programs that demonstrate how to write a Frontier application. The samples include the files needed to run the application, complete source code, and an Ant build script for each sample. The sample files are located in the demo subdirectory of the Frontier SDK installation.

## Building the Samples

Each sample application provides an Ant build.xml file to build the application. Ant is a popular, platform-independent Java build tool that can be run from a command line or integrated into most popular Java IDEs. UNIX make and Microsoft® Windows® nmake makefiles, and a Windows batch file, are also provided for systems where Ant is unavailable.

To build a sample application, first set the JAVA_HOME and ANT_HOME environment variables to point to the Java JDK and Ant installation directories, and add the JDK and Ant executables to the system path:

• UNIX or Linux (Bourne/Bash/ksh shells):

    JAVA_HOME=path-to-JDK
    ANT_HOME=path-to-Ant-install
    PATH=$JAVA_HOME/bin:$ANT_HOME/bin:$PATH
    export JAVA_HOME ANT_HOME PATH

• UNIX or Linux (csh/tcsh shells):

    setenv JAVA_HOME path-to-JDK
    setenv ANT_HOME path-to-Ant-install
    set path=( $JAVA_HOME/bin $ANT_HOME/bin $path )

• Microsoft® Windows®:

set JAVA_HOME=path-to-JDK
set ANT_HOME=path-to-Ant-install
path %JAVA_HOME%\bin;%ANT_HOME%\bin;%path%

For Windows, these settings can be made persistent using the Control Panel. Select:

and set these environment variables in the User Variables pane.

To build a sample, enter the following commands replacing app_name with the name of the sample application directory (e.g., "local" for the sample used in Lesson 1) and frontier-sdk-home with the directory the Frontier SDK was installed in:

• UNIX or Linux:

cd frontier-sdk-home/demo/app_name/src
ant

• Microsoft® Windows®:

cd frontier-sdk-home\demo\app_name\src
ant

If Ant is unavailable, build the sample using the supplied batch file:

cd frontier-sdk-home\demo\app_name\src
buildwin32.bat

Two jar files are created by the build: one with the application code (app_nameApp.jar) and one that contains the task code (app_nameTask.jar).

You can use the Ant build files as examples when developing your own Frontier applications.

## Running the Samples

Each sample includes a UNIX shell script and Windows batch file to run the application. Run the samples with these commands:

• UNIX or Linux:

cd frontier-sdk-home/demo/app_name
./app_name.sh

• Microsoft® Windows®:

cd frontier-sdk-home\demo\app_name
app_name.bat

The examples will be explained in detail as we proceed through the tutorial on how to write a Frontier application.

# Tutorial Overview

This tutorial teaches the fundamentals of using the Frontier SDK to develop applications for the Frontier platform. The tutorial consists of a series of lessons and corresponding example programs, with each lesson illustrating progressively more complex programming techniques using the Frontier API.

The tutorial expects you to be already familiar with the fundamentals of Java programming, but does not require any prior experience with grid computing.

The lessons in the tutorial are:

The name in parentheses next to each lesson is the name of the directory under frontier-sdk-home/demo containing the sample application and source code explained in that lesson.

# Lesson 1: Running in Local Mode

## Overview

The Frontier Application Programming Interface (API) allows you to run your program in one of three modes: remote, local, or simulator.

Remote mode is the normal operating mode. In remote mode the application is launched from your local machine, tasks are sent to the Frontier server, the server sends the tasks to provider machines, the provider machines execute the tasks, and the results are sent back to the application through the Frontier server.

Local mode performs the above operations entirely on the local machine, executing the tasks locally instead of sending them to the Frontier server. Local mode allows you to run tasks directly on your machine, and is useful both for small tasks, and for application development and testing where the full power of Frontier is not necessary. However, local mode is tuned for efficiency and does not attempt to enforce many of the rules which come into play when running remotely, such as disallowing task-to-task communication or limiting available executable code and security.

Finally, simulator mode also runs on a single machine but simulates much of the behavior of the distributed Frontier system by employing multiple JVMs, messaging between tasks, runtime partitions, et cetera, allowing one to ensure that an application behaves as expected before running it remotely.

LocalApp is coded to run only in simulator mode, though it could just as easily be used in local mode. In the next lesson we'll see how to make this application run remotely.

## Files

The files used in this example are:

| File | Description |
| --- | --- |
| local/src/LocalApp.java | Application code to start the tasks and collect their results |
| local/src/LocalTask.java | The task code that actually does the work |

You should have a copy of the source files available while you proceed through the tutorial.

## Application

The LocalApp application computes the squares of a set of numbers. It creates a Frontier job and starts a task for each number to compute that number's square. A listener registered by the application receives the results of the tasks as they complete and prints them to the console. Once all the tasks have completed the application terminates.

### main

Let's start in LocalApp.java. The main method instantiates a LocalApp object and calls launch with the numbers whose squares we're going to compute. The launch method does all the work of the application.

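      public static void main(String[] args) throws Exception {
          int[] inputValues = new int[] { 5, 7, 83 };

          LocalApp localApp = new LocalApp();
          localApp.launch(inputValues);
          localApp.waitUntilComplete();
          localApp.remove();        // delete the job and all tasks it contains
          localApp.closeSession();  // shut down the connection to the session manager
      }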


In the constructor, we invoke the createSession method which in turn creates a session manager. This is our first use of the Frontier API.

          manager = new SimulatorSessionManager();


The type of session manager created dictates the mode the library is operating in for all jobs and tasks associated with that session manager. SimulatorSessionManager is a version of SessionManager which uses simulator mode to execute all tasks locally in an environment meant to mimic remote execution. For true remote sessions we'll use a different type of SessionManager but most of the rest of the API used here will still apply.

Note that we could modify LocalApp to use full local mode simply by changing this single line to:

          manager = new LocalSessionManager();


Once the session is established, we call LocalApp.launch to create a job, populate it with tasks, start those tasks, and register a listener. We then call waitUntilComplete which simply blocks until all tasks have returned results. In the meantime, the LocalApp -- which has been registered as a listener -- will wait for tasks to send back events, reporting results as they're obtained, and noting when all tasks have been run to completion.

Shutting down the application requires several steps. First we call LocalApp.remove which deletes the job -- including all tasks it contains -- via Job.remove. Next we call LocalApp.closeSession, which shuts down the connection to the session manager by calling SessionManager.destroy.

### launch

Now that we have a session we can create a job. Jobs have a set of attributes associated with them, each of which is a key string associated with a string value. We'll visit attributes in more depth later; for now we'll simply provide an empty map.

        job = manager.createJob(new HashMap());


Next we need to provide the job with the executable code for the task. The task implementation itself (LocalTask.class, as described below), plus any Java classes needed by the task, must be provided explicitly in a set of jar files associated with each task. Classes provided by the task runtime environment need not be included: notably the Java standard library and the relevant portions of the Frontier SDK, such as com.parabon.runtime and com.parabon.client.SerializableTaskContext. Those jar files are described as "executable elements". They can be added directly to individual tasks or, if they're shared among multiple tasks, as most usually are, they can be added to the job and merely marked as 'required' for the task, as described later. For now, we'll merely create a job-level executable element containing the jar with our LocalTask.class:

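      File demoDirectory = new File(System.getProperty("demo.home", "."));
      File taskJarFile = new File(demoDirectory, EXECUTABLE_ELEMENT_FILENAME);
      try {
          // Add taskJarFile to the job as an executable element and
          // save the returned executable element ID in taskJarID.
      }
      catch (IOException e) {
          System.exit(0);
      }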


The "executable element ID" (taskJarID) is saved so that we can use it later to refer to this new executable element.

We're now able to create tasks -- but first, a little housekeeping. First, we loop through the set of input values which will each be passed to a single task and take note of what will soon be the 'pending tasks'. This will let our listener know what it's listening for and, specifically, when all the tasks we've started have completed. Then, we register the LocalApp object itself as a listener to the job's tasks, so that we'll be sent the results and other status information for all tasks in the job. We do this before starting any tasks, so that we won't miss out on any events that might be sent between starting tasks and registering a listener afterwards.

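        for (int inputValue : inputValues) {
            // Note inputValue as one of the pending tasks.
        }
        // Register this LocalApp as a listener on the job before starting any tasks.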


Finally, we're ready to create and start the tasks themselves. We start by creating a SerializableTaskSpec, which serves as a description of a single task. We create only one task spec object and reuse it for each task simply for efficiency, but we could just as well create one task spec per task.

        SerializableTaskSpec taskSpec = new SerializableTaskSpec();


Next, we add our single executable element, created above, as a "required executable element" for each task, so that Frontier knows that each task needs a reference to that executable element. If we didn't do this, then even though that executable element exists in the job, the tasks would simply ignore it. Specifying it explicitly allows us to put many elements in the job and pick and choose which are needed for each task, hence allowing us to have many tasks performing different functions in a single job. Note that, similarly, this would not work if the executable element ID we pass in weren't associated with an element in the job -- in this case, our task will simply come back with an error later on.

        taskSpec.addRequiredExecutableElement(taskJarID);


Now we loop through our input values so that we can start one task for each. For each task, we first specify a task attribute. Task attributes -- like the job attributes discussed above -- are a set of string -> string mappings. We can define any key we want, and associate it with any string value, though we keep both reasonably small for bandwidth reasons -- it would be inefficient to store a great deal of data in this manner. Specifically, what we want to record here is the input value associated with each task, so that when we hear back from that task, we can identify the task and what it was working on. Alternatively, we could've created a unique ID for each task and used a local data structure to associate that ID with the input value; it's entirely up to the developer to decide what identifying information to store in attributes, and how to use it.

        for (int inputValue : inputValues) {
            // Record inputValue as a task attribute in taskAttributes.


Next we do two important steps in rapid succession. Don't let the brevity fool you -- these two steps form the heart and soul of the Frontier API. The first step is creating the task object itself. That is, we instantiate an object which implements the SerializableTask interface, and pass to it any parameters we need to define the task. This can be as involved a process as is necessary, though here, we only need a single parameter to define a task -- the 'input value' -- and so we just pass it in the constructor. The actual class we're instantiating is completely up to us; the LocalTask used in this demo is explained in more detail further down, as is the SerializableTask interface it implements.

The second step is to pass that task object to SerializableTaskSpec.setTask. This method will serialize that task object and store it internally; the resulting byte-stream, containing the 'frozen' task object, will be passed around from client to server and finally to a Frontier compute engine where it will be de-serialized and reconstituted into a mirror of the SerializableTask we originally passed in to setTask. From there, it will be executed, as we'll describe later; for now, we're merely sending the original task object in to be frozen and stored in the task spec. Note that task instances can be reused and sent to multiple taskSpecs -- once we call SerializableTaskSpec.setTask with a given task object, we keep ownership of that object and can continue to change it without affecting the contents of the task spec, which serialized and recorded the contents of the object we passed it and now no longer stores a reference to the object itself. We create one object per task here simply because the task object is so small it's not worth keeping across iterations.

          taskSpec.setTask(new LocalTask(inputValue));
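
The "frozen copy" behavior described above can be illustrated with plain Java serialization. The following is a minimal sketch, not Frontier code: SnapshotExample, SquareJob, freeze, and thaw are hypothetical names, and the sketch only assumes that setTask serializes its argument in the standard Java way, as the text states.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SnapshotExample {
    /** Hypothetical stand-in for a task object; not a Frontier class. */
    static final class SquareJob implements Serializable {
        private static final long serialVersionUID = 1L;
        int inputValue;
        SquareJob(int inputValue) { this.inputValue = inputValue; }
    }

    /** Serializes an object into a byte array (the "frozen" form). */
    static byte[] freeze(Serializable object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rebuilds an object from its frozen bytes. */
    static Object thaw(byte[] frozen) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(frozen))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SquareJob job = new SquareJob(5);
        byte[] frozen = freeze(job);   // snapshot taken here
        job.inputValue = 7;            // later changes do not affect the snapshot
        SquareJob copy = (SquareJob) thaw(frozen);
        System.out.println(copy.inputValue); // prints 5
    }
}
```

Because the bytes are captured at the moment of serialization, the same object could be modified and frozen again for the next task without disturbing earlier snapshots.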


Finally, we add the task to the job and start it. When we add the task to the job, it's merely stored, dormant, but we receive back a TaskProxy which is now our single conduit of communication with and control of that task (note, however, that we can get a reference to that TaskProxy via other means at a later time if we wish -- for instance, through a task event or directly from the job). We could use this TaskProxy to remove the task at a later time, to register a listener for events produced by that task (which we do not need to do simply because we've already registered a job listener which will receive events from all tasks in that job, which is similar to registering a listener for each task individually), or get the set of attributes we'd associated with the task when creating it. However, what we want to use that TaskProxy for now is to start the task, letting the system know to begin execution as soon as possible. This means starting it in another thread (or queuing it until other tasks have finished) in local mode, or, in remote mode, sending it to the server and letting the server know that it's ready to be sent down to a compute engine.

          TaskProxy task = job.addTask(taskSpec, taskAttributes);
          // Start the task through its TaskProxy.
        }


After the tasks and listener have been created, control returns to the main method.

### Listener

In addition to providing the overall application structure and control, the LocalApp class also serves as a listener for task events. Note that the listener is not required to be the main application class; in fact, most complex applications will use a separate class (or classes) altogether to serve as task event listeners. As we've registered the LocalApp instance as a job listener, we'll receive events from all tasks in that job; the types of events we'll receive are dictated by which listener interfaces we implement, discussed further below. The goals of this particular listener are twofold: first, report task results, progress, and errors; and second, keep track of when tasks have completed so that we know when all pending tasks have finished. This latter goal allows the waitUntilComplete method to return and the application to subsequently clean up and shut down.

As a task listener, we're required to implement TaskEventListener, but this interface itself does nothing for us. Instead, what we want to do is implement one or more sub-interfaces: TaskProgressListener, TaskIntermediateResultListener, TaskResultListener, and/or TaskExceptionListener. Alternatively, we could implement UniversalTaskListener in order to have all events sent to us via a single method and let the listener sort through them, but more often than not these explicit single-event-type interfaces are easier to use. We don't send progress values -- a sort of generalized "percentage complete" report used to help provide a user-visible gauge of how far along a task is in its work, as well as other uses -- as our task is very short; nor do we use intermediate results, a fancier type of status report sent from the task, with contents defined by the task developer. Hence, we'll leave those two interfaces out, and implement only the two most important: TaskResultListener and TaskExceptionListener.

    public class LocalApp implements TaskResultListener, TaskExceptionListener {


In each case, the interface adds a single method to which the corresponding event type will be sent. All task events implement at least the TaskEvent interface, which provides a means to access task attributes in order to identify the task; access to a TaskProxy from which we can control the task (add listeners, stop the task, etc.); and basic information about the task's current state and progress.

First, let's take a look at the resultsPosted method, inherited from the TaskResultListener, which is sent the final results of a given task via a TaskResultEvent object. The first thing we do in this method is to use LocalApp's getInputValue method to look at the task attributes for the task which produced the event in question. Specifically, this method will pull out the value of the "InputValue" attribute created in our launch method.

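      private int getInputValue(TaskEvent event) {
          return Integer.parseInt( /* value of the event's "InputValue" task attribute */ );
      }

      public void resultsPosted(TaskResultEvent event) {
          int inputValue = getInputValue(event);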
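
The round trip from an input value to a string attribute and back can be sketched with a plain map. AttributeExample, attributesFor, and inputValueFrom are hypothetical names; how the attribute map is obtained from a TaskEvent is not shown here, so this sketch works on a java.util.Map directly and assumes only the "InputValue" key named above.

```java
import java.util.HashMap;
import java.util.Map;

final class AttributeExample {
    static final String INPUT_VALUE_KEY = "InputValue";

    /** Builds the string -> string attributes that identify one task. */
    static Map<String, String> attributesFor(int inputValue) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put(INPUT_VALUE_KEY, Integer.toString(inputValue));
        return attributes;
    }

    /** Recovers the input value from a task's attributes. */
    static int inputValueFrom(Map<String, String> attributes) {
        return Integer.parseInt(attributes.get(INPUT_VALUE_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> attributes = attributesFor(83);
        System.out.println(inputValueFrom(attributes)); // prints 83
    }
}
```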


Next, we want to get the actual task results. The task results are the value returned from the task's run method (described later) -- after those results have been serialized, sent around as a byte-stream, and de-serialized. In our case, LocalTask.run returns an Integer, so that's what we expect to receive here. After obtaining the results, we simply print them to the console; most applications would of course do something more meaningful with task results.

        int result = (Integer)event.getResultsObject();

        System.out.println("The square of " + inputValue + " is " + result);


Finally, the fact that we've received the task's final results implies that the task is complete. We'll call our taskComplete method with the input value associated with this task, and taskComplete will first record the fact that this particular task is no longer 'pending' and, if this was the last active task, notify waitUntilComplete that the entire job appears to have been completed.

        taskComplete(inputValue);
    }

    private synchronized void taskComplete(int inputValue) {
        // Remove inputValue from the pending tasks; then, if none remain:
            notifyAll();
        }
    }
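
The bookkeeping described above (a set of pending tasks, a blocking wait, and a notification when the last task finishes) can be sketched in plain Java as follows. PendingTracker and its members are hypothetical names chosen for this illustration; the sketch does not reproduce LocalApp's actual fields and uses a worker thread in place of real task events.

```java
import java.util.HashSet;
import java.util.Set;

final class PendingTracker {
    private final Set<Integer> pending = new HashSet<>();

    /** Notes an input value whose task has not yet finished. */
    synchronized void add(int inputValue) {
        pending.add(Integer.valueOf(inputValue));
    }

    /** Marks a task finished and wakes any waiter once nothing is pending. */
    synchronized void taskComplete(int inputValue) {
        pending.remove(Integer.valueOf(inputValue));
        if (pending.isEmpty()) {
            notifyAll();
        }
    }

    /** Blocks until no tasks are pending; returns false if interrupted. */
    synchronized boolean waitUntilComplete() {
        try {
            while (!pending.isEmpty()) {
                wait(); // releases the lock while waiting
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTracker tracker = new PendingTracker();
        int[] inputValues = { 5, 7, 83 };
        for (int value : inputValues) {
            tracker.add(value);
        }
        // The worker thread stands in for task events arriving asynchronously.
        Thread worker = new Thread(() -> {
            for (int value : inputValues) {
                tracker.taskComplete(value);
            }
        });
        worker.start();
        if (tracker.waitUntilComplete()) {
            System.out.println("All tasks complete");
        }
    }
}
```

The wait is placed in a loop that rechecks the condition, so it behaves correctly whether the tasks finish before or after the wait begins.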


Next we'll take a look at exceptionThrown, inherited from TaskExceptionListener, which receives a TaskExceptionEvent. This listener is notified when an error occurs while running this task. This exception could be the result of an Exception or other Throwable being thrown from the task's run method, an error during de-serialization, a missing executable element, a security violation, or some other persistent error caused by a bug in the task or task specification; these are referred to as "user exceptions", and the TaskExceptionEvent.isUserException method will return true in these cases. Alternatively, the error may have been caused by a systemic error: for instance, a corrupted installation of the engine running the task, or insufficient memory or other resources. When possible, in remote mode, the Frontier server will attempt to rectify these problems by re-running the task on another engine; however, if this is not feasible or does not resolve the issue -- or if the task is running in local or simulator mode, as it is here -- such errors will be reported back to the client. All errors have an associated description encoded as a string; if the error was caused by a thrown Exception, this description will generally contain the exception's message, stack trace, and cascading nested exceptions as applicable. The exception event will also, when possible, contain codes giving more precise information about the categories of specific error conditions, such as "out of memory" or "security violation"; these are returned by TaskExceptionEvent.getCode and .getSubcode.

Our exceptionThrown method is similar to our resultsPosted method. First, the task is identified via its attributes. Second, the error condition is printed to the console. Third, because an exception means that a task is effectively (if unsuccessfully) complete -- as all tasks end with either results or an exception, but never return both or continue executing after either has been sent -- we'll record the task's completion so that the application can terminate when everything is complete.

      public void exceptionThrown(TaskExceptionEvent event) {
          int inputValue = getInputValue(event);

          System.out.println("Exception (" + event.getCode() + ")"
              + " while working on input value " + inputValue);
          System.out.println(event.getDescription());
          System.out.println();

          // A failed task is still finished, so record its completion.
          taskComplete(inputValue);
      }


Now that we've seen how to start tasks and receive their results let's look at the task itself.

The code for the task used in this lesson is in LocalTask.java. It's a simple task that takes a single integer as input, squares it, and sends back the result. Of course, using the Frontier system to perform such a simple task is overkill, but we don't want to concentrate on a complex task here, merely the means to execute it; suffice it to say that the work done by a task will in general be significantly more complex and longer-running than squaring a number, but the basic principles of defining a task and returning results remain the same.

Although a task may involve dozens of arbitrary Java classes, each task is ultimately defined by a single entry-point class (or, rather, an instance of this class). This class must implement com.parabon.client.SerializableTask (or com.parabon.runtime.Task, but that's a topic for another lesson), which in turn extends the java.io.Serializable interface. Hence, implementing a task first means following the rules of the Serializable interface (including the option of using Externalizable instead). This common part of the Java platform is described in many pages and books, so we won't go into it here beyond noting that one should be careful when making a task serializable to avoid serializing either large or unnecessary pieces of data (a separate mechanism for sending data to tasks is described in Lesson 3, Using Data Elements) or classes which are not included in the executable elements associated with a task. For instance, the transient keyword should be used liberally when applicable, and in some cases a task may even wish to use Externalizable for finer-grained control of task serialization. When serialization seems entirely inappropriate for specifying a given task, one may wish to switch to the 'flat' or byte-stream mode of the API, based on com.parabon.runtime.Task, which avoids the use of Serializable altogether, giving a task fine-grained control of the bytes used to specify it. The remainder of this lesson, however, will assume the use of SerializableTask.
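The effect of the transient keyword on what gets serialized can be seen with plain Java serialization. The sketch below uses only java.io; the SquareJob class and its roundTrip helper are hypothetical and do not implement any Frontier interface. They merely mimic the client-to-engine trip by serializing and deserializing an object in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical task-like class; it does not implement any Frontier interface.
final class SquareJob implements Serializable {
    private static final long serialVersionUID = 1L;

    final int inputValue;      // small and needed on the engine: serialized
    transient int[] scratch;   // large working storage: skipped by serialization

    SquareJob(int inputValue) {
        this.inputValue = inputValue;
        this.scratch = new int[1_000_000];
    }

    // Serializes the job to bytes and reads it back, as a stand-in for
    // shipping a task specification from client to engine.
    static SquareJob roundTrip(SquareJob job) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(job);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SquareJob) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After the round trip the copy still has its inputValue, but scratch is null, so any such working storage has to be rebuilt once the task starts running.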

LocalTask is straightforward and contains only a single member variable: inputValue. Given this simple data structure, it requires very little to make it properly Serializable, though it does include a serialVersionUID as recommended for Serializable classes.

    public class LocalTask implements SerializableTask {
      private static final long serialVersionUID = -1;

      private int inputValue;


The task developer should keep in mind that task instances exist in two places: on the client side and on an engine. In the former case, the task is being specified, and so the constructor and whatever other mutator methods are required to set up a task and its parameters should be included here. How this is done is up to the task developer, but it should be kept in mind that little work should be done on this side, and extraneous initialization which could just as easily be performed on the engine side should be avoided, both to ensure that task creation is as efficient as possible and does not become a bottleneck, and to keep the size of the serialized task small by avoiding predigested, redundant pieces of data. One exception to this guideline is when precomputation is fast and reduces the size of a task -- for instance, when only a subset of a larger data structure is actually needed for the task to perform its work.

In the case of LocalTask, the only client-side interface is the constructor, which accepts an integer to use as the inputValue parameter for the task.

      public LocalTask(int inputValue) {
        this.inputValue = inputValue;
      }


The other face of a task is the runtime interface. After the task is defined by the client application, serialized, sent down to an engine, and de-serialized, the methods in the SerializableTask interface become important. At this point -- once the run method is invoked -- the task has been 'fully baked', so to speak, and is free to perform further initialization, knowing that its parameters have been fully specified and it can afford to use both more computation and more data. This division between serialized vs. started tasks is softened by the introduction of checkpoints, but these will be dealt with in detail in Lesson 4, Using Checkpoints.

SerializableTask contains only two methods: run and stop. run is the first method that's called on a task after it's de-serialized, and that's where a task does its work, returning the final results when complete. In the case of LocalTask, its work is simple: square inputValue and return an Integer representing the result. As the result isn't strongly typed -- if the client application expects a different type than the task returns, it will generally result in an exception on the client side when the cast is made -- it's a good idea to be very explicit about the type being returned, for instance tightly specifying the return type of run rather than using the broader type of Serializable.

      public Integer run(SerializableTaskContext context) {
        return (Integer)(inputValue*inputValue);
      }
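Narrowing the declared return type, as LocalTask does, relies on Java's covariant return types. The sketch below shows the language feature on its own; SketchTask is a hypothetical stand-in interface (the real SerializableTask.run also takes a context parameter), and SquareSketch is likewise illustrative only.

```java
import java.io.Serializable;

// Hypothetical stand-in for a task interface whose run method is declared
// to return the broad type Serializable.
interface SketchTask {
    Serializable run();
}

// The implementation narrows the return type to Integer (a covariant return),
// so callers holding a SquareSketch get an Integer without casting.
final class SquareSketch implements SketchTask {
    private final int inputValue;

    SquareSketch(int inputValue) {
        this.inputValue = inputValue;
    }

    @Override
    public Integer run() {
        return inputValue * inputValue;
    }
}
```

Code that sees only the SketchTask interface still receives a Serializable and must cast, which is where a mismatch between task and application would surface.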


Note that run takes a single parameter of type SerializableTaskContext. This context may be used to communicate with the runtime, performing such operations as reporting progress and intermediate results, obtaining data (as described in Lesson 3, Using Data Elements), logging checkpoints (as described in Lesson 4, Using Checkpoints), and other operations. None are used in LocalTask, so the context parameter is never referenced.

The other method in the SerializableTask interface is stop. The concept of this method is simple: if the engine needs to temporarily (or permanently) stop executing a task for any reason, such as when the engine's compute resources are temporarily needed by a user, it invokes this method to request that the task shut down gracefully. This gives the task the opportunity to register a last checkpoint, send back one last intermediate result, or do other cleanup before being temporarily put on a shelf, likely to be restarted from the last checkpoint at a later time, and then to exit gracefully from the run method by throwing a TaskStoppedException. The task is under no obligation to actually do anything at all based on this call; it's merely a courtesy from the engine. If the task doesn't shut down -- or doesn't shut down quickly enough -- the engine will usually stop the JVM or otherwise forcefully shut down the task. Similarly, there are no guarantees that the engine will actually call this method before shutting the task down.

Another means for a task to accomplish the same thing as the stop method is to periodically check the value returned by SerializableTaskContext.shouldStop. For some tasks, this mechanism is more convenient than being interrupted by the stop method -- for instance, it could be checked each time through a loop. Both provide the same information from the engine, however, so it's up to the task which to take advantage of, if either.
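The polling style can be sketched without the Frontier runtime. In the hypothetical StoppableSum class below, an AtomicBoolean stands in for the engine's stop request (the value that SerializableTaskContext.shouldStop would report), and an IllegalStateException stands in for TaskStoppedException; neither is the SDK's actual mechanism.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: the flag stands in for the engine's stop request.
final class StoppableSum {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Analogous to a stop() callback: it only records the request.
    void stop() {
        stopRequested.set(true);
    }

    // Sums 1..limit, polling the flag each time through the loop.
    long sumUpTo(int limit) {
        long sum = 0;
        for (int i = 1; i <= limit; i++) {
            if (stopRequested.get()) {
                // A real task might log a checkpoint here before leaving run.
                throw new IllegalStateException("stopped at i=" + i);
            }
            sum += i;
        }
        return sum;
    }
}
```

Checking once per iteration keeps the response to a stop request prompt while adding very little overhead to the loop.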

As the LocalTask is very short-running and doesn't have any loops, it wouldn't be meaningful to have it be able to stop; hence, LocalTask has an empty stop method.

      public void stop() {
        // Stopping not supported
      }


## Running the Application

We now have a complete and functional Frontier application. You can run the application as shown below:

• UNIX or Linux:

cd frontier-sdk-home/demo/local
./local.sh

• Microsoft® Windows®:

cd frontier-sdk-home\demo\local
local.bat

The application's output should look like this:

    Connecting to session manager
    Connected to simulator
    Starting task to compute the square of 5
    Starting task to compute the square of 9
    Starting task to compute the square of 100
    Waiting for results...
    The square of 5 is 25
    The square of 9 is 81
    The square of 100 is 10000


Note that task results may be received in any order, due to variations in the scheduling of the tasks.

# Lesson 2: Running in Remote Mode

 Note: You must complete the additional Frontier SDK configuration described in Remote Mode Setup before your application can run in remote mode. If you have not already done so, perform this step before continuing with this lesson.

## Overview

In Lesson 1 we saw how to create and run a simple Frontier application on our local machine. We'll now extend this application to make it capable of running either on our local machine, or remotely on the Frontier platform.

This lesson will also show you how to implement the Launch-and-Listen design pattern used by Frontier applications. In Launch-and-Listen, an application launches its tasks on the Frontier server and then often exits without waiting for the task results. At a later time, the application is run again to obtain any pending results from the server. The application can be run repeatedly until all results have been obtained for the tasks.

Without using Launch-and-Listen, an application would have to stay connected to the Frontier server until all its tasks have completed. For a complex Frontier application, this might be hours or days, and if the application was terminated during this time (user logged off, system crashed, etc.), the task results would be lost.

## Files

The files used in this lesson are:

| File | Description |
| --- | --- |
| remote/src/RemoteApp.java | Application code to start the tasks and collect results |
| remote/src/RemoteTask.java | The task that actually does the work |

## Application

The RemoteApp application performs the same calculations as the LocalApp application in the previous lesson: it computes the squares of a set of numbers. Depending on the command line flags passed when the application is run, the RemoteApp application will do one or more of the following:

• Create a Frontier job and start tasks to compute the squares of the numbers.
• Connect to an already-existing Frontier job (unless the job was launched in the same invocation) and monitor task status and results.
• Delete the Frontier job and its tasks from the Frontier server.

You would normally run RemoteApp a first time to create the job and tasks, then run it again to obtain the task results. Once all task results have been received, you would run the application one last time to delete the job from the Frontier server. Multiple flags may be used to do more than one of these operations in sequence -- for instance, launching a job and immediately starting to listen for results, or listening for results and deleting the job once all results have been obtained.

### main

Let's start with the RemoteApp.main method. The method starts by determining what operation should be performed based on the command line arguments.

        boolean launch = false;
        boolean listen = false;
        boolean remove = false;

        if (args.length == 0) {
          usage();
          System.exit(1);
        }
        for (String arg : args) {
          if (arg.equals("launch")) {
            launch = true;
          } else if (arg.equals("listen")) {
            listen = true;
          } else if (arg.equals("remove")) {
            remove = true;
          } else {
            usage();
            System.exit(1);
          }
        }


Next we create a new RemoteApp instance, similarly to LocalApp but using a "job name" derived from a Java system property; more on this later. During the RemoteApp constructor, a connection to the remote server is initiated -- more on this later, as well.

        String jobName = System.getProperty("jobName", DEFAULT_JOB_NAME);
        RemoteApp remoteApp = new RemoteApp(jobName);


Following this, we perform the operation specified by the command line arguments. We either create a new job or connect to an existing one; optionally listen for results and wait for completion; and finally, optionally stop that job and remove it from the server.

        int[] inputValues = new int[] { 5, 9, 100 };

        if (launch) {
          if (remoteApp.findJob()) {
            System.err.println(
              "Job of type " + JOB_TYPE + " with name " + jobName +
              " already exists.");
            System.err.println("Use the \"remove\" mode to remove the job.");
            System.exit(1);
          }

          remoteApp.launch(inputValues, listen);
        } else {
          if (!remoteApp.findJob()) {
            System.err.println(
              "Job of type " + JOB_TYPE + " with name " + jobName +
              " does not exist.");
            System.err.println("Use the \"launch\" mode to create a new job.");
            System.exit(1);
          }
        }

        if (listen) {
          if (!launch) {
          }
          remoteApp.waitUntilComplete();
        }

        if (remove) {
          remoteApp.remove();
        }


This main method illustrates a relatively straightforward version of the launch-and-listen paradigm. One may launch a job by issuing RemoteApp launch, gather results via RemoteApp listen, and remove the job after completion -- or before -- via RemoteApp remove. The only long-running step in this sequence is the "listen" stage, and if the user quits the application -- or suffers a crash, loss of power, etc. -- during this stage, they merely need to restart the application to continue monitoring the tasks' progress. However, more complex applications need only follow the spirit of this sequence, not the letter; for instance, they may launch tasks, listen to progress, and stop jobs explicitly within a single session, re-attaching to existing jobs only as an exception to recover from a crash or similar event.

### createSession

The createSession method, invoked from the constructor, differs from its counterpart in LocalApp in two important ways.

          manager = new RemoteSessionManager();
          manager.reestablish();


The first difference is the creation of a RemoteSessionManager rather than a LocalSessionManager; this kicks off a session in 'remote' mode, initiating a connection to the Frontier server and performing all subsequent interactions through that SessionManager -- such as creation of jobs and tasks -- in the context of that server, using a message transfer protocol to submit jobs and tasks, register listeners, obtain status and results, and perform other queries.

The second difference is the fact that immediately after creating the session, the createSession method calls RemoteSessionManager.reestablish. This method will communicate with the server to obtain a list of the currently running jobs. Before calling this method, the RemoteSessionManager didn't know about any jobs except those that were created by the current machine during this session -- that is, none. reestablish will block until the latest list of jobs is returned by the server.

### launch

The launch method does the work of creating a new job, populating it with tasks, and submitting it to the server. The code to do this is nearly identical to its counterpart in LocalApp, because the API for local and remote modes is, for most purposes, exactly the same; this makes it easy for a single application to function in either mode, simply by 'flipping a switch'. The RemoteApp version of the launch method differs in only two ways. First, it adds an addListener parameter, so that a job listener won't be added unless the app is in 'listen' mode. The second change is more significant: the specification of job attributes. Technically, job attributes could be used in local mode as well; they simply aren't as necessary there. Job attributes are similar to task attributes, and in fact use the same mechanism. Here, we add three attributes; none are specifically required and, in fact, none actually do anything special. They're merely used to help identify a running job when we wish to re-attach with a new session of the application -- or a different application, for that matter, such as the "job listener" demo.

• JobType: Used to help identify the application which created the job and give a rough indication of what the job is intended to do. Although nothing in the Frontier API requires the use of this attribute, it is often used by convention.
• JobName: Used as a unique identifier for a job. In the case of this demo, this attribute merely helps ensure that multiple redundant jobs aren't started by accident, as described below; however, more complex applications might use this identifier to, for instance, associate a job with local resources such as log files, results storage tables in databases, et cetera.
• InputValues: This attribute gives some specific information about the parameters of this job instance -- specifically, listing the values used as input. New sessions which re-attach to this job later can use this attribute to know what the job is working on and, significantly, know it is complete when results for all input values have been obtained. This attribute is merely an example of how arbitrary, small pieces of information, which may be useful later in reestablishing a connection with a job, may be stored in attributes.

        StringBuilder inputValuesAttribute = new StringBuilder();
        for (int inputValue : inputValues) {
          if (inputValuesAttribute.length() != 0) {
            inputValuesAttribute.append(' ');
          }
          inputValuesAttribute.append(inputValue);
        }

        Map<String, String> jobAttributes = new HashMap<String, String>();
        jobAttributes.put(JOB_TYPE_ATTRIBUTE, JOB_TYPE);
        jobAttributes.put(JOB_NAME_ATTRIBUTE, jobName);
        jobAttributes.put(JOB_INPUT_VALUES_ATTRIBUTE,
                          inputValuesAttribute.toString());
        job = manager.createJob(jobAttributes);


### findJob

The findJob method attempts to locate and connect to an existing job on the server with the same type and name attributes (as described above), returning true if such a job was found. This is used in 'listen' and 'remove' modes -- after reestablishing the list of jobs -- in order to locate the job we wish to listen to or remove. It is also used in 'launch' mode in order to ensure that a matching job doesn't already exist on the server.

      public boolean findJob() {
        assert(job == null);

        for (Job currJob : manager.getJobs()) {
          String currJobType = currJob.getAttributes().get(JOB_TYPE_ATTRIBUTE);
          String currJobName = currJob.getAttributes().get(JOB_NAME_ATTRIBUTE);
          if ((currJobType != null) && currJobType.equals(JOB_TYPE) &&
              (currJobName != null) && currJobName.equals(jobName)) {
            String[] inputValueStrings =
              currJob.getAttributes().get(JOB_INPUT_VALUES_ATTRIBUTE).split(" ");
            for (int i = 0; i < inputValueStrings.length; i++) {
              int inputValue = Integer.parseInt(inputValueStrings[i]);
            }

            job = currJob;
            return true;
          }
        }

        return false;
      }
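The input values travel through a job attribute as a space-separated string: launch writes them and findJob parses them back. That round trip can be sketched on its own. The InputValuesAttribute class and its method names below are hypothetical and not part of RemoteApp; the decoder is slightly more forgiving than a plain split on a single space.

```java
// Hypothetical helper illustrating the attribute encoding; not part of RemoteApp.
final class InputValuesAttribute {
    // Joins the values with single spaces, e.g. {5, 9, 100} -> "5 9 100".
    static String encode(int[] values) {
        StringBuilder encoded = new StringBuilder();
        for (int value : values) {
            if (encoded.length() != 0) {
                encoded.append(' ');
            }
            encoded.append(value);
        }
        return encoded.toString();
    }

    // Parses the attribute back into integers; tolerates extra whitespace
    // and an empty attribute.
    static int[] decode(String attribute) {
        String trimmed = attribute.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] parts = trimmed.split("\\s+");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return values;
    }
}
```

Because attributes are plain strings, any small piece of job metadata can be stored this way as long as the launching and re-attaching sessions agree on the format.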


### closeSession

The closeSession method in RemoteApp is functionally similar to that of LocalApp. However, it's worth noting what goes on behind the scenes. Specifically, when RemoteSessionManager.destroy is called, it first waits until all communication with the Frontier server has completed. As the Frontier SDK uses asynchronous messaging and queues up both outgoing and incoming messages, simply exiting the application could cause unexpected behavior -- for instance, even though task.start may have been called on one or more tasks, the server may not have yet been sent the messages which will actually create and start these tasks. Simply exiting may cause some unknown number of messages to be 'dropped on the floor'; hence, it is generally very important to shut down a session cleanly via RemoteSessionManager.destroy before exiting an application (for instance, via System.exit or an uncaught exception).
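Why a clean shutdown matters with queued, asynchronous messages can be illustrated with a plain executor. The OutgoingQueue class below is a hypothetical stand-in and does not reflect how RemoteSessionManager is implemented; it only shows the general pattern of draining pending work before exit.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an asynchronous outgoing message queue.
final class OutgoingQueue {
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    private final List<String> delivered = new CopyOnWriteArrayList<>();

    // Returns immediately; the message is delivered later on the sender thread.
    void send(String message) {
        sender.execute(() -> delivered.add(message));
    }

    // Analogous to a clean session shutdown: stop accepting new messages,
    // then block until everything already queued has been delivered.
    boolean closeAndDrain(long timeout, TimeUnit unit) {
        sender.shutdown();
        try {
            return sender.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> delivered() {
        return delivered;
    }
}
```

If the process exited right after the last send call instead of calling closeAndDrain, some of the queued messages might never be delivered.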

The job listener functionality in RemoteApp is identical to that of LocalApp. However, it's worth noting three subtle but important differences in listener functionality between the two modes.

First, in remote mode, adding a listener isn't just a simple, passive operation; rather, it informs the server that event messages should be sent. Hence, it makes sense to add listeners only when they're actually needed, and then add them in the narrowest possible context. For instance, if an app cares only about task progress, it should only add a TaskProgressListener, not a TaskResultListener or a UniversalTaskListener, as the latter two will result in full results being sent over the network even though they're not needed.

Second, when a listener is first added, the server will generally create a 'latest status' event if applicable for each task (meaning, in the case of a job listener, all tasks in that job). That means that the latest progress will often be sent for started tasks, intermediate results may be sent down if the task has posted any, and -- most significantly -- for completed tasks, either results or exceptions will be sent down to a registered result/exception listener. This is not necessarily the case for all new listeners, however, just the first applicable listeners in a new session -- that is, adding a TaskResultListener a few minutes after adding a UniversalTaskListener will not generally result in another "latest status" event being generated for the benefit of the new listener. This, combined with the first point above, means that one should be careful about when and where listeners are added in an application.

Third, unlike in local mode, developers should keep in mind that not all events will make it back to the listeners. Multiple progress or intermediate result events may be dropped by the server to conserve bandwidth, for instance, and checkpoint events generally will not be sent at all -- in fact, checkpoints are rarely sent to the server from engines. The rules governing which events are and are not guaranteed are laid out in the Frontier API Documentation.

The rest of RemoteApp is similar to LocalApp.

RemoteTask is identical to LocalTask; a properly written Frontier task requires no changes to run in remote mode.

## Running the Application

Now it's time to run the application.

• UNIX or Linux:

cd frontier-sdk-home/demo/remote
./remote.sh launch
./remote.sh listen

• Microsoft® Windows®:

cd frontier-sdk-home\demo\remote
remote.bat launch
remote.bat listen

This will start the tasks and then display the latest task results. We can run the application with the listen flag any number of times. When we're done we destroy the tasks and the job by doing:

• UNIX or Linux:

cd frontier-sdk-home/demo/remote
./remote.sh remove

• Microsoft® Windows®:

cd frontier-sdk-home\demo\remote
remote.bat remove

If we run RemoteApp with the remove flag twice in a row, we'll receive an error indicating the job does not exist.

We can also combine flags to perform multiple modes in sequence -- for instance, "launch listen" will launch a new job and immediately start listening to results.

You'll see a couple of minor differences in the way the application operates when running remotely, compared to the local-mode operation of LocalApp. When the application starts, you'll be prompted for the Frontier user name and password you set up when you registered for a Frontier account (see Remote Mode). You may also have to wait a few minutes for your application to receive task status events. Task status events are not available until the Frontier server has scheduled the task on a provider.

You've now run your first application remotely on the Frontier platform. The rest of the examples in the tutorial can run in either local or remote mode; how to select the mode is described in the next lesson.

# Lesson 3: Using Data Elements

## Overview

Applications need data. While we've used hard-coded input values in our examples so far, a real application normally requires data from external sources such as files or databases. In this lesson we'll learn how to use data elements to supply input data from a file to a task.

Data elements are the mechanism provided by the Frontier platform to pass data from files and other data sources to tasks. Because tasks execute in a restricted JVM "sandbox" on remote systems, they cannot access files or other data sources directly. A Frontier application must create data elements for data sources required by its tasks, and submit these data elements to the Frontier server. When a task is scheduled to a remote system for execution, the Frontier runtime will retrieve all required elements from the Frontier server, and make them available to the task as needed.

## Files

The files used in this lesson are:

| File | Description |
| --- | --- |
| data/src/DataApp.java | Application code to start the task and collect its results |
| data/src/DataTask.java | The task that actually does the work |
| data/DataApp.data | The input data file used by the application |

## Application

The DataApp application computes the mean and standard deviation of a set of integers. The input numbers are contained in the file DataApp.data, one number per line. The application constructs a data element from this file and creates a single task that reads this data, then computes the mean and standard deviation of the values.

Data elements are similar to executable elements, except that instead of being added to the JVM's classpath during execution on an engine, they are shipped to the engine as pure data and may be accessed directly by the task. Adding a data element within a client application is very similar to adding an executable element. In the case of DataApp, the launch method creates a data element for the input data file in the job, receiving back a String containing a data element ID with which to identify the new data element.

        File dataFile = new File(demoDirectory, DATA_FILENAME);


In addition to File objects, data elements can be constructed from any class that implements the com.parabon.io.DataWrapper interface (such as com.parabon.io.ByteArrayDataWrapper). This lets you create data elements from arbitrary input sources like arrays in memory, or database query results.

There are several ways for the task to access the data element. The most straightforward is to simply pass the data element ID to the task, which will then store it internally and, during task execution, obtain a DataBuffer with which to access the contents of the file (the DataBuffer interface itself will be discussed later) by calling SerializableTaskContext.getDataElement with the data element ID. When using this mechanism, the system must also be notified that a particular data element is required for a task by calling TaskSpec.addRequiredDataElement with the data element ID while the task is being created.

However, a more common and simpler mechanism is available which relies on some behind-the-scenes magic to simplify this process; this is the mechanism used by DataApp. During task creation, a DataElementProxy object is created and passed the data element ID; this object in turn is passed to the task. The DataElementProxy implements DataBuffer, and so -- once the task starts executing -- it can be used to access the data element directly, just as if it had been returned from SerializableTaskContext.getDataElement. This removes the need to store the data element ID in the task, to call SerializableTaskContext.getDataElement, or even -- because the system detects the DataElementProxy during task serialization and treats it specially -- to mark the data element as required via TaskSpec.addRequiredDataElement.

Hence, our DataApp is able to send the data element to its single task during task creation, as in the two lines below, and the data element contents will automatically be sent along with the task. Note that although DataApp creates only one task, it would be equally feasible to create dozens, each with their own (or even a single shared) DataElementProxy referring to the same data element.

        DataElementProxy dataElementProxy = new DataElementProxy(dataElementID);


One other difference in DataApp versus the RemoteApp from the previous lesson is the means by which the SessionManager is created. Thus far we have hard-coded the SessionManager implementation used by each particular example, but an alternative is to use the static SessionManager.getInstance method, which will use the value of the "mode" system property -- "local", "simulator", or "remote" -- to determine which type of SessionManager to create. This lets us easily switch between modes in relatively small, command-line applications via a switch sent to the JVM -- e.g. "-Dmode=remote". Larger, more complex applications would likely continue to create SessionManager instances with explicit types and decide which type to use depending on, for instance, application preferences specified via a GUI.
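Choosing an implementation from a JVM switch such as -Dmode=remote comes down to reading a system property. The sketch below is a hypothetical illustration of that pattern; the Mode enum, its method names, and the fallback behaviour are assumptions and do not reproduce what SessionManager.getInstance does internally.

```java
import java.util.Locale;

// Hypothetical illustration of selecting a mode from a system property.
enum Mode {
    LOCAL, SIMULATOR, REMOTE;

    // Reads the "mode" system property, e.g. set with -Dmode=remote.
    static Mode fromProperty(Mode defaultMode) {
        return parse(System.getProperty("mode"), defaultMode);
    }

    // Maps "local", "simulator", or "remote" (any case) to a constant;
    // a missing or blank value yields the supplied default.
    static Mode parse(String value, Mode defaultMode) {
        if (value == null || value.trim().isEmpty()) {
            return defaultMode;
        }
        // valueOf throws IllegalArgumentException for unrecognized names.
        return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

An application could then switch on the returned constant to decide which kind of session manager to construct.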

## Listener

Because this application has only one task, the listener functionality is much simpler than the listeners in previous examples: its resultsPosted() method prints the results from the task and then sets a flag to indicate that the job is complete when the task is complete. However, the results themselves are slightly different.

As our task is now computing results slightly more complex than the single Integer returned by the tasks in the previous two lessons -- specifically, our new task will need to return two numbers, a mean and a standard deviation -- we in turn need to change the return type from the task. A task can return any Serializable object, as long as the application and task agree on what is being sent. Hence, we'll make a new custom type and add it as a static nested class within DataTask: DataTask.Result. This contains two floating-point fields representing the mean and standard deviation, respectively. Note that it's very important that this is a static nested class (as opposed to a non-static or "inner" class), because otherwise, attempting to serialize the result would also result in the outer class -- in this case, the task itself, DataTask -- being serialized as well. This is a very common source of bugs in serialization in general, and crops up here not only in results but in tasks themselves; hence, a developer should always be wary of such problems when using nested classes.
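The difference between static nested and inner classes under serialization can be demonstrated with standard Java alone. In the hypothetical sketch below, EnclosingTask is deliberately not Serializable so that the hidden reference held by the inner class shows up as a failure; none of these names come from DataTask.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical outer class that is deliberately NOT Serializable.
final class EnclosingTask {
    private final String name = "demo";

    // Static nested class: holds no reference to an EnclosingTask instance.
    static final class StaticResult implements Serializable {
        private static final long serialVersionUID = 1L;
        final double mean = 1.0;
    }

    // Inner class: carries an implicit reference to its enclosing instance.
    final class InnerResult implements Serializable {
        private static final long serialVersionUID = 1L;
        final double mean = 1.0;

        String owner() {
            return name; // uses the enclosing instance
        }
    }

    InnerResult newInner() {
        return new InnerResult();
    }

    // Returns true if the value serializes, false if a non-serializable
    // object (here, the enclosing instance) is reached along the way.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Had EnclosingTask been Serializable, the inner result would have serialized successfully but would have dragged the whole enclosing object along with it, which is the silent version of the same problem.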

Using the results, which in DataApp are only used within the resultsPosted event listener method itself, is simple. The only sticking point is to be sure that the type the application is expecting and the type the task is sending are always the same.

      public void resultsPosted(TaskResultEvent event) {

        System.out.println("The mean is " + result.getMean()
                           + " +/- " + result.getStandardDeviation());
        System.out.println();

      }


We've seen how the application creates data elements and passes them as parameters to a task. Now let's look at how the task reads data from a data element.

Accessing data via the DataBuffer interface is reasonably straightforward. This interface is intended to provide read access and, in other cases (that is, not including data element access), write access to a set of bytes. A task can obtain one via several different means, including SerializableTaskContext.getDataElement (as described above) and as a DataElementProxy created on the client side and shipped along with the task. This latter course is how DataTask accesses its data element: that is, it merely uses its existing data member, set in the constructor. Once a DataBuffer is available, it offers a number of different means of access, such as a read method, but perhaps the most common and easiest means to access the data -- and, again, the means used here by DataTask -- is via the getInputStream method, which returns a java.io.InputStream implementation which will read the contents of the DataBuffer. Since this is just like any other InputStream, it can be either used directly or sent to any number of library methods, from XML parsers to image readers. Here, we create a java.io.BufferedReader so that we can read the contents line-by-line.

      public Result run(SerializableTaskContext context)


Note: We explicitly specify the character set of the input stream when we create the BufferedReader instead of relying on the JVM's default character set. Since the default character set of the JVM used to run the task may differ from that of the JVM used by the application, explicitly setting the character set ensures we get a consistent interpretation of the input data.
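Wrapping an InputStream with an explicit character set looks like the following in standard Java. This is a minimal sketch: the LineReading class is hypothetical, UTF-8 is an assumed choice (the character set DataTask actually names is not shown here), and any InputStream, such as one returned by getInputStream, could be passed in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; UTF-8 is an assumed character set for illustration.
final class LineReading {
    // Reads all lines from the stream, decoding bytes as UTF-8 regardless
    // of the JVM's default character set.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The important part is the second argument to InputStreamReader; the single-argument constructor would fall back to the platform default.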

Next, we get to the meat of the task's work: iterating through the values in the data stream and computing their sum and the sum of their squares, which will be used later to determine the mean and standard deviation.

        int numValues = 0;
        long sum = 0;
        long sumOfSquares = 0;
        boolean done = false;
        while (!done) {
          if (context.shouldStop()) {
            // Engine requested a graceful stop; run exits here by throwing
            // a TaskStoppedException.
          }
          if (inputLine == null) {
            done = true;
          } else if (inputLine.trim().length() > 0) {
            // Trim before parsing so surrounding whitespace is tolerated.
            int value = Integer.parseInt(inputLine.trim());
            numValues++;
            sum += value;
            // Widen to long before multiplying to avoid int overflow.
            sumOfSquares += (long) value * value;
          }

          // We don't know how many values there are, so we can't report
          // an actual fraction-complete, the best we can do is report that we're
          // a certain amount done.  Hence, by convention, we start our progress
          // at a value over 1.0.
          context.reportProgress(1+numValues);
        }


It is worth noting four points about the above code. First, the task periodically calls context.shouldStop to determine whether the engine has requested that the task shut down gracefully, and if so, throws a TaskStoppedException out of the run method. This is good practice, even if the task doesn't take advantage of the opportunity to log checkpoints or send intermediate results. Second, we report progress when we can; doing so is never necessary, but can be useful to the user (to whom the information could be presented via a progress bar, for instance), as well as to the system (which might schedule the same task on multiple engines, and could keep track of which had gotten further in the task by comparing their progress values). Third, the task doesn't catch NumberFormatExceptions from Integer.parseInt, but rather declares them in the signature of the run method; if any such exception is thrown, it will end up being sent back to the client application as a task exception. It's entirely up to the task developer to decide what to do with such exceptions, but letting an exception propagate out of run is often a reasonable choice. Fourth, the task reads the lines of data one at a time, performing processing as it goes, rather than reading all data up-front and acting on it. Such structural decisions are left entirely up to the task developer, who can make tradeoffs between memory considerations, efficiency, and flexibility. In this case, the data is read in piecemeal merely for simplicity. If we had instead read it into an array first and hence no longer needed the data element at all, we'd probably want to release the resources used by the data element's DataBuffer object, data, by first calling DataBuffer.release and then nulling out our reference to it so that it could be garbage-collected. The decision of how and when to read data element contents is further complicated by checkpoints, which are discussed in the next lesson.

Once the data is read and processed, DataTask.run merely needs to construct final results and send them back, using the type agreed upon with the client application -- in this case, DataTask.Result (which was also used above as the return type from run for safety's sake).

    double mean = sum / (double) numValues;
    double standardDeviation =
        Math.sqrt(sumOfSquares / (double) numValues - mean * mean);

    return new Result(mean, standardDeviation);
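
The one-pass computation described above can be tried outside Frontier. The following is a minimal sketch, assuming one integer per input line; the class StreamingStatsSketch and its methods are hypothetical stand-ins and are not part of the Frontier SDK. It uses long accumulators (an assumption, chosen to reduce the risk of overflow) and, like DataTask, lets NumberFormatException propagate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-alone sketch; it uses no Frontier classes.
class StreamingStatsSketch {
    final double mean;
    final double standardDeviation;

    StreamingStatsSketch(double mean, double standardDeviation) {
        this.mean = mean;
        this.standardDeviation = standardDeviation;
    }

    // Reads one integer per line, keeping only a count, a sum, and a sum of squares.
    static StreamingStatsSketch compute(String text) {
        long numValues = 0;
        long sum = 0;
        long sumOfSquares = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                long value = Integer.parseInt(line.trim()); // NumberFormatException propagates
                numValues++;
                sum += value;
                sumOfSquares += value * value;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (numValues == 0) {
            throw new IllegalArgumentException("no values to process");
        }
        double mean = sum / (double) numValues;
        double variance = sumOfSquares / (double) numValues - mean * mean;
        // Guard against a tiny negative variance caused by rounding.
        return new StreamingStatsSketch(mean, Math.sqrt(Math.max(0.0, variance)));
    }

    public static void main(String[] args) {
        StreamingStatsSketch s = compute("2\n4\n4\n4\n5\n5\n7\n9\n");
        System.out.println("mean=" + s.mean + " stddev=" + s.standardDeviation);
    }
}
```

Because only three running totals are kept, memory use stays constant no matter how many lines the input contains.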


## Running the Application

DataApp uses the file DataApp.data in the current directory as its input data file. Run the program with the following commands, specifying the -remote flag on the command line if you want to run the application in remote mode:

• UNIX or Linux:

cd frontier-sdk-home/demo/data
./data.sh [-remote] launch
./data.sh [-remote] listen
./data.sh [-remote] remove

• Microsoft® Windows®:

cd frontier-sdk-home\demo\data
data.bat [-remote] launch
data.bat [-remote] listen
data.bat [-remote] remove

In the next lesson we'll extend this application with checkpoints that allow long-running tasks to be halted and then restarted by the Frontier server.

# Lesson 4: Using Checkpoints

## Overview

In this lesson we'll modify the example from Lesson 3 to periodically checkpoint the values being computed by the task, and to resume processing if the task is restarted from a checkpoint.

## Files

The files used in this lesson are:

File Description
checkpoint/src/CheckpointApp.java Application code to start the task and the listener
checkpoint/src/CheckpointTask.java The task code that actually does the work
checkpoint/CheckpointApp.data The input data file used by the application

## Application

Checkpoints are produced by a task and subsequently used directly on the engine on which they were created in order to restart that task; they are generally not reported back to the client application. As a result, no client-side functionality is normally involved in using checkpoints; it is the responsibility of the task itself to correctly log and recover from them. As such, CheckpointApp is, aside from class names and the like, identical to its counterpart from the previous lesson, DataApp.

The CheckpointTask class differs from DataTask in three ways. The first is a minor restructuring of the class in order to save some state in a way that it will be stored in checkpoints. The second is its ability to log checkpoints, and the third is the functionality needed to restart from a checkpoint.

First, we should define what a checkpoint is, exactly. Simply put, it is the task itself, serialized into an array of bytes just as it was when first specified. From that perspective, a checkpoint is identical to an original task specification. In fact, from the task's perspective, restarting from a checkpoint is just like starting originally: the class is de-serialized and the run() method is called, which returns results on completion. The only real difference between the two is that in a checkpoint, the work the task has done thus far is encoded in its various fields, using whatever means was most appropriate for the particular task. That is, how to represent a task's state such that it can resume later is left to the task developer; Frontier merely provides the mechanism to save that state. Fortunately, it's also up to the task developer to decide when a task is in a state that can be saved, so we can pick points where it's easy to consolidate the task down to a few essentials -- that is, we needn't checkpoint when the task is in the middle of doing something complex.

With that in mind, we take a look at DataTask in order to determine where its state -- the work it has done thus far -- is stored. We can quickly notice that the three quantities actually being computed for most of the task's life -- sum-so-far, sum-of-squares-so-far, and number-of-values-considered-thus-far -- are simply local variables inside the run method. If we wish to start where we left off at any point, these are the three pieces of information we'd really need to know. So, we simply make these class member variables. Now, when the task is serialized out, these values will be as well -- and when we continue later, they will be available once again.

    private int numValues;
    private int sum;
    private int sumOfSquares;


This is also the point where we want to consider what not to save, in the name of efficiency. Generally, any cached value which can be recreated reasonably easily, at least compared to its size in memory, should be left out of checkpoints. For instance, if we've parsed one of our data elements into a large structure in memory, then generally we wouldn't want to save that in a checkpoint but rather re-create it on our next run. Often, marking such fields as transient is enough to achieve this, but we also need to be sure that the task knows to recreate these caches after restarting.
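
To make the effect of transient concrete, here is a minimal sketch that uses only standard Java serialization. The class CachedStateSketch, its fields, and the roundTrip helper are hypothetical; they imitate a save-and-restore cycle and are not how the Frontier engine is implemented. The essential counters survive the round trip, while the cache comes back as null and is rebuilt on first use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical task-like class; it implements no Frontier interface.
class CachedStateSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private int numValues;                 // essential state: serialized
    private long sum;                      // essential state: serialized
    private transient int[] squaresCache;  // derived cache: skipped by serialization

    void add(int value) {
        numValues++;
        sum += value;
    }

    int numValues() { return numValues; }
    long sum() { return sum; }
    boolean cacheLoaded() { return squaresCache != null; }

    // Rebuilds the cache lazily, for example after deserialization left it null.
    int square(int n) {
        if (squaresCache == null) {
            squaresCache = new int[1000];
            for (int i = 0; i < squaresCache.length; i++) {
                squaresCache[i] = i * i;
            }
        }
        return squaresCache[n];
    }

    // Serializes and then deserializes the object, imitating save and restore.
    static CachedStateSketch roundTrip(CachedStateSketch task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CachedStateSketch) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The lazy check inside square() is one way to make sure the task "knows to recreate these caches after restarting"; an explicit rebuild step at the top of run would work equally well.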

The next thing we need to do -- the real meat, creating and saving off a checkpoint -- is actually deceptively easy. Simply put, we wait until we're in a state where we can save the state of a task, and call context.logCheckpoint, which will (or rather, might -- the engine usually only saves checkpoints when it thinks it's worthwhile, and just skips the rest), behind the scenes, serialize out our entire task object. This brings up the question of when we shouldn't log a checkpoint. The answer to this, too, is deceptively simple: don't store a checkpoint when you're in the middle of something. In computer science parlance, we might say that we want to ensure that our invariants are valid. For instance, if we're swapping two numbers,

    double temp = this.a;
    this.a = this.b;
    this.b = temp;


then we wouldn't want to save a checkpoint in the middle while this.a == this.b, but would want to wait until we're at least done with that swap. Similarly, in CheckpointTask, we wouldn't want to log a checkpoint after we've updated sum but haven't updated sumOfSquares. So, we'll simply log a checkpoint at the end of our main loop. All we need to do is add a call to context.logCheckpoint:

    while (!done) {
        ...
        context.logCheckpoint();
    }


So, how frequently should we call this? There are two answers. Generally, one should call it often -- anything from several times a second to once a minute or so. The task needn't worry about the overhead incurred by calling this method itself, as the engine will keep track of how long checkpoints are taking to save and how long it's been since the last checkpoint, and will decide based on that whether to take advantage of the opportunity to log a new checkpoint. If a task calls logCheckpoint ten times a second but checkpoints are taking about 5 seconds to save, then the engine will generally return immediately from logCheckpoint without doing anything 99.9% of the time, and only once every few minutes will it actually decide to save a checkpoint during one of these calls.

However, sometimes a task must do work to even be able to log a checkpoint. For instance, if a task has 5 independent threads, usually all threads must be in a valid, 'checkpoint-able' state when logCheckpoint is called, lest one of them be in the middle of updating some task fields when the checkpoint is made, resulting in a corrupt checkpoint. So, the task would likely need to tell each thread that it should pause as soon as it's checkpoint-able, wait until all threads have paused, and then call logCheckpoint. This results in lost performance as threads just sit there waiting for other threads to stop, and so it shouldn't be done when the engine isn't even going to bother saving the checkpoint. Another example might be a task that needs to cull down a large working dataset (for example, a sparse matrix) to determine the few values worth saving, which would be wasted work if no checkpoint were to be saved (though we should note that one could also do this in the java.io.Externalizable.writeExternal method, if one were using this interface rather than Serializable).

The solution for such situations is to call context.shouldLogCheckpoint, which uses the engine's checkpoint-throttling logic to report whether it's worth saving a new checkpoint yet; only when this returns true would the task go to the trouble of pausing its threads or doing other checkpoint-preparation work. Alternatively, if the task wants to override the engine's throttling logic -- for instance, if a task knows it's in a checkpoint-able state now but won't be in such a state again for ten minutes -- the task can force the checkpoint to be saved via context.logCheckpoint(true).
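
The engine's actual throttling policy is not specified in this guide. Purely as an illustration of the idea, the sketch below accepts a checkpoint only when the time since the last save is at least a fixed multiple of the time the last save took. The class CheckpointThrottleSketch, its methods, and the overhead factor are all assumptions made for this example and are not Frontier APIs.

```java
// Hypothetical throttle; the real engine's policy may differ entirely.
class CheckpointThrottleSketch {
    private final long overheadFactor;  // required ratio of run time to save time
    private long lastSaveEndMillis;     // when the previous save finished
    private long lastSaveCostMillis;    // how long the previous save took

    CheckpointThrottleSketch(long overheadFactor, long startMillis, long initialCostMillis) {
        this.overheadFactor = overheadFactor;
        this.lastSaveEndMillis = startMillis;
        this.lastSaveCostMillis = initialCostMillis;
    }

    // Analogue of shouldLogCheckpoint: cheap to call and free of side effects.
    boolean shouldLog(long nowMillis) {
        return nowMillis - lastSaveEndMillis >= overheadFactor * lastSaveCostMillis;
    }

    // Records that a checkpoint was written and how long the write took.
    void recordSave(long saveStartMillis, long saveEndMillis) {
        lastSaveCostMillis = saveEndMillis - saveStartMillis;
        lastSaveEndMillis = saveEndMillis;
    }

    public static void main(String[] args) {
        // Saves cost 5 s; with a factor of 20 a save is accepted every 100 s.
        CheckpointThrottleSketch throttle = new CheckpointThrottleSketch(20, 0L, 5000L);
        int calls = 0;
        int accepted = 0;
        for (long t = 100; t <= 100_000; t += 100) {  // ten calls per second
            calls++;
            if (throttle.shouldLog(t)) {
                accepted++;
                throttle.recordSave(t, t + 5000);
            }
        }
        System.out.println(accepted + " of " + calls + " calls saved a checkpoint");
    }
}
```

With these assumed numbers, one call in a thousand results in a save, which matches the 99.9% figure quoted above. A multi-threaded task would consult the shouldLog analogue first and pause its worker threads only when it returns true.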

One special time we should make an effort to call shouldLogCheckpoint, assuming we haven't called it very recently and we're in a valid checkpoint-able state, is after the engine requests that the task stop (e.g. if context.shouldStop returns true and before we throw a TaskStoppedException in response). CheckpointTask doesn't do this simply because it has just called logCheckpoint anyway.

The third thing we need to do to support checkpointing often involves the most code, but it's very specific to each task. Simply put, we must recover from a checkpoint. For some tasks, no such explicit recovery is necessary; they can simply start back where they left off. Other tasks may need to recreate cached values and perform other such housekeeping. Some tasks may want to skip initialization phases or jump past work already performed, in order to avoid doing things twice and causing problems in the process.

This last case is what needs to be handled in CheckpointTask. As CheckpointTask reads data values one at a time, performing computation on each, we need to be sure that when we restart from a checkpoint, we skip past values we've already dealt with, both to avoid performing redundant work and, more importantly, to avoid double-counting values and ending up with incorrect results. We could do this by storing an offset into our input stream and skipping past those bytes as soon as we get a new input stream, but that's difficult to do with a BufferedReader. We could instead save the number of values we've processed as part of our state, and discard that many lines from the input. In fact, we already have this information: the numValues field happens to be the same as the number of values we've considered thus far, so we don't need to store another member variable; we simply need to skip numValues values before dealing with new ones. We add this code before the loop:

    int valuesToSkip = 0;
    if (numValues != 0) {
        // Restarting from a checkpoint, so skip the data we have already processed.
        valuesToSkip = numValues;
    }

Then in the loop itself, we replace this:

    numValues++;
    sum += value;
    sumOfSquares += value * value;

with this:

    if (valuesToSkip > 0) {
        valuesToSkip--;
    } else {
        numValues++;
        sum += value;
        sumOfSquares += value * value;
    }


CheckpointTask is, as its name promises, now fully able to produce and recover from checkpoints.
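
The skip-on-restart logic can be exercised without Frontier. In the sketch below, the class ResumableSumSketch and the maxNewValues parameter are hypothetical: the parameter imitates an engine stopping the task partway through, and the object's fields stand in for the state a checkpoint would preserve.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CheckpointTask; no Frontier classes are used.
class ResumableSumSketch {
    // State that a checkpoint would capture.
    int numValues;
    long sum;
    long sumOfSquares;

    // Processes at most maxNewValues unprocessed lines, then returns.
    // Returns true once the whole input has been consumed.
    boolean run(String data, int maxNewValues) {
        int valuesToSkip = numValues;  // nonzero only when resuming
        int processedNow = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int value = Integer.parseInt(line.trim());
                if (valuesToSkip > 0) {
                    valuesToSkip--;    // already counted before the restart
                    continue;
                }
                if (processedNow == maxNewValues) {
                    return false;      // imitate being stopped before the end
                }
                numValues++;
                sum += value;
                sumOfSquares += (long) value * value;
                processedNow++;
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling run twice on the same object, first with a small limit and then with a large one, yields the same totals as a single uninterrupted run, which is the property the skip logic is meant to guarantee.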

## Running the Application

CheckpointApp is run like our previous examples:

• UNIX or Linux:

cd frontier-sdk-home/demo/checkpoint
./checkpoint.sh [-remote] launch
./checkpoint.sh [-remote] listen
./checkpoint.sh [-remote] remove

• Microsoft® Windows®:

cd frontier-sdk-home\demo\checkpoint
checkpoint.bat [-remote] launch
checkpoint.bat [-remote] listen
checkpoint.bat [-remote] remove

You have now completed the tutorial lessons. The next part of this document contains overviews of several additional applications included with the Frontier SDK that provide examples of advanced Frontier programming techniques.

# Examples

The following examples are available in the Frontier SDK demo directory. These examples can run in all three modes and are included to demonstrate best practices when developing Frontier-enabled applications.

## Job Listener Application

### Overview

This section provides an example of using the Frontier API in a Swing GUI application. In a GUI application, the Frontier API methods should normally be executed in a separate thread so the GUI remains responsive while the API communicates with the Frontier server over the network. The example program here is derived from the "Job Controller" utility, which is an application included with the Frontier SDK that can be used to monitor and manipulate jobs on the Frontier server.

Since this example is over a thousand lines of code we'll focus on the key points of interest in the program.

### Files

The files used in this example are:

File Description
joblistener/src/JobListener.java Main program and Swing GUI
joblistener/src/JobSearcher.java Connects to the server and gets the list of jobs
joblistener/src/RunMode.java Utility class for decoding the task mode into colors
joblistener/src/SwingWorker.java Utility class for running a "background" thread
joblistener/src/TaskChartPane.java Displays task modes as a colorful chart
joblistener/src/TaskFrame.java GUI code to provide internal frame for job details
joblistener/src/TaskLegendPane.java Displays a legend of the task modes and their color representation
joblistener/src/TaskStatus.java Wrapper class for task mode and progress
joblistener/src/BasicWindowMonitor.java Utility class to handle closing the application

### Application

JobListener connects to the Frontier server, displays the list of all client jobs on the server, and allows you to select a job to display detailed information. This is done by selecting a job from the job list panel, then clicking the Show Task button.

Clicking Show Tasks displays a list of tasks in a job from the Frontier server, with additional tabbed panes displaying task attributes and status.

The code segment shown here from JobListener.java uses the SwingWorker class to run the Show Task operation in a background thread. In the construct() method, a reestablish() call is made on the job to populate the job's task information, then finished() is called to create a TaskFrame to display task details for the selected job.

    public void actionPerformed( ActionEvent ev ) {

        final int selected[] = _list.getSelectedIndices();

        //
        // Display the tasks associated with the job in the desktop pane.
        //

        if ( selected.length > 0 ) {
            if ( _jobList.size() > 0 ) {

                System.out.println( "Requesting job details from the server..." );

                //
                // Reestablish the job (this takes a little while).
                //

                SwingWorker worker = new SwingWorker() {
                    public Object construct() {
                        try {
                            synchronized ( _listResource ) {
                                Job tmpJob = (Job) _jobList.get( selected[0] );
                                if ( tmpJob != null ) tmpJob.reestablish();
                            }
                        } catch (OperationFailedException ofe ) { ofe.printStackTrace(); };
                        return null;
                    }

                    public void finished() {
                        String jobName;
                        Job tmpJob;

                        synchronized ( _listResource ) {
                            if ( selected.length > 0 ) {
                                jobName = (String) _model.get( selected[0] );
                                tmpJob  = (Job )   _jobList.get( selected[0] );
                            }
                            else {
                                jobName = null;
                                tmpJob = null;
                            }
                        }

                        //
                        // Display the task frame on the desktop pane.
                        //

                        if ( tmpJob != null ) {
                            f.setVisible( true );
                            f.moveToFront();
                            f.setDesk( _desk );
                        }
                    }
                };

                worker.start();

            }
        }
    }
    };


The abstract SwingWorker class is extended here as an anonymous class. Anonymous classes allow you to define a class near the place in the code that it is used, and avoid declaring many trivial classes in your application.

The TaskChartPane, contained in the TaskFrame, tracks the individual task progress in a Map using a TaskProgressListener to regularly display the progress in a chart. The TaskProgressListener is a lightweight task listener that only receives the mode and progress information from tasks. This type of listener is useful if you want to monitor a job that produces large results but do not want to incur the communications penalty of receiving large messages. The TaskChartPane implements the interface of the TaskProgressListener by implementing a progressReported() method. The TaskProgressListener is set up by the following code in the TaskChartPane constructor:


if ( _job != null ) {
}


This causes the progressReported() method of TaskChartPane to be invoked whenever there is new progress information. Listening occurs asynchronously on a separate thread from the Swing GUI. The code segment below shows how the application keeps track of the task statuses in a Map. Note the use of the TASK_ID_DEFAULT_ATTRIBUTE_NAME to uniquely identify each task.

      //
// TaskProgressListener - this method keeps the
//                        task status map up to date.
//
public void progressReported( TaskProgressEvent event ) {
.getAttributes()

if (id != null) {
if (status == null) {
}
status.setProgress( event.getProgress() );
status.setRunMode( event.getRunMode() );

}
}
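
Because the listener callback and the Swing painting code run on different threads, the status map is shared state. The following is a minimal sketch of one way to keep such a map consistent, assuming a ConcurrentHashMap keyed by a task ID string. StatusSketch and StatusMapSketch are hypothetical classes, and their method signatures are simplified; they do not reproduce the SDK's TaskStatus or TaskProgressEvent types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status holder; the SDK's TaskStatus class may differ.
class StatusSketch {
    volatile double progress;
    volatile String runMode = "unknown";
}

// Hypothetical map wrapper shared by a listener thread and a painting thread.
class StatusMapSketch {
    private final Map<String, StatusSketch> statuses = new ConcurrentHashMap<>();

    // Called from the listener thread whenever progress arrives for a task.
    void progressReported(String taskId, double progress, String runMode) {
        if (taskId == null) {
            return;  // events without an ID cannot be attributed to a task
        }
        // Create the entry on first sight of the task, then update it.
        StatusSketch status = statuses.computeIfAbsent(taskId, id -> new StatusSketch());
        status.progress = progress;
        status.runMode = runMode;
    }

    // Called from the painting code; returns -1 for an unknown task.
    double progressOf(String taskId) {
        StatusSketch status = statuses.get(taskId);
        return status == null ? -1.0 : status.progress;
    }

    int size() {
        return statuses.size();
    }
}
```

Here the taskId argument plays the role of the attribute value stored under TASK_ID_DEFAULT_ATTRIBUTE_NAME in the listing above.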

On another thread, the GUI is requested to update the display every 10 seconds. This is triggered by the run() method. We use SwingUtilities.invokeLater() to make sure the repaint occurs on the Swing dispatch thread:

    public void run() {
        while ( true ) {
            try {
                // Wait 10 seconds between repaints.
                Thread.sleep( 10000 );
            } catch ( InterruptedException ie ) { break; }

            //
            // Cause the pane to be repainted.
            //

            Runnable doRepaint = new Runnable() {
                public void run() {
                    _this.repaint();
                }
            };

            SwingUtilities.invokeLater( doRepaint );
        }
    }


### Running the Application

To run this application do:

• UNIX or Linux:

cd frontier-sdk-home/demo/joblistener
./joblistener.sh

• Microsoft® Windows®:

cd frontier-sdk-home\demo\joblistener
joblistener.bat

## Mersenne Prime Application

### Overview

When a number of the form 2^p - 1 is prime, it is said to be a Mersenne prime. This example searches a range of numbers for Mersenne primes using the Lucas-Lehmer test for Mersenne primality. The example illustrates how to use time-based checkpointing to trigger checkpoints using a timer rather than performing a checkpoint after a given number of iterations have completed.

### Files

The files used in this example are:

File Description
prime/src/Prime.java Application code to launch the tasks
prime/src/PrimeTaskEventListener.java The listener for task results
prime/src/PrimeTask.java Contains the task code that checks if a number is prime

### Application

The Prime application launches a task for each odd number in a range of numbers (since even numbers greater than 2 are never prime) to check if that number is a Mersenne prime.

The Prime application has launch, listen, and remove modes, similar to the tutorial examples.

### Listener

PrimeTaskEventListener listens for task status and completion events from the PrimeTasks. It prints the results of the prime test when a task completes, and terminates the application after all tasks have finished.

PrimeTask.run uses the Lucas-Lehmer test to determine whether a given candidate is in fact a Mersenne prime. This test functions by looping through p iterations, where p is the candidate number being tested. Aside from the candidate value itself, the state stored in the PrimeTask class includes the last iteration number and the current "residue"; these values are sufficient to simply continue the test where it left off, and are in a valid state after any given iteration of the main loop has completed. Describing the mathematics behind the Lucas-Lehmer test or the role of the residue is outside the scope of this document, but interested readers can easily find more information on this topic on the Internet.

Each time through the main loop, in addition to performing the test itself, the run method sends back progress and intermediate results (consisting of the current iteration number), checks if the runtime has requested that the task stop, and then logs a checkpoint.

    public synchronized Serializable run(SerializableTaskContext context)
            throws TaskStoppedException {
        BigInteger m = TWO.pow(candidate).subtract(ONE); // 2^p - 1

        while (iteration <= candidate) {
            residue = residue.pow(2).subtract(TWO).mod(m);
            iteration++;

            context.postIntermediateResultsObject(iteration / (double)candidate, (Integer)iteration);

            if (context.shouldStop()) {
                // Stop gracefully when the engine asks the task to shut down.
                throw new TaskStoppedException();
            }

            context.logCheckpoint();
        }

        return (Boolean)residue.equals(ZERO);
    }
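
For readers who want to experiment with the test itself, here is a self-contained sketch of the textbook Lucas-Lehmer procedure for an odd prime exponent p: start from 4 and apply s -> s^2 - 2 (mod 2^p - 1) exactly p - 2 times; 2^p - 1 is prime exactly when the final residue is zero. The class LucasLehmerSketch and its members are hypothetical, and its starting value and loop bounds follow the textbook statement, which may differ from the bookkeeping PrimeTask uses. The iteration count and residue are kept in fields so the computation can be resumed partway through, mirroring the state described above.

```java
import java.math.BigInteger;

// Hypothetical resumable Lucas-Lehmer sketch; names are illustrative only.
class LucasLehmerSketch {
    private static final BigInteger TWO = BigInteger.valueOf(2);

    final int p;                                  // odd prime exponent
    int iteration = 0;                            // squaring steps completed so far
    BigInteger residue = BigInteger.valueOf(4);   // textbook starting value

    LucasLehmerSketch(int p) {
        if (p < 3 || p % 2 == 0) {
            throw new IllegalArgumentException("p must be an odd prime");
        }
        this.p = p;
    }

    // Runs at most maxSteps further steps; returns true once all p - 2 are done.
    boolean step(int maxSteps) {
        BigInteger m = BigInteger.ONE.shiftLeft(p).subtract(BigInteger.ONE); // 2^p - 1
        for (int i = 0; i < maxSteps && iteration < p - 2; i++) {
            residue = residue.multiply(residue).subtract(TWO).mod(m);
            iteration++;
        }
        return iteration == p - 2;
    }

    // Valid only after step() has reported completion.
    boolean isMersennePrime() {
        if (iteration != p - 2) {
            throw new IllegalStateException("test not finished");
        }
        return residue.signum() == 0;
    }

    // Convenience wrapper that runs the whole test in one call.
    static boolean test(int p) {
        LucasLehmerSketch t = new LucasLehmerSketch(p);
        t.step(Integer.MAX_VALUE);
        return t.isMersennePrime();
    }
}
```

Splitting the work into step calls shows why the pair (iteration, residue) is sufficient checkpoint state: a run that pauses after a few steps and continues later reaches the same final residue as an uninterrupted run.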


### Running the Application

To run this application execute the following commands:

• UNIX or Linux:

cd frontier-sdk-home/demo/prime
./prime.sh [-remote] launch [start-number end-number]
./prime.sh [-remote] listen
./prime.sh [-remote] remove

• Microsoft® Windows®:

cd frontier-sdk-home\demo\prime
prime.bat [-remote] launch [start-number end-number]
prime.bat [-remote] listen
prime.bat [-remote] remove

# Appendices

## Appendix A: Using Client-Scope and Global Elements

### Client-Scope Elements

Some Frontier users may find that they launch many jobs that use the same executable and data elements. Each job must resend those elements to the server, resulting in duplicate copies of the elements residing on the server when only one set is necessary. Frontier allows you to create client-scope executable and data elements on the Frontier server, which may be used by tasks in any job created by you.

Client-scope elements are created using the upload-element script included in the Frontier SDK. A client-scope element is created by the following commands:

• UNIX or Linux:

cd frontier-sdk-home/bin
./upload-element.sh elementName elementType elementFile

• Microsoft® Windows®:

cd frontier-sdk-home\bin
upload-element.bat elementName elementType elementFile

where:

• elementName is a unique name for the element. This name is used as an element ID in your Frontier applications to reference the element.
• elementType is either data to create a data element or executableJar to create an executable element.
• elementFile is the pathname of the file to be uploaded.

The following example creates a client-scope data element named element1 from the file foo.data:

upload-element.sh element1 data foo.data

And this example creates a client-scope executable element named element2 from the jar task.jar:

upload-element.sh element2 executableJar task.jar

In order to use client-scope elements submitted this way, applications simply use the element name provided when the element was created using upload-element, just like any other element ID. For example, to pass the client-scope data element named element1 to a task, your application would do:

    myTask.setData(new DataElementProxy("element1"));

This example adds the executable element named element2 to a task:

    myTaskSpec.addRequiredExecutableElement("element2");

Tasks access client-scope elements the same way they do other elements.

### Global Elements

Frontier also supports global elements, which are available to any client -- not just the client that uploaded the elements. To create global elements please contact Parabon support through our email form at http://www.parabon.com/MyFrontier/support.jsp.

## Appendix B: Glossary

### A

• attributes: A set of data that can be attached to both jobs and tasks to help describe and identify their purpose and contents. Attributes are stored in a HashMap<String,String> as name-value pairs.

### C

• checkpoint: The mechanism Frontier uses to capture the state of a task in order to minimize the amount of computational work lost due to interruptions during task execution.
• client application: One of three main components comprising the Frontier platform. The client application is run on a single computer by an organization or individual wishing to utilize Frontier by communicating with the Frontier server.
• Client Application Programming Interface (API): The portion of the Frontier API with which client applications deal directly when creating, launching, monitoring, and controlling jobs and tasks. The Client API is implemented by the client library and provides a set of functionality used by the client application to communicate with the server.
• compute engine: A software application that turns otherwise wasted CPU cycles into useful computation by working on tasks using a provider node's spare power.
• compute-intensive: Describes a problem or job that involves relatively small amounts of data, but requires a large amount of computation to solve.
• compute-to-data ratio: A ratio of the amount of computation to the amount of data required to solve a problem. Jobs whose tasks are characterized as having a high compute-to-data ratio -- i.e., tasks that require heavy computation but contain relatively small amounts of data -- tend to process well on Frontier. Examples of compute-intensive jobs include photo-realistic image rendering, exhaustive regression, and sequence comparison.

### D

• data element: A black-box chunk of data that is required for the running of a task.

### E

• element: A mechanism used to efficiently transport relatively large chunks of binary data required to execute a task. Frontier uses both data and executable elements, which are sent from a client application to the server and directed to provider nodes as required before task execution is initiated.
• executable element: A type of element that provides the Java bytecode instructions necessary to run a task.

### F

• Frontier: A massively scalable distributed computing platform that draws on the otherwise unused computational capacity of computers connected to the Internet.
• Frontier Application Programming Interface (API): The framework behind the Frontier platform that enables developers to create, launch, monitor, and control arbitrary compute-intensive jobs from an average desktop computer.
• Frontier Enterprise: A version of Frontier that draws on the spare computational power of computers connected to an intranet, such as a corporate network enclosed by a firewall. Unlike Frontier, Frontier Enterprise is a fixed-capacity solution whose peak depends on the number and capacity of computers on the network. In addition, Frontier Enterprise supports native code and lower compute-to-data ratios.
• Frontier server: The central hub of the Frontier platform that communicates with both the client application and individual Frontier compute engines. The Frontier server is responsible for coordinating the scheduling and distribution of tasks; maintaining records identifying all provider nodes, client sessions, and tasks; and receiving and storing task progress information and results until the client application retrieves them.

### I

• idle: Describes the state of a computer that is on but whose processing power is not actively engaged in processing tasks. The Windows® version of the Frontier compute engine can be configured to process tasks during a provider node's idle time.
• intermediate results: Results that are returned by the task before the task is considered complete.
• Internet distributed computing: A distributed processing model in which the idle or spare computational power of computers connected to the Internet is aggregated to create a platform with high-performance capability.

### J

• Java Virtual Machine (JVM): The Java technology that provides a "sandbox" inside which the Frontier compute engine can safely process tasks on a provider node.
• job: The single, relatively isolated unit of computational work performed on Frontier. A job is divided into a set of tasks, with one or more tasks assigned to and processed on a single provider node.

### L

• Launch and Listen: The two-stage methodology client applications typically use for processing on Frontier. Launching a job involves creating tasks and sending them to the server for distribution onto the provider network, while listening involves gathering results and status updates or removing tasks from Frontier. However, both can intermingle in a single session or occur over the course of several sessions with the server.
• listener: A user-created class that implements one of a set of listener interfaces, each of which receives one or more types of events generated by a job or task.
• local mode: A mode of operating a client application in which only resources local to the client machine are used. Local mode provides an efficient virtual session that is useful for testing and debugging purposes.

### O

• obfuscation: Scrambling all symbolic code information such that no problem-domain information remains.

### P

• Frontier Compute Engine: One of three main components comprising the Frontier platform that utilizes the spare power of a provider node to process tasks.
• progress: An indication of how "complete" a task is. Many tasks have a known run cycle, i.e., the task will run for a known number of iterations. Tasks that exhibit this behavior generally indicate progress on a scale of 0.0 to 1.0. This progress can then be translated into a "percent complete" by the application. Other tasks simply report progressively larger values, starting above 1.0.
• provider: An individual who installs and runs the Frontier compute engine to donate or provide his or her computer's spare power to process tasks retrieved from the Frontier server.

### R

• reestablishing: Obtaining information about the contents and status of released jobs and tasks -- most often those created in previous sessions -- and attaching them in order to monitor status changes.
• remote mode: A mode of operation that manages a session with the Frontier server, allowing the client application to actually command the resources of the Frontier platform.
• run mode: The phase of execution a task is in at a particular point in time. Possible run modes include unknown, unstarted, running, paused, complete, stopped, and aborted.

### S

• sandbox: The restricted, self-enclosed environment inside which the Frontier compute engine processes tasks on a provider's computer. The Frontier compute engine's sandbox is a Java technology that prevents task code from accessing files and interacting with programs on the provider node.
• session: An interaction with the Frontier environment, which can be either with the actual Frontier platform or with resources local to the user running the client application.
• session manager: The Frontier API object responsible for creating and maintaining sessions.

### T

• task: The relatively small, independent unit of computational work the Frontier compute engine executes on a single provider node. A job to be performed on Frontier is divided into a set of tasks.
• Task Runtime API: The portion of the Frontier API used to create tasks to run on a provider node. Through this API, task execution is initiated, task parameters are set, data elements are accessed, status and results are reported, and checkpoints are logged.
 
 © 1999-2008 Parabon Computation Inc. All rights reserved.