The Chronos API

I recently blogged about Airbnb’s Chronos job scheduler.  Here I take a look at the Chronos API.

Launch command: Your system must run Mesos and Zookeeper.  Then, you launch chronos via java:

java -cp chronos.jar --master zk://127.0.0.1:2181/mesos --zk_hosts 127.0.0.1:2181

API Access: Chronos provides a RESTful JSON API over HTTP and listens on port 8080 for requests. For example, your Chronos leader may run at a URL such as chronos-node.airbnb.com:8080.

Leader node: Chronos can run on a cluster of multiple nodes, and these nodes automatically elect one node as the leader node. Only the leader responds to API requests, and requests to other nodes  are automatically redirected to the leader.

Listing jobs: you can obtain a JSON-formatted list of jobs through curl and the response will include invocationCount (number of times job completed), executor (auto-determined by Chronos, but will usually be empty for non-async jobs), and parents (for dependent jobs, a list of jobs that must run before this job).  If there is a parents field there will be no schedule field and vice-versa:

curl -L -X GET chronos-node:8080/scheduler/jobs

Deleting jobs: to delete job my_job use this request:

curl -L -X DELETE chronos-node:8080/scheduler/job/my_job

Deleting tasks: Deleting tasks for a job is useful if a job gets stuck. The job name corresponds to the information returned from the job listing request:

curl -L -X DELETE chronos-node:8080/scheduler/task/kill/my_job

Manual job start: You can manually start a job by issuing an HTTP request:

curl -L -X PUT chronos-node:8080/scheduler/job/my_job

Adding jobs: send a JSON hash with the fields Name, Command, and Schedule (in ISO8601 format).  We will explain the details for the json hash next:

curl -L -H 'Content-Type: application/json' -X POST -d '{<json hash>}' chronos-node:8080/scheduler/iso8601

JSON hash: an example of a JSON hash is shown below.  We discuss each component below:

{
  "schedule": "R10/2012-10-01T05:52:00Z/PT2S",
  "name": "SAMPLE_JOB1",
  "epsilon": "PT15M",
  "command": "echo 'FOO' >> /tmp/JOB1_OUT",
  "owner": "bob@airbnb.com",
  "async": false
}

Job schedule: The schedule consists of 3 parts separated by ‘/’:

  1. number of times to repeat the job or ‘R’ to repeat forever
  2. start time of the job, an empty start time means start immediately, such as “1997-07-16T19:20:30.45+01:00”
  3. run interval, such as P1Y2M3DT4H5M6S, see examples below.

The run interval: the following examples illustrate how to specify run intervals:

  • P10M: 10 months
  • PT10M: 10 minutes
  • P1Y12M12D: 1 years plus 12 months plus 12 days
  • P12DT12M: 12 days plus 12 minutes
  • P1Y2M3DT4H5M6S: Period: 1 Year, 2 Months, 3 Days, Time: 4 Hours, 5 Minutes, 6 Seconds

P is required. T is for distinguishing minute and month, when Hour, Minute, Second exists.

Available time zones: The time zone name to use when scheduling the job:

Example time zone: for example, to specify Pacific Standard Time use:

json { "schedule": "R/2014-10-10T18:32:00Z/PT60M", "scheduleTimeZone": "PST" } 

Retry epsilon: If Chronos misses a scheduled run time for any reason, it will run the job later as long as the current time is within the specified epsilon interval. Epsilon must be formatted like an ISO 8601 Duration.

Job owner: the email address of the person responsible for the job.

Async: the async flag specifies whether the job will run in the background or in blocking mode in the foreground.

Add job example: with the hash constructed as described above, send the job schedule request to Chronos:

curl -L -H 'Content-Type: application/json' -X POST -d '{ "schedule": "R10/2012-10-01T05:52:00Z/PT2S",  "name": "SAMPLE_JOB1",  "epsilon": "PT15M",  "command": "echo 'FOO' >> /tmp/JOB1_OUT",  "owner": "bob@airbnb.com",  "async": false}' chronos-node:8080/scheduler/iso8601

Adding dependent jobs: dependent job takes the same JSON format as a scheduled job. However, instead of the schedule field, it will accept a parents field. The parents field lists other jobs which must run at least once before this job will run.

curl -L -X POST -H 'Content-Type: application/json' -d '{dependent hash}' chronos-node:8080/scheduler/dependency

Example dependency job hash: Here is a more elaborate example for a dependency job hash:

{
    "async": true,
    "command": "bash -x /srv/data-infra/jobs/hive_query.bash run_hive hostings-earnings-summary",
    "epsilon": "PT30M",
    "errorCount": 0,
    "lastError": "",
    "lastSuccess": "2013-03-15T13:02:14.243Z",
    "name": "hostings_earnings_summary",
    "owner": "bob@airbnb.com",
    "parents": [
        "db_export-airbed_hostings",
        "db_export-airbed_reservation2s"
    ],
    "retries": 2,
    "successCount": 100
}

Adding docker jobs: docker jobs take the same format as a scheduled job or a dependency job, with an additional container argument.  The container argument requires a type, an image, and optionally takes a network mode and volumes:

curl -L -H 'Content-Type: application/json' -X POST -d '{<json hash>}' chronos-node:8080/scheduler/iso8601

The <json hash> has the following format:

{
 "schedule": "R\/2014-09-25T17:22:00Z\/PT2M",
 "name": "my_docker_job",
 "container": {
  "type": "DOCKER",
  "image": "libmesos/ubuntu",
  "network": "BRIDGE"
 },
 "cpus": "0.5",
 "mem": "512",
 "uris": [],
 "command": "while sleep 10; do date =u %T; done"
}

Dependency graph: Chronos has an endpoint for requesting the dependency graph in form of a dotfile:

curl -L -X GET chronos-node:8080/scheduler/graph/dot

Asynchronous jobs: long-running, synchronous jobs can tie up resources excessively long.  To schedule jobs as asynchronous, set async: true and ensure your job reports its completion status to Chronos.  If your job does not report completion status Chronos report your job as running irrespective of whether it completed or not.

Reporting completion: Reporting job completion to Chronos is accomplished via this API call:

curl -L -X PUT -H "Content-Type: application/json" -d '{"statusCode":0}' chronos-node:8080/scheduler/task/my_job_run_555_882083xkj302

The task id is auto-generated by Chronos. It will be available in your job’s environment as $mesos_task_id.  You need to url-encode the mesos task id to ensure it is not corrupted in the process of sending and processing your request.

Remote executables: There are two forms of specifying commands, as the bash script url-runner.bash and as a URL.  To use the bash script you need to deploy it to all slaves.  To use the URL you need to compile mesos  with the cURL libraries.

Job configuration: The following tables provides an overview of job configurations:

Field Description Default
name Name of job.
command Command to execute.
arguments Arguments to pass to the command. Ignored ifshell is true
shell If true, Mesos will execute command by running/bin/sh -c <command> and ignore arguments. If false, command will be treated as the filename of an executable and arguments will be the arguments passed. If this is a Docker job andshell is true, the entrypoint of the container will be overridden with /bin/sh -c true
epsilon If, for any reason, a job can’t be started at the scheduled time, this is the window in which Chronos will attempt to run the job again PT60S or --task_epsilon.
executor Mesos executor. By default Chronos uses the Mesos command executor.
executorFlags Flags to pass to Mesos executor.
retries Number of retries to attempt if a command returns a non-zero status 2
owner Email addresses to send job failure notifications. Use comma-separated list for multiple addresses.
async Execute using Async executor. false
successCount Number of successes since the job was last modified.
errorCount Number of errors since the job was last modified.
lastSuccess Date of last successful attempt.
lastError Date of last failed attempt.
cpus Amount of Mesos CPUs for this job. 0.1 or --mesos_task_cpu
mem Amount of Mesos Memory in MB for this job. 128 or --mesos_task_mem
disk Amount of Mesos disk in MB for this job. 256 or --mesos_task_disk
disabled If set to true, this job will not be run. false
uris An array of URIs which Mesos will download when the task is started.
schedule ISO8601 repeating schedule for this job. If specified, parents must not be specified.
scheduleTimeZone The time zone for the given schedule.
parents An array of parent jobs for a dependent job. If specified, schedule must not be specified.
runAsUser Mesos will run the job as this user, if specified. --user
container This contains the subfields for the container, type (req), image (req), network (optional) and volumes (optional).
environmentVariables An array of environment variables passed to the Mesos executor. For Docker containers, these are also passed to Docker using the -e flag.

Sample job: here is a complete sample job configuration:

{
   "name":"camus_kafka2hdfs",
   "command":"/srv/data-infra/kafka/camus/kafka_hdfs_job.bash",
   "arguments": [
      "-verbose",
      "-debug"
   ]
   "shell":"false",
   "epsilon":"PT30M",
   "executor":"",
   "executorFlags":"",
   "retries":2,
   "owner":"bofh@your-company.com",
   "async":false,
   "successCount":190,
   "errorCount":3,
   "lastSuccess":"2014-03-08T16:57:17.507Z",
   "lastError":"2014-03-01T00:10:15.957Z",
   "cpus":1.0,
   "disk":10240,
   "mem":1024,
   "disabled":false,
   "uris":[
   ],
   "schedule":"R/2014-03-08T20:00:00.000Z/PT2H",
   "environmentVariables": [
     {"name": "FOO", "value": "BAR"}
   ]
}

Job Management: for large installations it is impractical to manage jobs via the web UI. Instead, you can manage your job configurations in a git repository, make edits, and use it to configure Chronos.  You can use a script called chronos-sync.rb. You can also use a Chronos job to periodically check out your configuration and run chronos-sync.rb. 

Synchronizing jobs: there are 2 steps to loading your configuration.  First, initialize configuration data:

$ bin/chronos-sync.rb -u http://chronos/ -p /path/to/jobs/config -c

Then, synchronize jobs:

$ bin/chronos-sync.rb -u http://chronos/ -p /path/to/jobs/config

You can also force updating the configuration from disk by passing the -f or --force parameter.the Here, configuration data is placed in /path/to/jobs/config. Running chronos-sync.rb will not delete jobs.

For more details, see the Airbnb Chronos Github page.

Advertisements

One thought on “The Chronos API

  1. Anonymous

    I’m curious about when and why one would choose the “async” flag. How does having a blocking task in the foreground use up more resources than an async task in the background? Can you elaborate a bit more on that?

    Like

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s