The Chronos API

I recently blogged about Airbnb’s Chronos job scheduler. Here I take a look at the Chronos API.

Launch command: Your system must run Mesos and ZooKeeper. You can then launch Chronos via java:

java -cp chronos.jar --master zk://127.0.0.1:2181/mesos --zk_hosts 127.0.0.1:2181

API Access: Chronos provides a RESTful JSON API over HTTP and listens on port 8080 for requests. For example, your Chronos leader may run at a URL such as chronos-node.airbnb.com:8080.

Leader node: Chronos can run on a cluster of multiple nodes, and these nodes automatically elect one node as the leader node. Only the leader responds to API requests, and requests to other nodes are automatically redirected to the leader.

Listing jobs: you can obtain a JSON-formatted list of jobs with curl. Each job in the response includes invocationCount (the number of times the job completed), executor (auto-determined by Chronos, but usually empty for non-async jobs), and parents (for dependent jobs, the list of jobs that must run before this job). A job has either a parents field or a schedule field, never both:

curl -L -X GET chronos-node:8080/scheduler/jobs
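
Once the listing has been fetched, the parents/schedule distinction makes it easy to sort jobs into the two kinds. The following is a minimal sketch, not part of Chronos itself: the function name split_jobs and the sample response are hypothetical, and the sample only mimics the fields described above.

```python
import json

def split_jobs(jobs):
    """Separate scheduled jobs from dependent jobs in a Chronos job listing."""
    scheduled, dependent = [], []
    for job in jobs:
        # As described above, a job carries either `parents` or `schedule`.
        if job.get("parents"):
            dependent.append(job["name"])
        else:
            scheduled.append(job["name"])
    return scheduled, dependent

# Hypothetical response body, shaped like the fields described above.
sample_response = json.loads("""
[
  {"name": "db_export", "schedule": "R/2014-03-08T20:00:00Z/PT2H",
   "invocationCount": 12, "executor": ""},
  {"name": "summary", "parents": ["db_export"],
   "invocationCount": 3, "executor": ""}
]
""")

scheduled, dependent = split_jobs(sample_response)
print("scheduled:", scheduled)
print("dependent:", dependent)
```

In practice the list would come from the body of the GET request shown above rather than from a literal string.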

Deleting jobs: to delete job my_job use this request:

curl -L -X DELETE chronos-node:8080/scheduler/job/my_job

Deleting tasks: Deleting tasks for a job is useful if a job gets stuck. The job name corresponds to the information returned from the job listing request:

curl -L -X DELETE chronos-node:8080/scheduler/task/kill/my_job

Manual job start: You can manually start a job by issuing an HTTP request:

curl -L -X PUT chronos-node:8080/scheduler/job/my_job

Adding jobs: send a JSON hash with the fields name, command, and schedule (in ISO 8601 format). The details of the JSON hash are explained next:

curl -L -H 'Content-Type: application/json' -X POST -d '{<json hash>}' chronos-node:8080/scheduler/iso8601

JSON hash: an example of a JSON hash is shown below, followed by a discussion of each component:

{
  "schedule": "R10/2012-10-01T05:52:00Z/PT2S",
  "name": "SAMPLE_JOB1",
  "epsilon": "PT15M",
  "command": "echo 'FOO' >> /tmp/JOB1_OUT",
  "owner": "bob@airbnb.com",
  "async": false
}

Job schedule: The schedule consists of 3 parts separated by ‘/’:

the number of times to repeat the job, written as ‘R’ followed by a count (such as R10), or ‘R’ alone to repeat forever
the start time of the job in ISO 8601 format, such as “1997-07-16T19:20:30.45+01:00”; an empty start time means start immediately
the run interval, written as an ISO 8601 duration such as P1Y2M3DT4H5M6S, see examples below.
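
The three parts can be pulled apart with a plain string split. This is a minimal sketch for illustration only; split_schedule is a hypothetical helper, not a Chronos function, and it does not validate the start time or the interval.

```python
def split_schedule(schedule):
    """Split a Chronos schedule string into (repetitions, start, interval)."""
    repeat, start, interval = schedule.split("/")
    if not repeat.startswith("R"):
        raise ValueError("schedule must begin with 'R'")
    # A bare "R" means repeat forever (None here); "R10" means ten repetitions.
    repetitions = int(repeat[1:]) if len(repeat) > 1 else None
    # An empty start time means "start immediately" (None here).
    return repetitions, start or None, interval

print(split_schedule("R10/2012-10-01T05:52:00Z/PT2S"))
print(split_schedule("R//PT30M"))
```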

The run interval: the following examples illustrate how to specify run intervals:

P10M: 10 months
PT10M: 10 minutes
P1Y12M12D: 1 year plus 12 months plus 12 days
P12DT12M: 12 days plus 12 minutes
P1Y2M3DT4H5M6S: Period: 1 Year, 2 Months, 3 Days, Time: 4 Hours, 5 Minutes, 6 Seconds

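P is required. T separates the date part from the time part and must precede any hour, minute, or second value. It is what distinguishes M for months (before T) from M for minutes (after T).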
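
To make the role of P and T concrete, here is a minimal sketch of an interval parser built on a regular expression. It is an illustration, not the parser Chronos uses: parse_interval and INTERVAL_RE are hypothetical names, and the pattern deliberately covers only the whole-number year/month/day/hour/minute/second forms shown above (no weeks, no fractional values).

```python
import re

# Date units come before the optional "T"; time units come after it.
INTERVAL_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def parse_interval(text):
    """Return the units present in an interval such as 'P12DT12M' as a dict."""
    match = INTERVAL_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    parts = {unit: int(value)
             for unit, value in match.groupdict().items()
             if value is not None}
    if not parts:
        raise ValueError(f"interval has no components: {text!r}")
    return parts

for example in ("P10M", "PT10M", "P12DT12M"):
    print(example, parse_interval(example))
```

The same letter M lands in different dictionary keys depending on which side of T it appears, which is exactly the distinction described above.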

Available time zones: The scheduleTimeZone field names the time zone to use when scheduling the job:

this field takes precedence over any time zone specified in schedule
to see supported time zones, call java.util.TimeZone#getAvailableIDs() or consult the List of tz database time zones

Example time zone: to specify Pacific Standard Time, use:

{ "schedule": "R/2014-10-10T18:32:00Z/PT60M", "scheduleTimeZone": "PST" }

Retry epsilon: If Chronos misses a scheduled run time for any reason, it will run the job later as long as the current time is within the specified epsilon interval. Epsilon must be formatted like an ISO 8601 Duration.
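
The epsilon rule amounts to a simple window check. The sketch below is only an illustration of that idea: within_epsilon is a hypothetical helper, and it assumes the window opens at the scheduled time and closes exactly epsilon later, which is one reading of the description above rather than a statement about Chronos internals.

```python
from datetime import datetime, timedelta, timezone

def within_epsilon(scheduled_at, now, epsilon):
    """Return True if a run missed at `scheduled_at` may still start at `now`."""
    # Assumption: the window is [scheduled_at, scheduled_at + epsilon].
    return scheduled_at <= now <= scheduled_at + epsilon

scheduled_at = datetime(2012, 10, 1, 5, 52, tzinfo=timezone.utc)
epsilon = timedelta(minutes=15)  # corresponds to an epsilon of "PT15M"

# Ten minutes late is still inside the window; twenty minutes late is not.
print(within_epsilon(scheduled_at, scheduled_at + timedelta(minutes=10), epsilon))
print(within_epsilon(scheduled_at, scheduled_at + timedelta(minutes=20), epsilon))
```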

Job owner: the email address of the person responsible for the job.

Async: the async flag specifies whether the job will run in the background or in blocking mode in the foreground.

Add job example: with the hash constructed as described above, send the job schedule request to Chronos:

curl -L -H 'Content-Type: application/json' -X POST -d '{ "schedule": "R10/2012-10-01T05:52:00Z/PT2S",  "name": "SAMPLE_JOB1",  "epsilon": "PT15M",  "command": "echo '\''FOO'\'' >> /tmp/JOB1_OUT",  "owner": "bob@airbnb.com",  "async": false}' chronos-node:8080/scheduler/iso8601
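
Building the payload in a script avoids shell quoting problems around the command field. The following is a minimal sketch using only the Python standard library. The helper name build_add_job_request is hypothetical and the host is the placeholder used throughout this post. The request is constructed but not sent.

```python
import json
import urllib.request

def build_add_job_request(base_url, job):
    """Build (but do not send) the POST request that registers a scheduled job."""
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/scheduler/iso8601",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

job = {
    "schedule": "R10/2012-10-01T05:52:00Z/PT2S",
    "name": "SAMPLE_JOB1",
    "epsilon": "PT15M",
    "command": "echo 'FOO' >> /tmp/JOB1_OUT",
    "owner": "bob@airbnb.com",
    "async": False,
}

request = build_add_job_request("http://chronos-node:8080", job)
print(request.get_method(), request.full_url)
# To actually submit the job: urllib.request.urlopen(request)
```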

Adding dependent jobs: a dependent job takes the same JSON format as a scheduled job. However, instead of the schedule field, it accepts a parents field. The parents field lists other jobs that must each run at least once before this job will run.

curl -L -X POST -H 'Content-Type: application/json' -d '{dependent hash}' chronos-node:8080/scheduler/dependency

Example dependency job hash: Here is a more elaborate example for a dependency job hash:

{
    "async": true,
    "command": "bash -x /srv/data-infra/jobs/hive_query.bash run_hive hostings-earnings-summary",
    "epsilon": "PT30M",
    "errorCount": 0,
    "lastError": "",
    "lastSuccess": "2013-03-15T13:02:14.243Z",
    "name": "hostings_earnings_summary",
    "owner": "bob@airbnb.com",
    "parents": [
        "db_export-airbed_hostings",
        "db_export-airbed_reservation2s"
    ],
    "retries": 2,
    "successCount": 100
}
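
The parents rule can be expressed as a small check. This is a sketch of the rule as stated above, not of how Chronos implements it: ready_to_run is a hypothetical helper, and run_counts is an assumed mapping from job name to the number of times that job has run.

```python
def ready_to_run(job, run_counts):
    """Return True once every parent of `job` has run at least once."""
    return all(run_counts.get(parent, 0) >= 1 for parent in job["parents"])

job = {
    "name": "hostings_earnings_summary",
    "parents": [
        "db_export-airbed_hostings",
        "db_export-airbed_reservation2s",
    ],
}

# Only one of the two parents has run so far.
print(ready_to_run(job, {"db_export-airbed_hostings": 1}))
# Both parents have run at least once.
print(ready_to_run(job, {"db_export-airbed_hostings": 3,
                         "db_export-airbed_reservation2s": 1}))
```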

Adding Docker jobs: a Docker job takes the same format as a scheduled job or a dependency job, with an additional container argument. The container argument requires a type and an image, and optionally takes a network mode and volumes:

curl -L -H 'Content-Type: application/json' -X POST -d '{<json hash>}' chronos-node:8080/scheduler/iso8601

The <json hash> has the following format:

{
 "schedule": "R\/2014-09-25T17:22:00Z\/PT2M",
 "name": "my_docker_job",
 "container": {
  "type": "DOCKER",
  "image": "libmesos/ubuntu",
  "network": "BRIDGE"
 },
 "cpus": "0.5",
 "mem": "512",
 "uris": [],
 "command": "while sleep 10; do date -u +%T; done"
}

Dependency graph: Chronos has an endpoint for requesting the dependency graph in form of a dotfile:

curl -L -X GET chronos-node:8080/scheduler/graph/dot

Asynchronous jobs: long-running synchronous jobs can tie up resources for an excessively long time. To schedule a job as asynchronous, set "async": true and ensure your job reports its completion status to Chronos. If your job does not report its completion status, Chronos will report the job as running irrespective of whether it has completed.

Reporting completion: Reporting job completion to Chronos is accomplished via this API call:

curl -L -X PUT -H "Content-Type: application/json" -d '{"statusCode":0}' chronos-node:8080/scheduler/task/my_job_run_555_882083xkj302

The task ID is auto-generated by Chronos. It will be available in your job’s environment as $mesos_task_id. You need to URL-encode the Mesos task ID to ensure it is not corrupted in the process of sending and processing your request.
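
The URL-encoding step is easy to get wrong by hand, so here is a minimal sketch of building the completion request with the Python standard library. The helper name build_completion_request is hypothetical, and the task ID below is made up for illustration. The request is constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

def build_completion_request(base_url, task_id, status_code=0):
    """Build (but do not send) the PUT request that reports a task's status."""
    # safe="" makes quote() percent-encode every reserved character,
    # including ":" and "/", so the ID survives as a single path segment.
    encoded_id = urllib.parse.quote(task_id, safe="")
    body = json.dumps({"statusCode": status_code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/scheduler/task/{encoded_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical task ID; a real job would read it from its environment.
task_id = "ct:1400000000000:0:my_job:"
request = build_completion_request("http://chronos-node:8080", task_id)
print(request.get_method(), request.full_url)
# To actually report completion: urllib.request.urlopen(request)
```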

Remote executables: There are two ways to specify a remote command: through the bash script url-runner.bash, or as a URL. To use the bash script, you need to deploy it to all slaves. To use a URL, you need to compile Mesos with the cURL libraries.

Job configuration: The following table provides an overview of job configuration fields:

Field	Description	Default
name	Name of job.	–
command	Command to execute.	–
arguments	Arguments to pass to the command. Ignored if `shell` is true.	–
shell	If true, Mesos will execute `command` by running `/bin/sh -c <command>` and ignore `arguments`. If false, `command` will be treated as the filename of an executable and `arguments` will be the arguments passed. If this is a Docker job and `shell` is true, the entrypoint of the container will be overridden with `/bin/sh -c`.	true
epsilon	If, for any reason, a job can’t be started at the scheduled time, this is the window in which Chronos will attempt to run the job again	`PT60S` or `--task_epsilon`.
executor	Mesos executor. By default Chronos uses the Mesos command executor.	–
executorFlags	Flags to pass to Mesos executor.	–
retries	Number of retries to attempt if a command returns a non-zero status	`2`
owner	Email addresses to send job failure notifications. Use comma-separated list for multiple addresses.	–
async	Execute using Async executor.	`false`
successCount	Number of successes since the job was last modified.	–
errorCount	Number of errors since the job was last modified.	–
lastSuccess	Date of last successful attempt.	–
lastError	Date of last failed attempt.	–
cpus	Amount of Mesos CPUs for this job.	`0.1` or `--mesos_task_cpu`
mem	Amount of Mesos Memory in MB for this job.	`128` or `--mesos_task_mem`
disk	Amount of Mesos disk in MB for this job.	`256` or `--mesos_task_disk`
disabled	If set to true, this job will not be run.	`false`
uris	An array of URIs which Mesos will download when the task is started.	–
schedule	ISO8601 repeating schedule for this job. If specified, `parents` must not be specified.	–
scheduleTimeZone	The time zone for the given schedule.	–
parents	An array of parent jobs for a dependent job. If specified, `schedule` must not be specified.	–
runAsUser	Mesos will run the job as this user, if specified.	`--user`
container	This contains the subfields for the container, type (req), image (req), network (optional) and volumes (optional).	–
environmentVariables	An array of environment variables passed to the Mesos executor. For Docker containers, these are also passed to Docker using the -e flag.	–
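
A few of these rules can be checked before a job is ever submitted. The sketch below is a hypothetical pre-flight check, not part of Chronos: validate_job is an invented name, and treating name and command as required, with exactly one of schedule or parents present, is an assumption drawn from the examples in this post.

```python
def validate_job(job):
    """Return a list of problems found in a job configuration (empty if none)."""
    errors = []
    # Assumption: every job needs a name and a command.
    for field in ("name", "command"):
        if not job.get(field):
            errors.append(f"missing required field: {field}")
    # The table states that schedule and parents exclude each other.
    has_schedule = "schedule" in job
    has_parents = "parents" in job
    if has_schedule and has_parents:
        errors.append("schedule and parents must not both be specified")
    elif not has_schedule and not has_parents:
        errors.append("one of schedule or parents is required")
    return errors

good = {"name": "nightly", "command": "true",
        "schedule": "R/2014-03-08T20:00:00.000Z/PT2H"}
bad = {"name": "nightly", "schedule": "R//PT2H", "parents": ["other_job"]}

print(validate_job(good))
print(validate_job(bad))
```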

Sample job: here is a complete sample job configuration:

{
   "name":"camus_kafka2hdfs",
   "command":"/srv/data-infra/kafka/camus/kafka_hdfs_job.bash",
   "arguments": [
      "-verbose",
      "-debug"
   ],
   "shell":"false",
   "epsilon":"PT30M",
   "executor":"",
   "executorFlags":"",
   "retries":2,
   "owner":"bofh@your-company.com",
   "async":false,
   "successCount":190,
   "errorCount":3,
   "lastSuccess":"2014-03-08T16:57:17.507Z",
   "lastError":"2014-03-01T00:10:15.957Z",
   "cpus":1.0,
   "disk":10240,
   "mem":1024,
   "disabled":false,
   "uris":[
   ],
   "schedule":"R/2014-03-08T20:00:00.000Z/PT2H",
   "environmentVariables": [
     {"name": "FOO", "value": "BAR"}
   ]
}

Job Management: for large installations it is impractical to manage jobs via the web UI. Instead, you can manage your job configurations in a git repository, make edits, and use it to configure Chronos. You can use a script called chronos-sync.rb. You can also use a Chronos job to periodically check out your configuration and run chronos-sync.rb.

Synchronizing jobs: there are 2 steps to loading your configuration. First, initialize configuration data:

$ bin/chronos-sync.rb -u http://chronos/ -p /path/to/jobs/config -c

Then, synchronize jobs:

$ bin/chronos-sync.rb -u http://chronos/ -p /path/to/jobs/config

You can also force updating the configuration from disk by passing the -f or --force parameter. Here, configuration data is placed in /path/to/jobs/config. Running chronos-sync.rb will not delete jobs.

For more details, see the Airbnb Chronos GitHub page.

1 thought on “The Chronos API”

Anonymous February 3, 2016 at 10:32 pm

I’m curious about when and why one would choose the “async” flag. How does having a blocking task in the foreground use up more resources than an async task in the background? Can you elaborate a bit more on that?


1 Bit Entropy

Techie Stuff for Data People
