Category Archives: Ops

Folder-Level Access to S3 with AWS IAM

I recently needed to grant users access to S3 while limiting that access to specific folders within an S3 bucket.  While this requirement is not supported out of the box, as it would be with a traditional filesystem, you can set up IAM policies to achieve the same effect.

Background: The background for my folder-level access requirement is that we are building a system that processes data, and users should be able to write new data into S3 that the system will then process.  However, I don’t want users to access system data; they should only be able to drop input data into a specific input folder.

S3 Concepts: As mentioned above, S3 is not a file system, and there are no paths, such as “/home/bob/my-file.text”.  In S3, you can create such a file but the “path” will be the file name and the slashes “/” in the file name won’t have special meaning.  Therefore, you must set up IAM policies where you define the slash “/” to be a delimiter.
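To make the "flat keys plus a delimiter" idea concrete, here is a minimal Python sketch that emulates how a listing with a prefix and a delimiter groups flat keys into folder-like entries. It does not call S3; the function name `list_folder` and the sample keys are made up for illustration.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Emulate a prefix/delimiter listing over a flat set of object keys.

    Returns (objects, common_prefixes): keys directly "inside" the prefix,
    and the folder-like prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A further delimiter means the key is rolled up into one "folder".
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys; the slashes are just characters in the key names.
keys = [
    "home/bob/my-file.text",
    "home/bob/notes/todo.txt",
    "home/alice/data.csv",
    "readme.txt",
]
print(list_folder(keys, prefix="home/"))
print(list_folder(keys, prefix="home/bob/"))
```

The "folders" only exist because the caller chose "/" as the delimiter, which is why the IAM policies have to name that delimiter explicitly.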

S3 Buckets: You may have heard that S3 organizes data into “buckets” and you could just give different users access to different buckets.  While this is true, and easier to set up, this approach won’t scale because S3 only allows up to 100 buckets in each AWS account.

IAM Policies: To implement folder-level access permissions, you will need to create policies for listing buckets, getting bucket locations, listing a specific bucket, and allowing all S3 actions in a specific folder.  With these policies you will be able to allow the required Amazon S3 console permissions, allow listing objects in the user’s folder, and allow all Amazon S3 actions in David’s folder.

Policy Variables: By setting up fixed IAM policies, you can get specific users set up easily.  However, if you have many users you won’t want to create the required set of policies for each user individually.  Instead you will want to use “policy variables”.  That is, instead of referring to a specific user such as “David”, you will be referring to the “username” variable: ${aws:username}.
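As a rough sketch of what such a policy could look like, the Python snippet below builds a policy document that uses the ${aws:username} variable. The bucket name "my-company", the "home" top-level folder, the statement IDs, and the function name are assumptions for illustration; adapt them to your own layout and check the result against the IAM documentation before using it.

```python
import json

# IAM replaces this policy variable with the name of the requesting user.
USERNAME_VAR = "${aws:username}"


def folder_access_policy(bucket, home="home"):
    """Build a per-user folder policy (bucket and folder names are hypothetical)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    user_prefix = f"{home}/{USERNAME_VAR}/"
    return {
        # Policy variables require this policy language version.
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing buckets and getting bucket locations (console needs).
                "Sid": "AllowListingBucketsInConsole",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": ["arn:aws:s3:::*"],
            },
            {   # Listing the bucket root and the home folder, with "/" as delimiter.
                "Sid": "AllowListingRootAndHome",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
                "Condition": {"StringEquals": {
                    "s3:prefix": ["", f"{home}/"],
                    "s3:delimiter": ["/"],
                }},
            },
            {   # Listing objects inside the user's own folder.
                "Sid": "AllowListingUserFolder",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
                "Condition": {"StringLike": {"s3:prefix": [f"{user_prefix}*"]}},
            },
            {   # All S3 actions, but only on keys under the user's folder.
                "Sid": "AllowAllActionsInUserFolder",
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [f"{bucket_arn}/{user_prefix}*"],
            },
        ],
    }


policy = folder_access_policy("my-company")
print(json.dumps(policy, indent=2))
```

Because the user name is a variable, one policy attached to a group can cover every user in that group.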

If you have more questions about folder-level access to S3 with AWS IAM, be sure to visit Jim Scharf’s post on “Writing IAM policies: Grant access to user-specific folders in an Amazon S3 bucket”.


How Red Hat describes OpenShift.

Recently, several of my friends recommended I look into OpenShift.  Here is how Red Hat describes OpenShift.

What is OpenShift: Red Hat, known by developers for its Linux, storage, and cloud offerings and its support for Java, PHP, Python, and Ruby, positions OpenShift as a Platform as a Service (PaaS) offering.  That is, OpenShift is a platform in the cloud where application developers can build, test, deploy, and run their applications.  To do so, OpenShift provides infrastructure, middleware, and management tools.

Usage: using OpenShift involves these steps. 1. Create an “Application” in OpenShift.  This can be done with the command-line or via an IDE.  2. Code the application with your favorite text editor or IDE.  Finally, 3. push the application code to OpenShift, with the command-line or from your IDE.

Supported languages: OpenShift supports Node.js, Ruby, Python, PHP, Perl, and Java. You can also add any language with a “cartridge functionality” feature, and integrations have been developed for languages such as Clojure and COBOL.  Supported frameworks include Spring, Rails, and Play.

Elastic scaling: OpenShift provides automatic and manual scaling and clustering.

Selling points: Red Hat stresses leadership, stability, responsiveness, performance, security, and survivability.  Specifically, Red Hat emphasizes multi-tenancy, fine-grained security, and control over compute and storage resources. If desired, SELinux allows OpenShift to “firewall” one user’s application from another.  Red Hat believes that their “multi-tenant in the OS” approach can scale resources more quickly than a “multi-tenant hypervisor” approach.

For more details, see Red Hat’s OpenShift page.

The Chronos API

I recently blogged about Airbnb’s Chronos job scheduler.  Here I take a look at the Chronos API.

Launch command: Your system must run Mesos and ZooKeeper.  Then, you launch Chronos via java:

java -cp chronos.jar --master zk:// --zk_hosts

API Access: Chronos provides a RESTful JSON API over HTTP and listens on port 8080 for requests. For example, your Chronos leader may run at a URL such as

Leader node: Chronos can run on a cluster of multiple nodes, and these nodes automatically elect one node as the leader. Only the leader responds to API requests, and requests to other nodes are automatically redirected to the leader.

Listing jobs: you can obtain a JSON-formatted list of jobs through curl and the response will include invocationCount (number of times job completed), executor (auto-determined by Chronos, but will usually be empty for non-async jobs), and parents (for dependent jobs, a list of jobs that must run before this job).  If there is a parents field there will be no schedule field and vice-versa:

curl -L -X GET chronos-node:8080/scheduler/jobs

Deleting jobs: to delete job my_job use this request:

curl -L -X DELETE chronos-node:8080/scheduler/job/my_job

Deleting tasks: Deleting tasks for a job is useful if a job gets stuck. The job name corresponds to the information returned from the job listing request:

curl -L -X DELETE chronos-node:8080/scheduler/task/kill/my_job

Manual job start: You can manually start a job by issuing an HTTP request:

curl -L -X PUT chronos-node:8080/scheduler/job/my_job
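The curl calls above can be mirrored in a small Python helper. This is a sketch under assumptions: the host "chronos-node:8080" is taken from the examples, the function names are made up, and the helper only builds the requests rather than sending them.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL, matching the host used in the curl examples.
BASE_URL = "http://chronos-node:8080"


def chronos_request(method, path, base_url=BASE_URL):
    """Build (but do not send) an HTTP request for a Chronos endpoint."""
    return urllib.request.Request(base_url + path, method=method)


def _job(name):
    # Percent-encode the job name so it is safe inside a URL path.
    return urllib.parse.quote(name, safe="")


def list_jobs():
    return chronos_request("GET", "/scheduler/jobs")


def delete_job(name):
    return chronos_request("DELETE", f"/scheduler/job/{_job(name)}")


def kill_tasks(name):
    return chronos_request("DELETE", f"/scheduler/task/kill/{_job(name)}")


def start_job(name):
    return chronos_request("PUT", f"/scheduler/job/{_job(name)}")


req = start_job("my_job")
print(req.get_method(), req.full_url)
```

To send a request, pass it to urllib.request.urlopen. Unlike curl -L, urllib does not follow redirects for DELETE or PUT requests, so with this sketch it is simplest to address the leader node directly.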

Adding jobs: send a JSON hash with the fields name, command, and schedule (in ISO 8601 format).  We will explain the details of the JSON hash next:

curl -L -H 'Content-Type: application/json' -X POST -d '{<json hash>}' chronos-node:8080/scheduler/iso8601

JSON hash: an example of a JSON hash is shown here.  We discuss each component below:

{
  "schedule": "R10/2012-10-01T05:52:00Z/PT2S",
  "name": "SAMPLE_JOB1",
  "epsilon": "PT15M",
  "command": "echo 'FOO' >> /tmp/JOB1_OUT",
  "owner": "",
  "async": false
}

Job schedule: The schedule consists of 3 parts separated by ‘/’:

  1. number of times to repeat the job, written as ‘Rn’ (for example ‘R10’ for ten runs), or just ‘R’ to repeat forever
  2. start time of the job, such as “1997-07-16T19:20:30.45+01:00”; an empty start time means start immediately
  3. run interval, such as P1Y2M3DT4H5M6S, see examples below.
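As a small illustration of the three-part format, the following Python sketch splits a schedule string into its components. The function name `parse_schedule` is made up, and the sketch only separates the parts; it does not validate the start time or the interval.

```python
def parse_schedule(schedule):
    """Split 'R<n>/<start>/<interval>' into (repeat_count, start, interval).

    repeat_count is None for a bare 'R' (repeat forever);
    start is None when the start time is empty (start immediately).
    """
    parts = schedule.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts separated by '/', got {len(parts)}")
    repeat, start, interval = parts
    if not repeat.startswith("R"):
        raise ValueError("the first part must start with 'R'")
    count = int(repeat[1:]) if len(repeat) > 1 else None
    return count, (start or None), interval


print(parse_schedule("R10/2012-10-01T05:52:00Z/PT2S"))
print(parse_schedule("R//PT60M"))
```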

The run interval: the following examples illustrate how to specify run intervals:

  • P10M: 10 months
  • PT10M: 10 minutes
  • P1Y12M12D: 1 year plus 12 months plus 12 days
  • P12DT12M: 12 days plus 12 minutes
  • P1Y2M3DT4H5M6S: Period: 1 Year, 2 Months, 3 Days, Time: 4 Hours, 5 Minutes, 6 Seconds

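P is required. T separates the date part from the time part and is required whenever hours, minutes, or seconds are present; it is what distinguishes M for months (before the T) from M for minutes (after the T).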
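The role of P and T is easier to see in code. The sketch below parses the interval forms shown above with a regular expression; it is a simplified illustration (whole numbers only, no week designator), not a full ISO 8601 implementation, and `parse_duration` is a made-up name.

```python
import re

# Date designators come before the optional 'T', time designators after it,
# so 'M' means months before 'T' and minutes after it.
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Return the components present in a duration such as 'P12DT12M'."""
    match = _DURATION.match(text)
    if not match or text.endswith("T"):
        raise ValueError(f"not a supported duration: {text!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"duration has no components: {text!r}")
    return parts


for example in ["P10M", "PT10M", "P1Y12M12D", "P12DT12M", "P1Y2M3DT4H5M6S"]:
    print(example, parse_duration(example))
```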

Available time zones: the scheduleTimeZone field gives the name of the time zone to use when scheduling the job.

Example time zone: for example, to specify Pacific Standard Time use:

{ "schedule": "R/2014-10-10T18:32:00Z/PT60M", "scheduleTimeZone": "PST" }

Retry epsilon: If Chronos misses a scheduled run time for any reason, it will run the job later as long as the current time is within the specified epsilon interval. Epsilon must be formatted like an ISO 8601 Duration.
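One way to read the epsilon rule is as a simple window check. The sketch below assumes the window starts at the scheduled time and lasts for the epsilon duration; that interpretation, along with the function name, is an assumption based on the description above rather than on the Chronos source.

```python
from datetime import datetime, timedelta, timezone


def should_still_run(scheduled, now, epsilon):
    """True if a missed run is still inside its epsilon window (assumed semantics)."""
    return scheduled <= now <= scheduled + epsilon


scheduled = datetime(2012, 10, 1, 5, 52, tzinfo=timezone.utc)
epsilon = timedelta(minutes=15)  # corresponds to "PT15M"

print(should_still_run(scheduled, scheduled + timedelta(minutes=10), epsilon))
print(should_still_run(scheduled, scheduled + timedelta(minutes=20), epsilon))
```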

Job owner: the email address of the person responsible for the job.

Async: the async flag specifies whether the job will run in the background or in blocking mode in the foreground.

Add job example: with the hash constructed as described above, send the job schedule request to Chronos:

curl -L -H 'Content-Type: application/json' -X POST -d '{ "schedule": "R10/2012-10-01T05:52:00Z/PT2S", "name": "SAMPLE_JOB1", "epsilon": "PT15M", "command": "echo '\''FOO'\'' >> /tmp/JOB1_OUT", "owner": "", "async": false}' chronos-node:8080/scheduler/iso8601
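When the command itself contains quotes, building the body with a JSON library avoids shell-quoting problems. The following Python sketch constructs the same request; the host is the one from the curl example, the function name is made up, and the request is only built, not sent.

```python
import json
import urllib.request


def add_job_request(job, base_url="http://chronos-node:8080"):
    """Build a POST request that would add a scheduled job (not sent here)."""
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        base_url + "/scheduler/iso8601",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


job = {
    "schedule": "R10/2012-10-01T05:52:00Z/PT2S",
    "name": "SAMPLE_JOB1",
    "epsilon": "PT15M",
    # The inner single quotes need no special escaping in JSON.
    "command": "echo 'FOO' >> /tmp/JOB1_OUT",
    "owner": "",
    "async": False,
}
request = add_job_request(job)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```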

Adding dependent jobs: a dependent job takes the same JSON format as a scheduled job. However, instead of the schedule field, it accepts a parents field. The parents field lists other jobs which must run at least once before this job will run.

curl -L -X POST -H 'Content-Type: application/json' -d '{dependent hash}' chronos-node:8080/scheduler/dependency
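Since a job carries either a schedule or a parents field, a client can pick the endpoint from the job hash itself. This is a minimal sketch of that check; the function name is made up and the job hashes are trimmed-down, hypothetical examples.

```python
def endpoint_for(job):
    """Choose the Chronos endpoint based on 'schedule' vs. 'parents'."""
    has_schedule = "schedule" in job
    has_parents = "parents" in job
    if has_schedule == has_parents:
        # Both present or both missing: neither endpoint fits.
        raise ValueError("a job needs exactly one of 'schedule' or 'parents'")
    return "/scheduler/iso8601" if has_schedule else "/scheduler/dependency"


scheduled = {"name": "SAMPLE_JOB1", "schedule": "R10/2012-10-01T05:52:00Z/PT2S"}
dependent = {"name": "SAMPLE_JOB2", "parents": ["SAMPLE_JOB1"]}  # hypothetical job

print(endpoint_for(scheduled))
print(endpoint_for(dependent))
```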

Example dependency job hash: Here is a more elaborate example of a dependency job hash:

{
    "async": true,
    "command": "bash -x /srv/data-infra/jobs/hive_query.bash run_hive hostings-earnings-summary",
    "epsilon": "PT30M",
    "errorCount": 0,
    "lastError": "",
    "lastSuccess": "2013-03-15T13:02:14.243Z",
    "name": "hostings_earnings_summary",
    "owner": "",
    "parents": ["<parent job name>"],
    "retries": 2,
    "successCount": 100
}

Adding Docker jobs: Docker jobs take the same format as a scheduled job or a dependency job, with an additional container argument.  The container argument requires a type and an image, and optionally takes a network mode and volumes:

curl -L -H 'Content-Type: application/json' -X POST -d '{<json hash>}' chronos-node:8080/scheduler/iso8601

The <json hash> has the following format:

{
 "schedule": "R\/2014-09-25T17:22:00Z\/PT2M",
 "name": "my_docker_job",
 "container": {
  "type": "DOCKER",
  "image": "libmesos/ubuntu",
  "network": "BRIDGE"
 },
 "cpus": "0.5",
 "mem": "512",
 "uris": [],
 "command": "while sleep 10; do date -u +%T; done"
}

Dependency graph: Chronos has an endpoint for requesting the dependency graph in form of a dotfile:

curl -L -X GET chronos-node:8080/scheduler/graph/dot

Asynchronous jobs: long-running, synchronous jobs can tie up resources for an excessively long time.  To schedule a job as asynchronous, set async: true and ensure your job reports its completion status to Chronos.  If your job does not report its completion status, Chronos will report your job as running irrespective of whether it completed or not.

Reporting completion: Reporting job completion to Chronos is accomplished via this API call:

curl -L -X PUT -H "Content-Type: application/json" -d '{"statusCode":0}' chronos-node:8080/scheduler/task/my_job_run_555_882083xkj302

The task id is auto-generated by Chronos. It will be available in your job’s environment as $mesos_task_id.  You need to URL-encode the Mesos task id to ensure it is not corrupted in the process of sending and processing your request.
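A sketch of the URL-encoding step in Python follows. The task id used here is a made-up value chosen to contain characters that need encoding, the host comes from the curl example, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def completion_request(task_id, status_code=0, base_url="http://chronos-node:8080"):
    """Build the PUT request that reports an async job's completion status."""
    # safe="" also encodes '/' and ':' so the id stays a single path segment.
    encoded_id = urllib.parse.quote(task_id, safe="")
    body = json.dumps({"statusCode": status_code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/scheduler/task/{encoded_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical task id; a real job would read it from its environment.
request = completion_request("ct:1426536000000:0:my job")
print(request.full_url)
```

Inside a running job, the id could be read with os.environ.get("mesos_task_id"), following the variable name given above.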

Remote executables: There are two ways of specifying commands: as the bash script url-runner.bash and as a URL.  To use the bash script you need to deploy it to all slaves.  To use the URL you need to compile Mesos with the cURL libraries.

Job configuration: The following table provides an overview of job configurations:

| Field | Description | Default |
| --- | --- | --- |
| name | Name of job. | |
| command | Command to execute. | |
| arguments | Arguments to pass to the command. Ignored if shell is true. | |
| shell | If true, Mesos will execute command by running /bin/sh -c <command> and ignore arguments. If false, command will be treated as the filename of an executable and arguments will be the arguments passed. If this is a Docker job and shell is true, the entrypoint of the container will be overridden with /bin/sh -c. | true |
| epsilon | If, for any reason, a job can’t be started at the scheduled time, this is the window in which Chronos will attempt to run the job again. | PT60S or --task_epsilon |
| executor | Mesos executor. By default Chronos uses the Mesos command executor. | |
| executorFlags | Flags to pass to Mesos executor. | |
| retries | Number of retries to attempt if a command returns a non-zero status. | 2 |
| owner | Email addresses to send job failure notifications. Use comma-separated list for multiple addresses. | |
| async | Execute using Async executor. | false |
| successCount | Number of successes since the job was last modified. | |
| errorCount | Number of errors since the job was last modified. | |
| lastSuccess | Date of last successful attempt. | |
| lastError | Date of last failed attempt. | |
| cpus | Amount of Mesos CPUs for this job. | 0.1 or --mesos_task_cpu |
| mem | Amount of Mesos memory in MB for this job. | 128 or --mesos_task_mem |
| disk | Amount of Mesos disk in MB for this job. | 256 or --mesos_task_disk |
| disabled | If set to true, this job will not be run. | false |
| uris | An array of URIs which Mesos will download when the task is started. | |
| schedule | ISO8601 repeating schedule for this job. If specified, parents must not be specified. | |
| scheduleTimeZone | The time zone for the given schedule. | |
| parents | An array of parent jobs for a dependent job. If specified, schedule must not be specified. | |
| runAsUser | Mesos will run the job as this user, if specified. | --user |
| container | This contains the subfields for the container: type (required), image (required), network (optional), and volumes (optional). | |
| environmentVariables | An array of environment variables passed to the Mesos executor. For Docker containers, these are also passed to Docker using the -e flag. | |
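To see what a job looks like once the listed defaults are applied, here is a small Python sketch that merges a partial job hash with the fixed default values from the table. It only mirrors the table: on a real installation the effective defaults may instead come from command-line flags such as --task_epsilon, and Chronos applies them itself, so `with_defaults` is purely illustrative.

```python
# Fixed default values as listed in the table above.
JOB_DEFAULTS = {
    "shell": True,
    "epsilon": "PT60S",
    "retries": 2,
    "async": False,
    "cpus": 0.1,
    "mem": 128,
    "disk": 256,
    "disabled": False,
}


def with_defaults(job):
    """Return a copy of job with table defaults filled in for missing fields."""
    return {**JOB_DEFAULTS, **job}


job = with_defaults({"name": "SAMPLE_JOB1", "command": "echo 'FOO'", "retries": 5})
print(job)
```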

Sample job: here is a complete sample job configuration:

   "arguments": [
   "environmentVariables": [
     {"name": "FOO", "value": "BAR"}

Job Management: for large installations it is impractical to manage jobs via the web UI. Instead, you can manage your job configurations in a git repository, make edits, and use it to configure Chronos.  You can use a script called chronos-sync.rb. You can also use a Chronos job to periodically check out your configuration and run chronos-sync.rb. 

Synchronizing jobs: there are 2 steps to loading your configuration.  First, initialize configuration data:

$ bin/chronos-sync.rb -u http://chronos/ -p /path/to/jobs/config -c

Then, synchronize jobs:

$ bin/chronos-sync.rb -u http://chronos/ -p /path/to/jobs/config

You can also force updating the configuration from disk by passing the -f or --force parameter. Here, configuration data is placed in /path/to/jobs/config. Running chronos-sync.rb will not delete jobs.

For more details, see the Airbnb Chronos GitHub page.

The airbnb/chronos scheduler

Chronos at Airbnb: At Airbnb, Chronos functions in an environment that includes AWS EMR, MySQL, Amazon Redshift, S3, Cascading, Cascalog, Hive, and Pig.  Challenges in this environment include variance in network latency, unpredictable I/O performance, and spurious web service timeouts.  These challenges prompted Airbnb to look for a lightweight scheduling solution that allows retries and provides high availability and an easy-to-use GUI.  In addition, Airbnb wanted the ability to schedule non-Hadoop jobs, such as bash scripts, and distribute work across multiple systems.  Thus Airbnb decided to build Chronos and to leverage Mesos, which provides the required primitives for storing state, distributing work, and adding new workers on the fly.

Chronos UI: The Chronos UI supports adding, deleting, listing, modifying and running jobs. It can show graphs of job dependencies.


What people are asking: Alerting and notification are not well documented, but you can specify email addresses to send job failure notifications to. Use a comma-separated list for multiple addresses.  It is not obvious whether you can integrate Chronos with other business applications, for example, Zenoss, JIRA, Logstash, etc.  Is it adaptable to multi-timezone calendars?  Does it have the ability to create incident reports?  What is the round-trip time between submitting a request and receiving a response?  Does it have reporting features? Does it have the ability to fail over and load balance? Are all features available via both command line and GUI?

Mesos as an OS for the Data Center

Benjamin Hindman argues that the data center needs an operating system, and I agree.  Hindman is one of the creators of Apache Mesos and the chief architect at Mesosphere, and I was curious about his thoughts.

Large-scale, distributed systems: Benjamin’s starting point is that modern applications no longer fit on a single server.  Instead, large-scale, distributed systems run on frameworks such as Apache Hadoop and Apache Spark, message brokers like Apache Kafka, and key-value stores like Apache Cassandra.

The right unit of abstraction: As a consequence, applications, not servers, should be the unit of abstraction in the data center.  When developers must deal with machines as the available level of abstraction, they need to deal with IP addresses and local storage, which makes moving and resizing applications difficult.

Labor-intensive and inefficient: Operators who deploy applications must anticipate machine loss and often harness complexity by deploying one application per machine, which is clearly inefficient.  This problem will become more pronounced as companies replace monolithic architectures with service-oriented architectures and build more software based on micro-services.  As a result, data centers run at only 8-15% efficiency, and running applications is too labor intensive.

Data center operating system: applications should run on any available resources from any machine, even if there are other applications already running on those machines.  The data center operating system should allocate applications to machines, providing resource management and process isolation.  An API for the data center would allocate and de-allocate resources, launch, monitor, and destroy processes, and support service discovery and coordination.

Service discovery and coordination: Most distributed applications achieve high availability and fault tolerance through some means of coordination, such as consensus, which is notoriously hard to implement correctly and efficiently.  Existing tools for service discovery and coordination include Apache ZooKeeper and CoreOS’ etcd. It would be attractive to offer these services centrally as part of a data center operating system instead. Developers would then launch applications via a CLI or GUI, and the applications would execute using the data center operating system’s API.

Apache Mesos, the distributed systems kernel: The open source Apache Mesos project, of which Benjamin Hindman is a co-creator and the project chair, is a step in that direction. Apache Mesos aims to be a distributed systems kernel that provides a portable API upon which distributed applications can be built and run.  Distributed systems that can leverage Mesos include Apache Spark, Apache Aurora, Airbnb’s Chronos, Mesosphere’s Marathon, Apache Hadoop, Apache Storm, and Google’s Kubernetes.

For these reasons, Apache Mesos is popular in the industry. For example, Chronos, a distributed system that provides highly available and fault-tolerant cron, was built on top of Mesos in a few thousand lines of code, without explicit socket programming for network communication.  Twitter and Airbnb use Mesos to run their data centers, and many others are also leveraging Mesos.

For more details, see Benjamin Hindman’s post.