At Opendoor, we've run our apps on Heroku's platform since our first deploy. We're big believers in the platform's value proposition: get up and running quickly and avoid spending time dealing with infrastructure.
Our data science stack
One of our services is a Python data science app which estimates the value of any home by analyzing data sources like historical home sales and public records. We regularly run batch analysis jobs to train our machine learning models, as well as experiments to improve the accuracy of those models. The app is built on the scientific Python stack using conda as the package manager. Alongside the familiar numpy, scipy and pandas packages, it also depends on C-based machine learning and geospatial analysis libraries.
A common problem in data science occurs when your datasets are too large to fit into a single machine's memory. The programs for analyzing this data can become complicated if they need to deal with moving data in and out of memory, or splitting the work up onto many machines at once. This makes the software either slower to run or slower to build.
Fortunately, housing data falls under the "small data" regime, so we can avoid the complexity that arises when a dataset can't fit into RAM. A single metro area has tens of thousands of home sales per year, and the distilled dataset is typically less than 1GB even when looking at decades of historical data. This is a tiny slice of the available RAM on a modern server or EC2 instance, even after the data is loaded into Python.
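To put the "small data" claim in numbers, here is a rough sketch that builds a synthetic table sized like one metro area's sales history and asks pandas how much memory it uses. The figures are assumptions picked to sit inside the ranges above (50,000 sales a year for 30 years), and the columns are hypothetical. A real table would carry more features, and string columns such as addresses cost far more per value than these fixed-width numbers. It is written for Python 3 with a recent NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical figures: 50,000 sales a year for 30 years in one metro area.
SALES_PER_YEAR = 50_000
YEARS = 30


def synthetic_sales(n_rows, seed=0):
    """Build a hypothetical home-sales table with a few compact numeric columns."""
    rng = np.random.default_rng(seed)
    days = rng.integers(0, 365 * YEARS, n_rows)
    return pd.DataFrame({
        "sale_price": rng.integers(100_000, 1_000_000, n_rows).astype("float64"),  # 8 bytes
        "sqft": rng.integers(500, 5_000, n_rows).astype("int32"),                   # 4 bytes
        "bedrooms": rng.integers(1, 7, n_rows).astype("int8"),                      # 1 byte
        "latitude": rng.uniform(33.0, 34.0, n_rows),                                # 8 bytes
        "longitude": rng.uniform(-112.5, -111.5, n_rows),                           # 8 bytes
        "sale_date": np.datetime64("1990-01-01") + days.astype("timedelta64[D]"),  # 8 bytes
    })


def in_memory_bytes(df):
    """Total bytes held by the DataFrame, including its index and any string data."""
    return int(df.memory_usage(deep=True).sum())


n_rows = SALES_PER_YEAR * YEARS
df = synthetic_sales(n_rows)
print(f"{n_rows:,} rows use {in_memory_bytes(df) / 1e6:.1f} MB")
```

The six columns take 8 + 4 + 1 + 8 + 8 + 8 = 37 bytes per row, so 1.5 million rows come to roughly 55 MB, far below the 1GB figure. Passing deep=True makes memory_usage count the Python string objects behind any object columns, which is where real datasets tend to grow.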
Getting the Python data science stack running on Heroku
When it was time to deploy our data science app on Heroku, we discovered that most of its dependencies were not installed by default on Heroku's Cedar stack. Heroku's solution for adding custom dependencies is buildpacks. There are many default buildpacks for common programming languages, as well as a multi-buildpack that lets you compose multiple buildpacks together.
We ended up with four separate buildpacks. Some of these were tweaked versions of an existing buildpack, like the conda buildpack, while others needed to be written from scratch. Because the process of iterating on these buildpacks is slow, we spent many hours getting them to do what we needed and making them work well together. Ultimately we were able to get the app running on Heroku, and for a while, it worked well.
Limitations of Heroku
As our machine learning model evolved and became more complicated, the memory usage of our app grew as well. We had been running on Heroku's 6GB PX dynos from the beginning, but they soon proved far too small for our needs. Several months ago, we began seeing our dynos killed regularly with R15 ("Memory quota vastly exceeded") errors, meaning they were using all 6GB of allocated memory as well as 6GB of swap. Our memory usage charts confirmed this.
Unfortunately, a 6GB PX dyno was the largest available from Heroku. We began keeping an eye out for better options, though we assumed that in the near future we would need to spend the time to move the app to raw EC2.
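Heroku's memory charts show the dyno as a whole. To see which batch jobs push memory up, a worker can also record its own high-water mark. The sketch below, assuming a Unix host and Python 3, wraps a job and logs the process's peak resident set size using the standard resource module; run_with_memory_log and the "worker" logger name are hypothetical, not part of our app.

```python
import logging
import resource
import sys

logger = logging.getLogger("worker")  # hypothetical logger name


def peak_rss_mb():
    """Peak resident set size of the current process, in MB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    The resource module is only available on Unix.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_memory_log(job, *args, **kwargs):
    """Run a (hypothetical) batch job and log the process's peak memory afterwards."""
    before = peak_rss_mb()
    try:
        return job(*args, **kwargs)
    finally:
        after = peak_rss_mb()
        logger.info(
            "%s: peak RSS %.0f MB (was %.0f MB before the job)",
            getattr(job, "__name__", "job"), after, before,
        )
```

Because ru_maxrss is a high-water mark that never goes down, comparing the before and after values shows whether a given job raised the peak. A process killed for exceeding its memory never reaches the finally clause, so this only records jobs that complete.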
While discussing our options, we learned about Convox, a new software infrastructure platform developed by three former Heroku engineers.
Opendoor's engineers have experience building PaaS systems at Apple and Heroku, and we're well aware of how difficult these systems are to build and make reliable. We're skeptical of any new product in this area because it can take years to get right.
What interested us about Convox is that they avoid solving many of the harder problems inherent in a PaaS. Rather than building complicated pieces of software to handle containerization and scheduling, Convox hands off those responsibilities to battle-tested primitives like EC2, VPC, EC2 Container Service (ECS), and Docker. They're just providing the glue that makes the user experience feel much more like Heroku.
Even if our Convox API goes away altogether, ECS will continue running our containers.
Convox managed installation
We are still big believers in managed infrastructure, and we want to avoid running non-business-specific software ourselves. One of our concerns about moving off Heroku was that we'd have to manage and monitor our own PaaS installation, or at least our own AWS setup. To keep our experience closer to what we're accustomed to with Heroku, the Convox team offered to run a managed installation for us. They're evaluating whether to offer this as a service.
Getting started on Convox
Convox uses Docker under the hood and can run any 12-factor app that has a Dockerfile. To move from Heroku, we needed to write a Dockerfile that performed all of the steps previously handled by our buildpacks. While it took many hours to create Heroku buildpacks for our app, it took less than two hours to write an equivalent Dockerfile. That's because we could start from an Ubuntu base image and use apt-get and standard Linux installation scripts rather than conforming to the unique interface of Heroku buildpacks.
Here is a simplified version of our Dockerfile that bootstraps a scientific Python environment:
FROM ubuntu:14.04

# Install apt dependencies
RUN apt-get update -qq && apt-get install -y \
    build-essential \
    curl \
    git

# Set up Miniconda environment for python2
ENV PATH /opt/miniconda/bin:$PATH
RUN curl https://repo.continuum.io/miniconda/Miniconda-3.5.2-Linux-x86_64.sh -s -o miniconda.sh && \
    bash miniconda.sh -p /opt/miniconda -b && \
    conda update --yes conda && \
    conda install pip --yes

RUN mkdir /app
WORKDIR /app
COPY requirements.txt /app/
COPY conda-requirements.txt /app/
RUN conda install --file conda-requirements.txt --yes && \
    pip install -r requirements.txt --exists-action=w --allow-all-external && \
    conda clean -pt

COPY . /app/
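A successful image build doesn't guarantee that every compiled package imports cleanly; a missing shared library, for example, may only show up at import time. A quick smoke test run inside the built image can catch that before deploying. This is a hypothetical sketch: the file name check_stack.py and the module list are assumptions, and it sticks to syntax that runs under both Python 2 and Python 3, since the image installs a Python 2 Miniconda.

```python
# check_stack.py (hypothetical): confirm the conda environment imports cleanly.
# Uses only syntax valid in both Python 2 and Python 3.
import importlib

# Packages named in this post; the app's C-based libraries would be added here.
REQUIRED = ["numpy", "scipy", "pandas"]


def check_stack(modules=REQUIRED):
    """Map each module name to its version string, or None if it fails to import."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


def main(modules=REQUIRED):
    """Print one line per module and return a shell-style exit code."""
    report = check_stack(modules)
    for name in sorted(report):
        print("{0}: {1}".format(name, report[name] or "MISSING"))
    return 1 if None in report.values() else 0
```

Since the Dockerfile copies the repository into /app and makes it the working directory, a file like this at the repository root could be run against a built image with something like `docker run --rm <image> python -c "import check_stack, sys; sys.exit(check_stack.main())"`, where `<image>` stands in for the image tag.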
Once we had Dockerized our app, we deployed again and were up and running. Convox let us set up our environment variables like Heroku:
$ convox env set AWS_ACCESS_KEY_ID=[redacted] AWS_SECRET_ACCESS_KEY=[redacted]
$ convox env set DATABASE_URL=postgres://...
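On the application side, a 12-factor app reads these values back from its environment at startup. As a minimal sketch (Python 3; the function name is hypothetical), DATABASE_URL can be parsed into connection settings with the standard library, using keys that match psycopg2.connect's keyword arguments. The AWS keys usually need no code at all, since AWS SDKs such as boto3 read AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment on their own.

```python
import os
from urllib.parse import urlparse


def database_settings(env=None):
    """Turn DATABASE_URL (set with `convox env set`) into connection keyword arguments."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError("unsupported database scheme: %r" % parsed.scheme)
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # default PostgreSQL port
        "user": parsed.username,
        "password": parsed.password,
        "dbname": parsed.path.lstrip("/"),
    }
```

Keeping configuration in the environment is what lets the same image run unchanged on Heroku, Convox, or a laptop; only the values passed to `convox env set` differ.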
Scaling up, changing instance types
Once we had our app running on Convox, scaling it up was as easy as on Heroku:
$ convox scale --count 4 --mem 12288
Convox also lets us scale vertically, sizing our containers and instances however we choose. A separate command switches the underlying cluster to whatever EC2 instance type we want, letting us use newer and faster hardware than Heroku offers:
$ convox system scale --count 9 --type c4.2xlarge
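With vertical scaling, container size and instance type have to be chosen together, because ECS places containers on instances according to, among other things, the memory they reserve. The rough sizing sketch below treats the --mem value as MiB and assumes a hypothetical 512 MiB of per-instance overhead (the real figure is whatever memory the ECS agent registers for the instance). Only memory is considered; CPU reservations are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    memory_mib: int


# AWS lists c4.2xlarge at 8 vCPUs and 15 GiB of memory.
C4_2XLARGE = InstanceType("c4.2xlarge", 15 * 1024)

# Hypothetical headroom for the OS, Docker, and the ECS agent.
RESERVED_MIB = 512


def containers_per_instance(instance, container_mib, reserved_mib=RESERVED_MIB):
    """How many containers of a given memory size fit on one instance."""
    usable = instance.memory_mib - reserved_mib
    return max(usable // container_mib, 0)


def instances_needed(count, instance, container_mib, reserved_mib=RESERVED_MIB):
    """Smallest number of instances that can hold `count` containers."""
    per_instance = containers_per_instance(instance, container_mib, reserved_mib)
    if per_instance == 0:
        raise ValueError("a %d MiB container does not fit on %s" % (container_mib, instance.name))
    return -(-count // per_instance)  # ceiling division
```

Under these assumptions, a c4.2xlarge has 14,848 MiB to give out, so it holds exactly one 12,288 MiB container, and four such workers need four instances. Halving the container size would not double density here: three 4,096 MiB containers fit, not four.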
While testing some of our heavier workloads on Convox, we ran into some unexpected issues. It turned out that our Heroku workers had been swapping more than we realized. Docker, by default, doesn't allow containers to swap, and ECS currently offers no way to change this setting. Because of memory issues in our app, the workers were killed whenever they tried to handle a large job.
While we want to fix these memory issues, they are not as urgent as our other priorities, so we're willing to let the app keep swapping for a few weeks while we work on other things. Although ECS doesn't yet have an option for this, the Convox team gave us a workaround that adjusts each container's swap allocation after it is created. To use it, all we have to do is set a single environment variable:
$ convox env set SWAP=1
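A container that exceeds its memory limit is killed by the kernel's OOM killer with SIGKILL, which Python cannot catch. One way for a worker to fail gracefully instead is to read its own limit from the cgroup filesystem and refuse jobs that clearly won't fit. This is a hypothetical sketch: the file paths are the standard cgroup v2 and v1 locations, but whether they are visible inside a given container depends on the host, and job_fits, its 80% headroom, and the job-size estimate it expects are all assumptions.

```python
from pathlib import Path

# Standard Linux locations of a container's memory limit.
LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)

# cgroup v1 reports a value near 2**63 when no limit is set.
_UNLIMITED_THRESHOLD = 2 ** 60


def parse_limit(raw):
    """Convert the text of a cgroup limit file to bytes, or None for 'no limit'."""
    raw = raw.strip()
    if raw == "max":
        return None
    value = int(raw)
    return None if value >= _UNLIMITED_THRESHOLD else value


def container_memory_limit(paths=LIMIT_FILES):
    """Return the first readable memory limit in bytes, or None if none is found."""
    for path in paths:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue
    return None


def job_fits(estimated_bytes, headroom=0.8, limit=None):
    """Hypothetical pre-flight check: refuse jobs expected to exceed the limit."""
    limit = container_memory_limit() if limit is None else limit
    if limit is None:
        return True  # no limit detected, so there is nothing to check against
    return estimated_bytes <= headroom * limit
```

Logging container_memory_limit() at startup is also a cheap way to confirm which limit a worker actually received when the platform itself offers little visibility.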
Convox's weak spots (so far)
We have had some issues with Convox so far, which is not surprising considering that it's still a very young product. Most of our issues have been with slow or unavailable deploys. When we encountered memory issues, they were difficult to debug because Convox lacks even the rudimentary visibility offered by Heroku. We've been told that this is a high priority for the team, so that may change soon.
Convox also does not yet support scaling multiple process types independently. We run a small number of web processes and a large number of workers, and since our main pain point is the workers, we can live with this for now. In the meantime, we're running a dual deployment, with web dynos on Heroku and workers on Convox. We're excited to simplify this in the near future.
The Convox team has been quick to respond and address our issues. We're more than willing to work through these growing pains in exchange for the huge gains in flexibility.
We're fortunate that our use case is a good fit for what Convox offers today. Our data science workers are able to tolerate downtime much more than our customer-facing web stack, which lets us take on a little more risk here.
Heroku's new PX dynos
During the course of evaluating Convox, Heroku moved us to a beta of their new 14GB PX dynos. These use newer and faster EC2 instances than the old PX dynos, and they've been a big improvement for us.
However, by that time we had already moved our data science stack over to Convox. We've been so impressed by Convox's flexibility and its substantial cost savings over Heroku that we plan to continue using it for our data science stack.
Meanwhile, we're happy to be using these new PX dynos on our other apps. Our customer-facing Rails app has benefited greatly from the extra memory and decreased latency.
Moving our Python data science stack from Heroku to Convox was straightforward; in fact, it was much easier than getting the app running on Heroku in the first place. With Convox, we're able to access the full power and flexibility of EC2, with prices that are substantially cheaper than Heroku's. We hope this will allow us to spend more time improving our home pricing models, as well as scaling up our infrastructure to handle more cities and larger datasets.
Opendoor is hiring
Opendoor is reinventing life's most important transaction. We're hiring for a variety of engineering, data science, and operations roles. If you're excited about our mission and the kind of work we do, check out our jobs page to learn more.