Once upon a time, I developed and deployed .NET applications for a large enterprise organization. Such large organizations usually have maintenance windows during which systems can actually go down for, well, maintenance. However, I was not fortunate enough to work on (or rather, fortunate enough to not work on) those systems where a maintenance window was an acceptable concept. The area I was responsible for required my applications to be up all the time so the company could sell to our customers any time, any day. Zero downtime during deployment was absolutely required. On .NET (actually, ASP.NET), this is easy. Then, with the progress of time, the organization moved to Java (I had nothing to do with the decision, I promise), and zero downtime deployment became harder.
Zero downtime deployment in Node.js (specifically, of Node.js web applications) is in fact easy; it's just not well documented. So let's get to it.
To achieve zero downtime deployment, we need three capabilities:
1. The application must be able to run in multiple processes.
2. The application must be able to exit gracefully by draining HTTP requests. That is, it must stop accepting new requests and finish processing in-flight requests before quitting (see the sketch after this list).
3. The application must be able to unload its old code, load the newly deployed code, and then start serving requests again.
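To make capability 2 concrete, here is a minimal sketch of request draining on a bare http server (the port and the choice of SIGINT are illustrative):

```js
var http = require('http');

var server = http.createServer(function (req, res) {
  res.end('hello');
});
server.listen(3000);

process.on('SIGINT', function () {
  // stop accepting new connections; requests already in flight are allowed to finish
  server.close(function () {
    // this callback fires once all existing connections have ended
    process.exit(0);
  });
});
```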
The approach to zero downtime deployment is to:
1. Deploy the new package (code) to the target folders on the server.
2. Stop each process of the application running the old code in turn, and start new processes running the new code.
In-depth explanation
Node.js applications run in a single thread, and modules are cached. Any new code must be loaded into the cache again. The new code may deal with resources (files, sockets, etc.) differently from the old code, so resources need to be cleaned up too. The only reliable way to do this as of Node.js 0.10 is to stop and start the application. There are ways to reload Node.js modules (http://nodejs.org/docs/latest/api/globals.html#globals_require_cache), and there are even some npm modules that do this, but they will not clean up resources. But wait, stopping and starting the application would mean downtime! Not if the same application is running in multiple processes and we only stop one process at a time, letting the remaining processes continue serving requests.
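To see why a full stop/start is needed, consider how the module cache behaves. This is a minimal sketch; ./config is a hypothetical local module:

```js
// The second require() returns the cached export; code changed on disk
// is never picked up by a running process.
var a = require('./config');
var b = require('./config');
console.log(a === b); // true - the same cached object

// The cache entry can be deleted, forcing re-evaluation on the next require()...
delete require.cache[require.resolve('./config')];
var c = require('./config'); // re-read and re-executed

// ...but resources (open sockets, file handles, timers) created by the old
// module instance are NOT cleaned up by this. Hence: stop and start.
```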
The easiest way to run an application in multiple processes is to use the pm2 module. I'm going to introduce another, more focused way using the Node.js cluster module. The application will use cluster to fork workers to service requests, and the master process will manage the workers. From here on, I will use an example to demonstrate how this is done. The example uses Sails.js to serve HTTP requests.
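For readers new to cluster, here is the bare pattern that clusterize builds on. This is a minimal illustrative sketch, not code from the example application:

```js
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // the master forks one worker per CPU core and manages their lifecycle
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // each worker runs this branch; all workers share port 3000
  // via the master (more on this below)
  http.createServer(function (req, res) {
    res.end('served by pid ' + process.pid);
  }).listen(3000);
}
```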
The application startup will look like this.
```js
require('./clusterize')({ // clusterize will be explained later
  // forker - the function to run when a worker process is spun up
  forker: function () {
    // Start sails and pass it command line arguments
    require('sails').lift(require('optimist').argv);
  },
  // killer - the function to tell a worker to quit gracefully
  killer: function (worker) {
    // properly kill worker (sails.js) using SIGINT.
    // A mere worker.disconnect() is insufficient for Sails+Express cleanup
    worker.kill('SIGINT');
  },
  options: {
    restartDelay: 2000
  }
});
```
clusterize forks as many processes as there are CPU cores and calls forker. The master process keeps track of all the worker processes so that it can send signals to the workers when the application needs to shut down or restart. The master process is the key to zero downtime deployment: as long as the master always keeps some processes running while it recycles others to reload the newly deployed code, we have zero downtime deployment.
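One way the master can do this bookkeeping is through cluster's lifecycle events. The sketch below is illustrative and only hints at what clusterize does with its managedPids map (shown later):

```js
var cluster = require('cluster');
var managedPids = {};

cluster.on('fork', function (worker) {
  managedPids[worker.process.pid] = 'running';
});
cluster.on('exit', function (worker, code, signal) {
  delete managedPids[worker.process.pid];
  // a real master would decide here whether to respawn the worker
});
```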
The first thing clusterize does is spin up worker processes. Workers can be started all at once or in sequence with a delay in between. The real work is done by cluster.fork(). When a worker process starts, its execution begins at the application's entry point (which, for most applications, would be app.js).
```js
function startWorkers(options) {
  if (!cluster.isMaster) return;
  options = options || { poolSize: numCPUs };
  var poolSize = options.poolSize || numCPUs;
  // Fork workers.
  if (options.startDelay !== undefined) {
    // start workers with a delay in between
    startWorkerLoop(options, poolSize);
  } else {
    for (var i = 0; i < poolSize; i++) {
      startWorker(options);
    }
  }
}

function startWorkerLoop(options, countDown) {
  if (countDown <= 0) return;
  startWorker(options);
  setTimeout(startWorkerLoop, options.startDelay, options, countDown - 1);
}

function startWorker(options) {
  cluster.fork();
}
```
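A hypothetical invocation in the master, using the options the code above understands:

```js
// fork 4 workers, starting them one second apart (values are illustrative)
startWorkers({ poolSize: 4, startDelay: 1000 });
```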
The master process listens for the SIGUSR2 signal. Send SIGUSR2 to the master process to recycle the application's worker processes.
```js
process.on('SIGUSR2', function () {
  if (cluster.isMaster) {
    restartWorkers();
  }
});
```
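From a shell, kill -USR2 <master pid> does the trick after a deploy. The same can be done from Node, assuming (hypothetically) that the master wrote its PID to a file at startup:

```js
var fs = require('fs');

// the PID file and its path are assumptions for this sketch
var masterPid = parseInt(fs.readFileSync('/srv/myapp/master.pid', 'utf8'), 10);
process.kill(masterPid, 'SIGUSR2'); // triggers restartWorkers() in the master
```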
The functions that do the actual work of recycling the processes:
```js
function restartWorkers() {
  // remember the current workers
  var nowWorkers = [];
  _.forOwn(cluster.workers, function (v, id) {
    nowWorkers.push(id);
  });
  if (options.restartDelay !== undefined) {
    restartWorkerLoop(nowWorkers);
  } else {
    while (nowWorkers.length > 0) {
      var w = cluster.workers[nowWorkers[0]];
      if (w !== undefined) {
        stopWorker(w, 'r');
      }
      nowWorkers.shift();
    }
  }
}

function restartWorkerLoop(nowWorkers) {
  if (nowWorkers.length <= 0) return;
  var w = cluster.workers[nowWorkers[0]];
  if (w !== undefined) {
    stopWorker(w, 'r');
  }
  nowWorkers.shift();
  setTimeout(restartWorkerLoop, options.restartDelay, nowWorkers);
}

function stopWorker(w, flag) {
  managedPids[w.process.pid] = flag; // master-triggered kill or master-triggered respawn
  // set up timeout before disconnecting from worker
  // wait a minute before force kill - this would mean that new code would be
  // loaded BEFORE processing is completed by the worker
  var timeout = setTimeout(function () {
    w.kill();
  }, 60000);
  w.on('disconnect', function () {
    clearTimeout(timeout);
  });
  // properly kill worker (sails.js) using SIGINT.
  // A mere w.disconnect() is insufficient for Sails+Express cleanup
  killer(w);
}
```
Notice that all the workers can listen to the same port without triggering an EADDRINUSE error when the cluster module is used. Quoting the Node.js documentation:
> When you call server.listen(…) in a worker, it serializes the arguments and passes the request to the master process. If the master process already has a listening server matching the worker's requirements, then it passes the handle to the worker. If it does not already have a listening server matching that requirement, then it will create one, and pass the handle to the worker.
Tying everything together is the master process itself, whose responsibility is to track all the process IDs of its workers, be informed of when workers start and finish, and respond to the restart signal (SIGUSR2). The full clusterize.js, along with the example smallappcluster, is available on GitHub. The application serves the URI /chat over multiple worker processes. It spins up one worker process per CPU core, so yes, you need to run it on a multi-core CPU to get the full effect.
At scale
So far, our zero downtime deployment approach applies to a single server. When the application is running not just on multiple cores but on multiple servers, the approach needs to change. With multiple servers, there will likely be load balancers, or at the very least proxies such as nginx instances, routing requests to the different servers. Some servers can be left running to service requests while others are taken offline for deployment. The cluster module's magical port sharing will not work across processes running on different servers and/or behind different load balancers/proxies.