Node.js Interview Prep
Asynchronous Patterns

Error Handling Strategy

From Local try/catch to Graceful Shutdown

LinkedIn Hook

"Your Node.js process just caught an uncaughtException. What do you do next?"

If your answer is "log it and keep running" — your production server is a ticking time bomb. The official Node.js documentation is unambiguous on this point: you must exit the process. Yet I have reviewed dozens of codebases that quietly swallow fatal errors, leaving behind corrupted state, leaked file descriptors, and half-finished database transactions.

Joyent's classic essay on Node.js error handling drew a line that every backend engineer should memorize: operational errors (network down, file missing, invalid input) are part of normal operation and must be handled. Programmer errors (bugs, undefined references, broken invariants) cannot be handled — they must crash the process so a supervisor can restart it cleanly.

In Lesson 5.3, I walk through the full strategy: custom AppError classes, centralized handlers, Error.cause chaining, async error propagation, and what graceful shutdown actually looks like when the sky is falling.

Read the full lesson -> [link]

#NodeJS #BackendDevelopment #ErrorHandling #SoftwareEngineering #InterviewPrep




What You'll Learn

  • The Joyent distinction between operational errors and programmer errors and why it dictates everything
  • How to design a custom AppError class with statusCode and isOperational flags
  • Why a centralized error handler beats scattered try/catch blocks
  • How process.on('uncaughtException') and process.on('unhandledRejection') should behave (log + exit, never swallow)
  • How to perform a graceful shutdown after a fatal error — close servers, drain connections, then exit
  • How errors propagate through async/await — throw inside async equals a rejected promise
  • How to use Error.cause (Node 16.9+) for clean error chaining without losing the original stack

The Restaurant Kitchen Analogy — Two Kinds of Disaster

Imagine a busy restaurant kitchen during dinner service. Two very different problems can happen.

The first problem: a delivery is late, the freezer is full, a customer changes their order halfway through. These are expected disruptions. The kitchen has playbooks for them. The chef substitutes an ingredient, the waiter apologizes, service continues. Nobody panics because these problems are part of running a restaurant. In Node.js terms, these are operational errors — network timeouts, missing files, validation failures, database deadlocks. You expect them, you handle them, you keep serving customers.

The second problem: the chef discovers that the gas line has been leaking for an hour, or that someone wired the oven to the wrong voltage and it just shorted the entire building. These are programmer errors — broken invariants. You cannot "handle" a gas leak by ignoring it and continuing to cook. You evacuate, shut everything down cleanly, fix the wiring, and reopen. In Node.js terms, these are bugs: undefined references, type errors, unmet assumptions, corrupted state. The only safe response is to stop the process and let a supervisor restart it from a known-good state.

The mistake almost every junior backend engineer makes is trying to treat the second category like the first. They wrap everything in try/catch, log the gas leak, and keep cooking. The result is a process running in an unknown state, leaking resources, returning corrupted responses, and slowly poisoning every customer it touches.

+---------------------------------------------------------------+
|           OPERATIONAL vs PROGRAMMER ERRORS                    |
+---------------------------------------------------------------+
|                                                                |
|  OPERATIONAL ERRORS (Expected, Recoverable):                   |
|  - Failed to connect to database                               |
|  - Request timed out                                           |
|  - Invalid user input                                          |
|  - File not found                                              |
|  - Out of memory on a single request                           |
|                                                                |
|  Strategy: HANDLE                                              |
|  - Retry with backoff                                          |
|  - Return 4xx/5xx to client                                    |
|  - Log and continue serving                                    |
|                                                                |
+---------------------------------------------------------------+
|                                                                |
|  PROGRAMMER ERRORS (Bugs, Unrecoverable):                      |
|  - Cannot read property 'x' of undefined                       |
|  - Called a function with wrong arguments                      |
|  - Forgot to await a promise                                   |
|  - Broke an invariant assertion                                |
|  - Race condition corrupted shared state                       |
|                                                                |
|  Strategy: CRASH                                               |
|  - Log the full error + stack                                  |
|  - Stop accepting new work                                     |
|  - Finish in-flight requests (best effort)                     |
|  - Exit with non-zero code                                     |
|  - Let the supervisor restart                                  |
|                                                                |
+---------------------------------------------------------------+
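The "retry with backoff" item in the HANDLE strategy can be sketched as below. This is a minimal illustration, not code from the lesson: the helper name retryWithBackoff, its option names, and the isRetryable predicate are all hypothetical.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
// Only errors the caller marks as retryable are retried; anything else
// (for example a TypeError caused by a bug) is rethrown immediately.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(operation, options = {}) {
  const { retries = 3, baseDelayMs = 100, isRetryable = () => true } = options;

  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Give up when retries are exhausted or the error is not transient.
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Delays grow as baseDelayMs * 1, 2, 4, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

The isRetryable predicate is where the operational/programmer split shows up: a timeout is worth another attempt, while a bug will fail the same way every time and should propagate.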



The Custom AppError Class — Marking Errors as Operational

The first building block of a serious error strategy is a custom Error subclass. The Node.js standard Error has no notion of "is this safe to recover from" or "what HTTP status code should this map to". You add those fields yourself.

// src/errors/AppError.js
// Custom error class used for ALL operational errors in the app.
// Anything thrown that is NOT an AppError is treated as a programmer error.
class AppError extends Error {
  constructor(message, statusCode = 500, options = {}) {
    // Pass message and { cause } to the base Error (Node 16.9+)
    super(message, { cause: options.cause });

    // Preserve the class name for logging and instanceof checks
    this.name = this.constructor.name;

    // HTTP-style status code so the central handler can map to a response
    this.statusCode = statusCode;

    // The critical flag: did we throw this on purpose?
    // true  -> expected, safe to handle and continue
    // false -> not from us, treat as a bug and crash
    this.isOperational = true;

    // Optional machine-readable code (e.g. 'USER_NOT_FOUND')
    this.code = options.code;

    // Capture a clean stack trace that excludes this constructor frame
    Error.captureStackTrace(this, this.constructor);
  }
}

// Common subclasses keep call sites short and intent obvious
class NotFoundError extends AppError {
  constructor(resource, options = {}) {
    super(`${resource} not found`, 404, { ...options, code: 'NOT_FOUND' });
  }
}

class ValidationError extends AppError {
  constructor(message, options = {}) {
    super(message, 400, { ...options, code: 'VALIDATION_FAILED' });
  }
}

class UnauthorizedError extends AppError {
  constructor(message = 'Unauthorized', options = {}) {
    super(message, 401, { ...options, code: 'UNAUTHORIZED' });
  }
}

module.exports = { AppError, NotFoundError, ValidationError, UnauthorizedError };

Why this matters: when an error reaches the centralized handler, the handler can ask one question — err.isOperational === true? If yes, respond to the client and keep the process alive. If no (or the error is not even an AppError), assume the worst, log it, and prepare for a graceful shutdown.

// src/services/userService.js
const { NotFoundError } = require('../errors/AppError');
const db = require('../db'); // assumed data-access module, not shown in this lesson

async function getUserById(id) {
  // Operational error: the user simply does not exist.
  // This is expected and recoverable — return 404 to the caller.
  const user = await db.users.findById(id);
  if (!user) {
    throw new NotFoundError('User');
  }
  return user;
}
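The "one question" the handler asks can be tried in isolation. The sketch below repeats a trimmed-down AppError so it runs on its own; the isOperationalError helper is a hypothetical name introduced only for illustration.

```javascript
// Trimmed-down copy of the lesson's AppError so this sketch runs alone.
class AppError extends Error {
  constructor(message, statusCode = 500, options = {}) {
    super(message, { cause: options.cause });
    this.name = this.constructor.name;
    this.statusCode = statusCode;
    this.isOperational = true;
    this.code = options.code;
  }
}

class NotFoundError extends AppError {
  constructor(resource, options = {}) {
    super(`${resource} not found`, 404, { ...options, code: 'NOT_FOUND' });
  }
}

// Hypothetical helper: the single question the central handler asks.
function isOperationalError(err) {
  return err instanceof AppError && err.isOperational === true;
}

const expected = new NotFoundError('User');
console.log(expected.name, expected.statusCode, expected.code); // NotFoundError 404 NOT_FOUND
console.log(isOperationalError(expected)); // true
console.log(isOperationalError(new TypeError('x is undefined'))); // false
```

A plain TypeError fails the check even though it is an Error, which is the point: anything the application did not throw on purpose is treated as a bug.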

Async Error Propagation — throw inside async = rejected promise

A subtle but critical fact about async functions: any value thrown inside an async function becomes the rejection reason of the promise that function returns. This means try/catch works exactly the way you would expect — but only when you await.

// src/examples/asyncPropagation.js
const fs = require('fs');

// Throwing inside an async function does NOT crash the process.
// It returns a rejected promise. The throw is "captured" by async semantics.
async function loadConfig(path) {
  // If readFile rejects, the rejection bubbles up because we await it.
  const raw = await fs.promises.readFile(path, 'utf8');

  // A normal throw becomes a promise rejection at the call site.
  if (!raw.trim()) {
    throw new Error('Config file is empty');
  }

  return JSON.parse(raw);
}

// Caller pattern #1: await inside try/catch (the right way)
async function startup() {
  try {
    const config = await loadConfig('./config.json');
    return config;
  } catch (err) {
    // Both "file not found" and "Config file is empty" land here.
    console.error('startup failed:', err.message);
    throw err;
  }
}

// Caller pattern #2: forgetting to await (the bug factory)
function brokenStartup() {
  try {
    // No await -> the rejection escapes the try/catch entirely.
    // It becomes an unhandledRejection on the next tick.
    loadConfig('./config.json');
  } catch (err) {
    // This catch block will NEVER fire. Ever.
    console.error('this will never run');
  }
}

The rule is mechanical: try/catch catches synchronous throws and the rejections of promises that are awaited inside the try block. A promise that leaves the block without being awaited is out of its reach. This is one of the most common sources of unhandledRejection errors in real Node.js codebases.
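The difference between the two caller patterns can be checked with a small runnable sketch. All function names here are hypothetical. Unlike brokenStartup, the non-awaiting variant returns the promise so the demo can observe the rejection instead of letting it go unhandled.

```javascript
// Hypothetical demo functions; none of these names come from the lesson.
async function failing() {
  throw new Error('boom');
}

// Awaited: the rejection is rethrown at the await and lands in catch.
async function withAwait() {
  try {
    await failing();
    return 'no error seen';
  } catch (err) {
    return `caught: ${err.message}`;
  }
}

// Not awaited: calling failing() only produces a promise object, nothing
// is thrown synchronously, so the catch block is skipped.
function withoutAwait() {
  let promise;
  try {
    promise = failing();
  } catch (err) {
    return { caughtSynchronously: true, promise: null };
  }
  return { caughtSynchronously: false, promise };
}

async function main() {
  console.log(await withAwait()); // caught: boom

  const { caughtSynchronously, promise } = withoutAwait();
  console.log(caughtSynchronously); // false
  // The rejection is only visible to whoever handles the promise later.
  await promise.catch((err) => console.log(`seen later: ${err.message}`));
}

main();
```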


Error.cause — Chaining Errors Without Losing Context

Before Node 16.9, the only way to "wrap" a low-level error in a higher-level one was to copy fields manually or stuff the original error into a custom property. The result was almost always a lost stack trace and a broken debugging experience. Error.cause fixes this.

// src/repositories/orderRepository.js
const { AppError } = require('../errors/AppError');
const db = require('../db'); // assumed database client module, not shown in this lesson

async function getOrder(orderId) {
  try {
    // Low-level call to the database driver
    return await db.query('SELECT * FROM orders WHERE id = $1', [orderId]);
  } catch (dbErr) {
    // Wrap the technical error in a domain error.
    // The original error is preserved via { cause }, so logs show
    // the FULL chain without us having to copy stack traces.
    throw new AppError('Failed to load order', 500, {
      cause: dbErr,
      code: 'ORDER_LOAD_FAILED',
    });
  }
}

When you log the resulting error with console.error(err), Node prints both the wrapper and the cause:

AppError: Failed to load order
    at getOrder (/app/src/repositories/orderRepository.js:12:11)
    ...
    [cause]: Error: connect ECONNREFUSED 127.0.0.1:5432
        at TCPConnectWrap.afterConnect [as oncomplete] (...)

You get the high-level "what failed in business terms" message at the top, and the low-level "what actually went wrong at the system level" cause underneath. No information lost, no fields copied by hand.
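If a logger does not print the cause chain on its own, the chain can be walked by hand. The sketch below is one way to do that; causeChain is a hypothetical helper, not a Node.js API.

```javascript
// Hypothetical helper: collect every error in a cause chain, outermost first.
function causeChain(err) {
  const chain = [];
  const seen = new Set(); // guards against a cause that points back to itself
  let current = err;
  while (current instanceof Error && !seen.has(current)) {
    seen.add(current);
    chain.push(current);
    current = current.cause;
  }
  return chain;
}

const dbErr = new Error('connect ECONNREFUSED 127.0.0.1:5432');
const wrapped = new Error('Failed to load order', { cause: dbErr });

// Logs the two messages, wrapper first and root cause last.
console.log(causeChain(wrapped).map((e) => e.message));
```

The last element of the chain is the root cause, which is usually the field worth indexing in structured logs.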


The Central Express Error Handler

Scattering try/catch around every route handler is noise. Instead, let errors propagate, and put one handler at the bottom of the middleware stack that decides what to do with everything.

// src/middleware/errorHandler.js
const { AppError } = require('../errors/AppError');
const logger = require('../logger');

// Express recognizes a middleware as an error handler when it has 4 args.
// This must be registered LAST, after all routes and other middleware.
function errorHandler(err, req, res, next) {
  // Step 1: classify the error.
  const isOperational = err instanceof AppError && err.isOperational === true;

  // Step 2: always log. Operational errors at warn, the rest at error.
  if (isOperational) {
    logger.warn({ err, path: req.path }, 'operational error');
  } else {
    // Programmer errors get the full stack and full request context
    logger.error({ err, path: req.path, body: req.body }, 'programmer error');
  }

  // Step 3: respond to the client.
  if (isOperational) {
    return res.status(err.statusCode).json({
      error: { code: err.code, message: err.message },
    });
  }

  // For non-operational errors, never leak internals to the client.
  res.status(500).json({
    error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' },
  });

  // Step 4: if it was a programmer error, the process is in an unknown state.
  // Signal the application to begin a graceful shutdown.
  // (We do NOT exit here directly — see the next section.)
  if (!isOperational) {
    process.emit('fatalError', err);
  }
}

// Async route helper: forwards rejected promises to the error handler.
// Without this, every async route needs its own try/catch + next(err).
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

module.exports = { errorHandler, asyncHandler };

// src/routes/users.js
const router = require('express').Router();
const { asyncHandler } = require('../middleware/errorHandler');
const { getUserById } = require('../services/userService');

// No try/catch needed — asyncHandler forwards rejections to errorHandler.
router.get('/users/:id', asyncHandler(async (req, res) => {
  const user = await getUserById(req.params.id);
  res.json(user);
}));
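What asyncHandler does can be seen without starting a server. The sketch below calls it with stub arguments in place of Express's req, res, and next; the route and the stubs are hypothetical and exist only for this demonstration.

```javascript
// Same helper as in the lesson, repeated so the sketch is self-contained.
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

// Hypothetical route that always rejects.
const route = asyncHandler(async () => {
  throw new Error('lookup failed');
});

// Stub Express arguments: only next() matters for this demonstration.
const forwarded = [];
const next = (err) => forwarded.push(err);

route({}, {}, next).then(() => {
  console.log(forwarded.length, forwarded[0].message); // 1 lookup failed
});
```

The rejection never surfaces as an unhandledRejection because .catch(next) hands it to Express, which then skips ahead to the four-argument error handler.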

uncaughtException, unhandledRejection, and Graceful Shutdown

These two events are your last line of defense. They fire when an error escapes every other catch block in your application. Per the official Node.js documentation, the only correct response is to log the error and terminate the process. Resuming after an uncaughtException is explicitly described as unsafe.

// src/server.js
const http = require('http');
const app = require('./app');
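const logger = require('./logger');
// Assumed app-specific modules that each expose an async close() method.
const db = require('./db');
const queue = require('./queue');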

const server = http.createServer(app);

// Track in-flight state so we can shut down cleanly.
let isShuttingDown = false;

server.listen(3000, () => logger.info('listening on 3000'));

// ---------------------------------------------------------------
// Graceful shutdown routine
// ---------------------------------------------------------------
async function shutdown(reason, err) {
  // Idempotent: if we are already shutting down, do nothing.
  if (isShuttingDown) return;
  isShuttingDown = true;

  // SIGTERM and SIGINT are normal stops and exit 0; everything else is fatal.
  const exitCode = reason === 'SIGTERM' || reason === 'SIGINT' ? 0 : 1;
  if (exitCode === 0) {
    logger.info({ reason }, 'beginning graceful shutdown');
  } else {
    logger.error({ reason, err }, 'beginning graceful shutdown');
  }

  // Force-exit timer: if shutdown hangs, kill the process anyway.
  // Keep it shorter than the orchestrator's grace period
  // (Kubernetes defaults to 30s, docker stop to 10s).
  const forceExit = setTimeout(() => {
    logger.error('forced exit after 10s shutdown timeout');
    process.exit(1);
  }, 10_000);
  // Do not let the timer keep the event loop alive on its own.
  forceExit.unref();

  try {
    // 1. Stop accepting new HTTP connections.
    //    In-flight requests are allowed to finish.
    await new Promise((resolve) => server.close(resolve));

    // 2. Close database pools, message queues, etc.
    await db.close();
    await queue.close();

    logger.info('graceful shutdown complete');
    // Non-zero after a fatal error so the supervisor knows something went wrong.
    process.exit(exitCode);
  } catch (shutdownErr) {
    logger.error({ err: shutdownErr }, 'error during shutdown');
    process.exit(1);
  }
}

// ---------------------------------------------------------------
// Fatal error sources
// ---------------------------------------------------------------

// A bug threw synchronously somewhere with no try/catch.
// The process is in an unknown state — DO NOT continue.
process.on('uncaughtException', (err) => {
  logger.fatal({ err }, 'uncaughtException');
  shutdown('uncaughtException', err);
});

// A promise rejected and nobody attached a .catch / await.
// Since Node 15, an unhandled rejection crashes the process by default
// when no handler is registered; this handler makes the exit graceful.
process.on('unhandledRejection', (reason) => {
  logger.fatal({ err: reason }, 'unhandledRejection');
  shutdown('unhandledRejection', reason);
});

// Internal signal from the central error handler when a programmer
// error escapes a request handler.
process.on('fatalError', (err) => shutdown('fatalError', err));

// SIGTERM is what orchestrators send for normal shutdowns.
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

Why this design works:

  1. Stop accepting new work first. server.close() lets in-flight requests finish but rejects new connections. Failing health checks tell the load balancer to stop sending traffic.
  2. Drain external resources. Close database pools and queues so no half-finished transactions are left dangling.
  3. Bound the wait. A 10-second setTimeout(...).unref() ensures the process cannot hang forever in shutdown.
  4. Exit non-zero. The supervisor (PM2, systemd, Kubernetes, Docker) sees the failure and starts a fresh process with a clean state.

+---------------------------------------------------------------+
|           GRACEFUL SHUTDOWN TIMELINE                          |
+---------------------------------------------------------------+
|                                                                |
|  t=0.0s  Fatal error fires (uncaughtException)                 |
|  t=0.0s  Log full error + stack                                |
|  t=0.0s  isShuttingDown = true                                 |
|  t=0.0s  server.close() -> stop accepting new connections      |
|  t=0.1s  Health check returns 503                              |
|  t=0.5s  Load balancer removes instance from pool              |
|  t=1.0s  In-flight requests finishing...                       |
|  t=2.5s  Last request done, server emits 'close'               |
|  t=2.6s  db.close(), queue.close()                             |
|  t=3.0s  process.exit(1)                                       |
|  t=3.1s  Supervisor starts a fresh process                     |
|                                                                |
|  Hard cap: 10s. After that, force exit no matter what.         |
|                                                                |
+---------------------------------------------------------------+
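The "Health check returns 503" step in the timeline assumes an endpoint that reads the shutdown flag. A minimal sketch of such an endpoint follows; the /healthz path, the state object, and the handler name are assumptions, and the server is created here but never started.

```javascript
const http = require('node:http');

// Hypothetical shared state; in the lesson this is the isShuttingDown flag.
const state = { isShuttingDown: false };

// Hypothetical health handler: 200 while healthy, 503 once shutdown begins.
function healthHandler(req, res) {
  const status = state.isShuttingDown ? 503 : 200;
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ status: status === 200 ? 'ok' : 'shutting_down' }));
}

// Wiring sketch: route /healthz to the handler, 404 for everything else.
const healthServer = http.createServer((req, res) => {
  if (req.url === '/healthz') return healthHandler(req, res);
  res.writeHead(404);
  res.end();
});
```

Once server.close() has run, new connections are refused outright, so the 503 is mainly seen on connections that are already open. Both outcomes count as a failed check for a typical load balancer.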

Common Mistakes

1. Swallowing errors with empty catch blocks. The single most damaging pattern in Node.js codebases is try { ... } catch (e) {} or .catch(() => {}). The error vanishes, the bug stays, and you debug a phantom production incident next week. If you cannot meaningfully recover from an error, do not catch it. Let it propagate to the central handler.

2. Continuing the process after uncaughtException. Some developers register process.on('uncaughtException') purely as a logger and let the process keep running. The Node.js documentation explicitly warns against this: an uncaught exception means application state is unknown — open file descriptors may be leaked, half-written database rows may exist, in-memory caches may be corrupt. Always log and exit.

3. Forgetting to await a promise. A non-awaited promise is invisible to the surrounding try/catch. If it rejects, the rejection escapes to unhandledRejection. Even worse, the calling function carries on without waiting for the result, so it proceeds as if everything succeeded. Always await, or explicitly attach .catch() if you really intend to fire-and-forget.

4. Treating every error as a 500. Operational errors carry meaning. A NotFoundError should map to 404, a ValidationError to 400, an UnauthorizedError to 401. Returning 500 for everything destroys observability — you can no longer distinguish "user typed a bad email" from "the database is on fire".

5. Hard-exiting without draining. Calling process.exit(1) immediately on a fatal error kills in-flight requests, drops database connections without ROLLBACK, and abandons queued background work. Always run a graceful shutdown routine first, with a hard timeout as a safety net.
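For the intentional fire-and-forget case mentioned in mistake 3, one option is a small wrapper that always attaches a reporting .catch(). The helper below is a hypothetical sketch; its name and its report parameter are not from the lesson.

```javascript
// Hypothetical helper for intentional fire-and-forget work.
// The rejection is reported, never silently dropped.
function fireAndForget(promise, label, report = console.error) {
  promise.catch((err) => {
    report(`[background:${label}] ${err.message}`);
  });
}

// Example: a background task that fails after the caller has moved on.
fireAndForget(Promise.reject(new Error('smtp down')), 'welcome-email');
// writes "[background:welcome-email] smtp down" to stderr
```

This differs from .catch(() => {}) in one important way: the failure still reaches the logs, so the bug stays visible even though the caller does not wait for the result.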


Interview Questions

1. "Explain the difference between operational errors and programmer errors. Why does the distinction matter?"

Operational errors are runtime problems that are part of normal operation: a database connection drops, a file is missing, a user submits invalid input, a remote API times out. They are expected, they have well-defined recovery strategies (retry, return 4xx/5xx, log and continue), and the application can keep running. Programmer errors are bugs: undefined references, type errors, broken invariants, race conditions corrupting shared state. They indicate the application is in a state the developer never anticipated, which means anything could be true now — file descriptors might be leaked, transactions might be half-committed, in-memory state might be corrupt. The Joyent essay that codified this distinction makes the practical point clear: operational errors must be handled, programmer errors must crash the process so a supervisor can restart from a clean slate. Conflating the two is how you end up with a "running" process that silently returns wrong answers.

2. "What should process.on('uncaughtException') do, and why?"

It should log the error with full context and then begin a graceful shutdown that ends in process.exit(1). The official Node.js documentation states that resuming normal operation after an uncaught exception is not safe — by definition the exception escaped every catch block in the program, which means the developer never anticipated this state and the application's invariants may be broken. The handler exists so you can capture the error for diagnostics and clean up resources before terminating, not so you can keep serving traffic. The correct sequence is: log -> stop accepting new connections -> drain in-flight work with a bounded timeout -> close database and queue clients -> exit with a non-zero code so the supervisor (PM2, systemd, Kubernetes) restarts the process.

3. "What happens if you throw inside an async function? How do you catch it?"

Throwing inside an async function does not raise a synchronous exception — it causes the promise returned by that async function to reject with the thrown value. Mechanically: async function f() { throw new Error('x'); } is equivalent to function f() { return Promise.reject(new Error('x')); }. To catch it, you either await the call inside a try/catch block, or attach .catch() to the returned promise. The common bug is calling an async function without await inside a try/catch — the try block finishes before the rejection happens, so the catch never fires, and the rejection escapes to unhandledRejection.

4. "What is Error.cause and when would you use it?"

Error.cause is a standard property added in Node 16.9 (and ES2022) that lets you wrap one error inside another while preserving the original. You set it via the second argument to the Error constructor: new Error('Failed to load order', { cause: dbErr }). Node's default console.error printer follows the cause chain and prints both stack traces. The use case is layered error handling: a low-level component throws a technical error (ECONNREFUSED), and a higher layer catches it and re-throws a domain error (Failed to load order) that is meaningful to the caller. Without Error.cause, you either lose the original stack or have to copy fields manually. With it, debugging logs show the full chain from business symptom to root system cause.

5. "Describe a centralized error handler in Express. Why is it preferable to per-route try/catch?"

A centralized error handler is a four-argument middleware ((err, req, res, next)) registered after all routes. Express forwards any error passed to next(err) — or any rejection from an async route wrapped in a small asyncHandler helper — directly to it, skipping normal middleware. Inside the handler you classify the error: if it is an instanceof AppError and isOperational === true, log at warn level and return a structured response with the error's statusCode and code; otherwise log at error level with full request context, return a generic 500 to avoid leaking internals, and emit a fatal-error event that triggers graceful shutdown. The advantages over per-route try/catch are: route handlers stay focused on the happy path, error response shape is consistent across the entire API, logging and observability are uniform, and the operational-vs-programmer decision is made in exactly one place.


Quick Reference — Error Handling Cheat Sheet

+---------------------------------------------------------------+
|           ERROR HANDLING CHEAT SHEET                          |
+---------------------------------------------------------------+
|                                                                |
|  CUSTOM ERROR CLASS:                                           |
|  class AppError extends Error {                                |
|    constructor(msg, statusCode, { cause, code } = {}) {        |
|      super(msg, { cause });                                    |
|      this.statusCode = statusCode;                             |
|      this.isOperational = true;                                |
|      this.code = code;                                         |
|      Error.captureStackTrace(this, this.constructor);          |
|    }                                                           |
|  }                                                             |
|                                                                |
|  WRAP AN ERROR (Node 16.9+):                                   |
|  throw new AppError('Failed to load X', 500, { cause: err });  |
|                                                                |
|  ASYNC ROUTE HELPER:                                           |
|  const asyncHandler = (fn) => (req, res, next) =>              |
|    Promise.resolve(fn(req, res, next)).catch(next);            |
|                                                                |
|  CENTRAL HANDLER (Express, must have 4 args):                  |
|  app.use((err, req, res, next) => { ... });                    |
|                                                                |
|  FATAL EVENTS — log and exit, never swallow:                   |
|  process.on('uncaughtException', shutdown);                    |
|  process.on('unhandledRejection', shutdown);                   |
|  process.on('SIGTERM', shutdown);                              |
|  process.on('SIGINT', shutdown);                               |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           DECISION TABLE                                       |
+---------------------------------------------------------------+
|                                                                |
|  Is err instanceof AppError && isOperational?                  |
|    YES -> respond with err.statusCode, keep process alive      |
|    NO  -> respond 500 generic, begin graceful shutdown         |
|                                                                |
|  Did an uncaughtException fire?                                |
|    -> Log full stack -> shutdown() -> exit(1) ALWAYS           |
|                                                                |
|  Did an unhandledRejection fire?                               |
|    -> Same as uncaughtException                                |
|                                                                |
|  Did SIGTERM arrive?                                           |
|    -> shutdown() -> exit(0) (normal shutdown)                  |
|                                                                |
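+---------------------------------------------------------------+

+---------------------+-----------------------------+--------------------------+
| Concept             | Operational Error           | Programmer Error         |
+---------------------+-----------------------------+--------------------------+
| Source              | Environment, input, network | Bugs in code             |
| Examples            | 404, timeout, validation    | TypeError, undefined ref |
| Handling            | Catch, respond, continue    | Log, shutdown, restart   |
| Marker              | instanceof AppError         | Anything else            |
| HTTP response       | err.statusCode              | Generic 500              |
| Process state       | Stable                      | Unknown / corrupt        |
| try/catch worth it? | Yes                         | Only at the top level    |
+---------------------+-----------------------------+--------------------------+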



This is Lesson 5.3 of the Node.js Interview Prep Course -- 10 chapters, 42 lessons.
