Streams in Production -- Uploads, Video, and ETL
LinkedIn Hook
"Your Node.js server crashed at 2 AM because a user uploaded a 4 GB video. Why?"
Because someone, somewhere, wrote

app.post('/upload', (req, res) => { const data = req.body; fs.writeFileSync('upload.bin', data); })

and shipped it. That single line buffered the entire file into RAM before touching disk. With 50 concurrent uploads, your 8 GB server is dust.

Streams are not an optimization. In production Node.js they are a survival mechanism. Every framework you love -- Express, Fastify, Multer, Busboy, Got, Undici -- is built on top of them. The HTTP request object is a Readable. The HTTP response object is a Writable. When you pipe them through transforms, you are using the same primitive that powers Netflix's video delivery, Stripe's webhook ingestion, and your favorite log aggregator.

In Lesson 4.4, I walk through the four production patterns every senior Node.js engineer must know: Server-Sent Events for live API responses, multipart upload handling with Busboy, HTTP Range requests for video seeking, and full ETL pipelines that process gigabytes of CSV with a memory footprint smaller than a browser tab.
Read the full lesson -> [link]
#NodeJS #BackendEngineering #Streams #SystemDesign #InterviewPrep
What You'll Learn
- How to stream API responses with Server-Sent Events and chunked transfer encoding
- How to handle multipart file uploads with Busboy without buffering entire files
- How to serve video and audio with HTTP Range requests (206 Partial Content)
- How to build log processors and ETL pipelines using Transform streams
- Why Express, Fastify, and Multer are stream-based under the hood
- The dramatic memory difference between buffered and streamed approaches at scale
The Conveyor Belt Analogy -- Why Streams Win in Production
Imagine a postal sorting facility. Two designs are possible.
Design A (buffered): Trucks unload all 50,000 packages into a giant warehouse. Workers wait. Once every package is on the floor, sorting begins. Once every package is sorted, delivery trucks load up. The warehouse must be big enough to hold all 50,000 packages at peak. If two trucks arrive at once, you need double the warehouse. If a million packages arrive on Black Friday, the warehouse collapses.
Design B (streamed): A conveyor belt runs from the truck dock to the sorting machines to the delivery dock. Each package is sorted and routed the moment it lands on the belt. The warehouse holds only the few hundred packages currently in motion -- regardless of whether you process 50,000 or 50,000,000 packages a day. Throughput scales. Memory does not.
Production Node.js servers are sorting facilities. HTTP requests are trucks. The bytes inside requests and responses are packages. If you call req.body or fs.readFileSync you built Design A. If you pipe through transforms you built Design B. The framework primitives -- Express's req, Fastify's reply, Multer's storage engine -- all assume Design B because Design A does not survive contact with real traffic.
+---------------------------------------------------------------+
| BUFFERED SERVER (Design A) |
+---------------------------------------------------------------+
| |
| Request (4 GB) -> [RAM: 4 GB] -> process -> [RAM: 4 GB] -> res |
| |
| 10 concurrent uploads = 40 GB RAM = OOM kill |
| Latency to first byte = full upload time |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| STREAMED SERVER (Design B) |
+---------------------------------------------------------------+
| |
| Request --[64 KB chunk]--> transform --[64 KB chunk]--> resp |
| |
| 10 concurrent uploads = ~640 KB RAM = stable |
| Latency to first byte = first chunk arrival (~ms) |
| |
+---------------------------------------------------------------+
Napkin AI Visual Prompt: "Dark gradient (#0a1a0a -> #0d2e16). LEFT panel labeled 'Buffered' shows a giant red warehouse bursting with packages, a tiny truck waiting outside, and a memory meter pegged at 100%. RIGHT panel labeled 'Streamed' shows a smooth green conveyor belt with packages flowing left to right through sorting machines, a steady line of trucks loading and unloading, and a memory meter at 5%. Amber (#ffb020) divider between the two. White monospace labels."
Pattern 1 -- Streaming API Responses with SSE and Chunked Encoding
HTTP/1.1 supports Transfer-Encoding: chunked, which lets the server start sending the response body before it knows the total size. Node.js enables this automatically the moment you call res.write() without setting Content-Length. Server-Sent Events (SSE) builds on top of chunked encoding to push named events to the browser over a single long-lived HTTP connection.
This is the foundation of "streaming" LLM responses, live dashboards, progress bars, and notification feeds. No WebSocket required. No client library required. The browser's EventSource API speaks SSE natively.
// sse-server.js
// A live event feed using Server-Sent Events over chunked HTTP.
const http = require('http');

http.createServer((req, res) => {
  if (req.url !== '/events') {
    res.writeHead(404).end();
    return;
  }

  // SSE requires these exact headers. No Content-Length -> chunked encoding.
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no', // disable nginx buffering if present
  });

  // Send an initial comment to open the stream immediately.
  res.write(': connected\n\n');

  let id = 0;
  const timer = setInterval(() => {
    id += 1;
    // SSE frame format: id, event, data, terminated by a blank line.
    res.write(`id: ${id}\n`);
    res.write(`event: tick\n`);
    res.write(`data: ${JSON.stringify({ ts: Date.now(), id })}\n\n`);
  }, 1000);

  // Critical: clean up when the client disconnects.
  req.on('close', () => {
    clearInterval(timer);
    res.end();
  });
}).listen(3000, () => console.log('SSE on http://localhost:3000/events'));
<!-- client.html -->
<script>
  // EventSource is a built-in browser API. No library needed.
  const es = new EventSource('http://localhost:3000/events');
  es.addEventListener('tick', (e) => {
    console.log('tick', JSON.parse(e.data));
  });
</script>
The same pattern powers token streaming from LLM APIs. Replace the setInterval with a Readable that yields tokens from an upstream model and you have a ChatGPT-style typewriter UI without changing a single line on the client.
Pattern 2 -- Multipart File Uploads with Busboy
multipart/form-data is the wire format browsers use when a <form enctype="multipart/form-data"> posts files. The body looks like a series of parts separated by a randomly chosen boundary string. A naive parser would buffer the whole body, find boundaries, and then split. Busboy does the opposite: it parses the multipart stream incrementally and emits each file as its own Readable stream the moment its headers arrive. You can pipe that file directly to disk, S3, or a virus scanner without ever holding the entire upload in memory.
Multer (Express) and @fastify/multipart (Fastify) are both wrappers around Busboy.
// upload-server.js
// Streaming multipart upload handler -- constant memory regardless of file size.
const http = require('http');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream/promises');
const Busboy = require('busboy');

http.createServer((req, res) => {
  if (req.method !== 'POST' || req.url !== '/upload') {
    res.writeHead(404).end();
    return;
  }

  // Busboy needs the request headers to read the multipart boundary.
  const bb = Busboy({
    headers: req.headers,
    limits: {
      fileSize: 100 * 1024 * 1024, // 100 MB hard cap per file
      files: 5, // max 5 files per request
    },
  });

  const saved = [];
  const fileWrites = [];

  // Each 'file' event gives us a Readable stream for one uploaded file.
  bb.on('file', (fieldName, fileStream, info) => {
    const { filename, mimeType } = info;
    const safeName = `${Date.now()}-${path.basename(filename)}`;
    const dest = path.join('/tmp/uploads', safeName);
    // Pipe directly to disk. Backpressure is handled automatically.
    fileWrites.push(
      pipeline(fileStream, fs.createWriteStream(dest))
        .then(() => saved.push({ field: fieldName, file: safeName, mimeType }))
        .catch((err) => {
          // If the write fails or the client aborts mid-upload, drain the
          // stream so Busboy can continue parsing the remaining parts.
          fileStream.resume();
          console.error('upload failed:', err.message);
        }),
    );
  });

  // Regular text fields (non-file) come through 'field'.
  bb.on('field', (name, value) => {
    console.log('field', name, '=', value);
  });

  bb.on('close', async () => {
    // 'close' means parsing finished; the disk writes may still be settling,
    // so wait for them before responding or `saved` can be incomplete.
    await Promise.all(fileWrites);
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ ok: true, saved }));
  });

  bb.on('error', (err) => {
    res.writeHead(400).end(err.message);
  });

  // Pipe the request body into Busboy. This is where streaming begins.
  req.pipe(bb);
}).listen(4000, () => console.log('uploads on http://localhost:4000/upload'));
A 4 GB upload through this server uses roughly 64 KB of RAM (one chunk in flight) -- not 4 GB. Run 100 concurrent 4 GB uploads and the server stays well under 100 MB of memory. That is why every serious Node.js framework uses Busboy under the hood.
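For context on what Busboy is actually parsing, here is a sketch that assembles a raw multipart/form-data body by hand -- the boundary, field names, and file names are made up for illustration:

```javascript
// multipart-body.js -- sketch of the raw multipart/form-data wire format
// that Busboy parses incrementally. Boundary and names are illustrative.
function buildMultipartBody(boundary, fields, files) {
  const parts = [];
  for (const [name, value] of Object.entries(fields)) {
    parts.push(
      `--${boundary}\r\n` +
        `Content-Disposition: form-data; name="${name}"\r\n\r\n` +
        `${value}\r\n`,
    );
  }
  for (const { name, filename, mimeType, data } of files) {
    parts.push(
      `--${boundary}\r\n` +
        `Content-Disposition: form-data; name="${name}"; filename="${filename}"\r\n` +
        `Content-Type: ${mimeType}\r\n\r\n` +
        `${data}\r\n`,
    );
  }
  parts.push(`--${boundary}--\r\n`); // closing delimiter ends the body
  return parts.join('');
}

const body = buildMultipartBody(
  '----boundary42',
  { title: 'vacation' },
  [{ name: 'photo', filename: 'a.jpg', mimeType: 'image/jpeg', data: '<bytes>' }],
);
console.log(body);
```

Busboy never materializes a body like this in memory: it scans each incoming chunk for the `--boundary` markers and emits every part the moment its headers are recognized.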
Pattern 3 -- HTTP Range Requests for Video and Audio
When you scrub a video or skip to the middle of a podcast, the browser does not redownload the file. It sends a Range: bytes=START-END header and the server replies with 206 Partial Content, sending only the requested byte range. This is how Netflix, YouTube, Spotify, and your <video> tag work. Implementing it correctly in Node.js takes about 30 lines.
// video-server.js
// Range-aware static video server. Supports seeking in <video> tags.
const http = require('http');
const fs = require('fs');
const path = require('path');

const VIDEO = path.resolve('./sample.mp4');

http.createServer((req, res) => {
  if (req.url !== '/video') {
    res.writeHead(404).end();
    return;
  }

  // Stat the file to get its size for the Content-Range header.
  fs.stat(VIDEO, (err, stat) => {
    if (err) {
      res.writeHead(404).end();
      return;
    }

    const fileSize = stat.size;
    const range = req.headers.range;

    if (!range) {
      // No Range header -> stream the whole file with 200 OK.
      res.writeHead(200, {
        'Content-Length': fileSize,
        'Content-Type': 'video/mp4',
      });
      fs.createReadStream(VIDEO).pipe(res);
      return;
    }

    // Parse "bytes=START-END". END is optional.
    const match = /^bytes=(\d+)-(\d*)$/.exec(range);
    if (!match) {
      res.writeHead(416, { 'Content-Range': `bytes */${fileSize}` }).end();
      return;
    }

    const start = parseInt(match[1], 10);
    // Cap each chunk at ~1 MB so the browser can re-request as it plays.
    const end = match[2]
      ? Math.min(parseInt(match[2], 10), fileSize - 1)
      : Math.min(start + 1024 * 1024 - 1, fileSize - 1);

    if (start >= fileSize || end < start) {
      res.writeHead(416, { 'Content-Range': `bytes */${fileSize}` }).end();
      return;
    }

    const chunkSize = end - start + 1;

    // 206 Partial Content tells the browser this is a range response.
    res.writeHead(206, {
      'Content-Range': `bytes ${start}-${end}/${fileSize}`,
      'Accept-Ranges': 'bytes',
      'Content-Length': chunkSize,
      'Content-Type': 'video/mp4',
    });

    // Create a read stream for only the requested byte range.
    fs.createReadStream(VIDEO, { start, end }).pipe(res);
  });
}).listen(5000, () => console.log('video on http://localhost:5000/video'));
The browser will issue dozens of small range requests as the user scrubs. Each request opens a tiny stream over a small slice of the file. Memory stays flat. Seeking is instant because the server never reads bytes the browser did not ask for.
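The parse-and-clamp logic from the server above factors out into a small pure function, which makes the edge cases (open-ended ranges, out-of-bounds starts, malformed headers) easy to unit-test. A sketch mirroring the regex used earlier:

```javascript
// range.js -- sketch: parse and clamp a Range header against a file size.
// Mirrors the regex and clamping in video-server.js; returns null when the
// header is malformed or the range is unsatisfiable (answer with 416).
const MAX_CHUNK = 1024 * 1024; // cap open-ended ranges at ~1 MB

function parseRange(header, fileSize) {
  const match = /^bytes=(\d+)-(\d*)$/.exec(header || '');
  if (!match) return null;
  const start = parseInt(match[1], 10);
  const end = match[2]
    ? Math.min(parseInt(match[2], 10), fileSize - 1)
    : Math.min(start + MAX_CHUNK - 1, fileSize - 1);
  if (start >= fileSize || end < start) return null;
  return { start, end, chunkSize: end - start + 1 };
}

console.log(parseRange('bytes=0-', 10_000_000));
// { start: 0, end: 1048575, chunkSize: 1048576 } -- first ~1 MB
console.log(parseRange('bytes=500-999', 10_000));
// { start: 500, end: 999, chunkSize: 500 }
console.log(parseRange('bytes=99999-', 10));
// null -- start beyond end of file
```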
Pattern 4 -- ETL Pipeline: CSV -> Transform -> Database
The most common backend job is "ingest a giant CSV, clean it, write it to a database." Done with fs.readFileSync and JSON.parse it caps out at a few hundred megabytes. Done as a stream pipeline it scales to terabytes with constant memory and built-in backpressure. The pattern is always the same: a Readable source, one or more Transform stages, and a Writable sink, joined by pipeline() so errors propagate and resources clean up.
// etl.js
// Stream a 10 GB CSV, validate rows, and batch-insert into Postgres.
const fs = require('fs');
const { pipeline } = require('stream/promises');
const { Transform, Writable } = require('stream');
const { parse } = require('csv-parse'); // npm i csv-parse
const { Pool } = require('pg'); // npm i pg

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Stage 1: parse CSV bytes into JS objects (one per row).
const csvParser = parse({
  columns: true, // first row becomes object keys
  skip_empty_lines: true,
  trim: true,
});

// Stage 2: validate and normalize each row. Drop bad rows instead of crashing.
const validate = new Transform({
  objectMode: true,
  transform(row, _enc, cb) {
    const email = (row.email || '').toLowerCase();
    const age = Number(row.age);
    if (!email.includes('@') || !Number.isFinite(age)) {
      // Skip invalid row -- do not push it downstream.
      return cb();
    }
    cb(null, { email, age, name: row.name || null });
  },
});

// Stage 3: batch rows into groups of 1000 for efficient bulk insert.
const batch = (size) => {
  let buf = [];
  return new Transform({
    objectMode: true,
    transform(row, _enc, cb) {
      buf.push(row);
      if (buf.length >= size) {
        this.push(buf);
        buf = [];
      }
      cb();
    },
    flush(cb) {
      // Emit any leftover rows when the upstream ends.
      if (buf.length) this.push(buf);
      cb();
    },
  });
};

// Stage 4: write each batch to Postgres. Backpressure pauses upstream
// automatically while the DB is busy.
const dbSink = new Writable({
  objectMode: true,
  async write(rows, _enc, cb) {
    try {
      // Parameterized bulk insert -- never interpolate user data into SQL.
      const params = [];
      const placeholders = rows.map((r, i) => {
        params.push(r.email, r.age);
        return `($${i * 2 + 1}, $${i * 2 + 2})`;
      });
      await pool.query(
        `INSERT INTO users (email, age) VALUES ${placeholders.join(',')}`,
        params,
      );
      cb();
    } catch (err) {
      cb(err); // pipeline() will tear down every stage
    }
  },
});

(async () => {
  console.time('etl');
  await pipeline(
    fs.createReadStream('users.csv'),
    csvParser,
    validate,
    batch(1000),
    dbSink,
  );
  console.timeEnd('etl');
  await pool.end();
})().catch((err) => {
  console.error('etl failed:', err);
  process.exit(1);
});
The same shape applies to log processing -- swap the CSV parser for a line-splitter, swap the DB sink for an Elasticsearch bulk client, and you have a log shipper. Swap the Postgres sink for s3.upload({ Body: stream }) and you have an S3 ingester. The Transform stream is the universal middle stage of every Node.js data pipeline.
Why Express, Fastify, and Multer Are Stream-Based Internally
Every Node.js HTTP framework you have ever used is a thin layer over the same two stream objects: IncomingMessage (a Readable) and ServerResponse (a Writable). When Express does res.send(buffer) it ultimately calls res.write() followed by res.end() -- both Writable methods. When Fastify writes a JSON reply it pipes a Readable into the same Writable. There is no parallel "non-stream" path inside Node's HTTP server -- streams are the only path.
- Express's body-parser reads req chunk by chunk and only assembles a body once it has verified the Content-Length is under the configured limit. That limit exists precisely because the old, unbounded buffering approach got servers killed.
- Multer is a Busboy wrapper. Its "memory storage" buffers small files into RAM (for tests), and its "disk storage" pipes Busboy file streams directly to fs.createWriteStream. Cloud-storage adapters (multer-s3, multer-gcs) pipe the same stream into a cloud SDK uploader.
- Fastify's reply supports reply.send(stream) directly -- it detects a Readable and pipes it into the response, including content negotiation and error handling.
- res.sendFile() in both frameworks is fs.createReadStream(path).pipe(res) with extra MIME-type and Range handling on top.
The takeaway: when you use a framework idiomatically, you are already using streams. The mistakes happen the moment you reach for await req.text(), req.body on a giant payload, or fs.readFileSync -- those bypass the streaming machinery and put you back in Design A.
+---------------------------------------------------------------+
| MEMORY USAGE AT 100 CONCURRENT 1 GB UPLOADS |
+---------------------------------------------------------------+
| |
| Buffered (req.body): ~100 GB RAM -> OOM kill |
| Streamed (Busboy): ~6.4 MB RAM -> stable |
| |
| Throughput, buffered: bottlenecked by RAM, ~3 uploads |
| Throughput, streamed: bottlenecked by disk, ~100 uploads |
| |
| Latency to first byte saved on response: 0 vs full body time |
| |
+---------------------------------------------------------------+
Common Mistakes
1. Calling JSON.parse(await streamToString(req)) on a large request body.
This buffers the entire body before doing anything. A 1 GB JSON upload takes 1 GB of RAM and blocks the event loop during parse. Use a streaming JSON parser (stream-json, JSONStream) or enforce a small Content-Length limit and reject anything larger.
2. Forgetting to handle req close on long-lived SSE connections.
If you do not clear timers and remove listeners when the client disconnects, you leak intervals, sockets, and database cursors. Always wire req.on('close', cleanup) for any response that uses chunked encoding for more than a few seconds.
3. Reading the whole video file before serving it.
fs.readFile('movie.mp4', (err, buf) => res.end(buf)) works for a 5 MB clip and explodes for a 5 GB film. Always use fs.createReadStream and always honor the Range header -- otherwise the browser cannot seek and will redownload from byte zero on every scrub.
4. Not draining unused file streams in Busboy.
If you decide to reject an uploaded file (wrong type, too large), you must still call fileStream.resume() to drain it. Otherwise Busboy stalls waiting for that file to be consumed and the entire request hangs until the client times out.
5. Skipping pipeline() and chaining .pipe() manually.
Manual .pipe() does not propagate errors backwards. If your DB writer throws, the upstream CSV reader keeps reading and may leak file descriptors. stream/promises.pipeline() tears down every stage on any error and surfaces a single rejection -- always use it for production pipelines.
Interview Questions
1. "How would you stream a 50 GB CSV file from S3, transform each row, and write the results to a database without exhausting memory?"
Build a four-stage pipeline using stream/promises.pipeline(). Stage one is the AWS SDK's getObject().createReadStream() (or GetObjectCommand body), which yields the S3 object as a Readable without buffering. Stage two is csv-parse in object mode, converting bytes into row objects. Stage three is a Transform in object mode that validates, normalizes, and batches rows -- typically grouping 500 to 1000 rows per emitted chunk to amortize database round trips. Stage four is a Writable in object mode that bulk-inserts each batch into Postgres or your warehouse, using parameterized queries. Backpressure flows automatically: when the database writer is slow, Writable.write returns false, the batch transform pauses, the parser pauses, and the S3 stream pauses -- the HTTP socket is held open by AWS until the pipeline catches up. Memory stays under a few hundred megabytes regardless of file size, and pipeline() ensures any error in any stage cleans up every other stage.
2. "Explain Server-Sent Events versus WebSockets. When would you choose SSE for a streaming API?"
SSE is one-way: the server pushes events to the browser over a single long-lived HTTP/1.1 chunked response. WebSockets are bidirectional and use their own framing protocol after an HTTP upgrade. Choose SSE when communication is server-to-client only -- LLM token streaming, live dashboards, build logs, notification feeds, progress bars. SSE is dramatically simpler: it works through every HTTP proxy and CDN that supports chunked encoding, it auto-reconnects via the browser's EventSource API, it carries event IDs for resume after disconnect, and you can implement it in Node with res.write() and no library. Choose WebSockets when you need client-to-server messages on the same connection -- chat, multiplayer games, collaborative editing. For one-way streaming APIs SSE is almost always the right call.
3. "Walk through how you would implement HTTP Range request support for a video endpoint, and why it matters."
When a browser plays a <video> element, it issues a normal GET first, sees Accept-Ranges: bytes in the response, and from then on issues Range: bytes=START-END requests as the user scrubs or as the buffer fills. To support this in Node: stat the file to get its size, parse the Range header with a regex like /^bytes=(\d+)-(\d*)$/, clamp the end to fileSize - 1, and respond with 206 Partial Content. Set Content-Range: bytes START-END/TOTAL, Content-Length: chunkSize, Accept-Ranges: bytes, and the correct MIME type. Then create the file stream with fs.createReadStream(path, { start, end }) and pipe it to the response. This matters because without Range support the browser must redownload from byte zero every time the user seeks, which wastes bandwidth, makes seeking feel broken, and prevents adaptive bitrate features. Every CDN and every video player assumes Range requests work.
4. "Why does Multer use Busboy instead of buffering the request body itself?"
Because buffering does not scale. A naive multipart parser that calls await readBody(req) then splits on the boundary needs RAM equal to the upload size times the concurrency. At 100 concurrent 500 MB uploads that is 50 GB of RAM. Busboy parses the multipart stream incrementally: the moment it sees a part header it emits a file event with a Readable stream for that part, even though the rest of the file is still being uploaded. Multer pipes that stream directly to its storage engine -- disk, memory, or a cloud uploader -- so memory usage stays at one chunk per concurrent upload (typically 64 KB). This is also why Multer's storage engines are written as Writables: they slot directly into the streaming pipeline. The price of bypassing this design is exactly the OOM crashes that Express's body-parser limits exist to prevent.
5. "What is the practical memory difference between buffering and streaming an HTTP response, and when does it become catastrophic?"
Buffering allocates memory equal to the response size per concurrent request. Streaming allocates memory equal to one chunk (default highWaterMark, 16 KB for byte streams, 64 KB for fs streams) per concurrent request. The ratio is response-size divided by 64 KB. For a 1 KB JSON reply the difference is irrelevant. For a 4 KB API payload it does not matter. The pattern becomes catastrophic the moment payloads grow past a few megabytes or concurrency grows past a few dozen. Serving a 100 MB report to 50 users buffered is 5 GB of RAM and several seconds of GC pressure that blocks the event loop, harming every other request on the process. Streamed, the same workload is 3.2 MB of RAM and zero GC stalls, with the first byte arriving in milliseconds. The rule of thumb: any response larger than ~1 MB or generated from an upstream source (DB cursor, file, S3 object, another HTTP call) should be streamed, not buffered.
Quick Reference -- Production Streams Cheat Sheet
+---------------------------------------------------------------+
| STREAMS IN PRODUCTION CHEAT SHEET |
+---------------------------------------------------------------+
| |
| SERVER-SENT EVENTS: |
| Content-Type: text/event-stream |
| Cache-Control: no-cache, no-transform |
| Connection: keep-alive |
| Frame: id\nevent\ndata\n\n |
| Always cleanup on req.on('close') |
| |
| MULTIPART UPLOAD (Busboy): |
| const bb = Busboy({ headers: req.headers, limits: {...} }) |
| bb.on('file', (name, stream, info) => stream.pipe(dest)) |
| req.pipe(bb) |
| Drain rejected files with stream.resume() |
| |
| RANGE REQUESTS (206): |
| Parse: /^bytes=(\d+)-(\d*)$/ |
| Headers: Content-Range, Accept-Ranges, Content-Length |
| fs.createReadStream(path, { start, end }).pipe(res) |
| |
| ETL PIPELINE: |
| pipeline(source, parse, validate, batch, sink) |
| Use objectMode: true on transforms after the parser |
| Batch before DB writes (500-1000 rows) |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| KEY RULES |
+---------------------------------------------------------------+
| |
| 1. Never buffer request bodies larger than a few KB |
| 2. Always use stream/promises.pipeline() for production |
| 3. Honor Range headers on any file >1 MB |
| 4. Set explicit limits on Busboy (fileSize, files, fields) |
| 5. Clean up SSE timers on req.on('close') |
| 6. Batch DB writes inside Writable streams |
| 7. Drain unused file streams or Busboy will stall |
| |
+---------------------------------------------------------------+
| Concern | Buffered approach | Streamed approach |
|---|---|---|
| Memory per request | O(payload size) | O(highWaterMark) |
| Time to first byte | After full body | Immediate |
| Max payload | Limited by RAM | Unlimited |
| Backpressure | None | Automatic |
| Range support | Manual slicing | createReadStream({start,end}) |
| Error cleanup | Manual | pipeline() tears down all stages |
| Concurrency at 1 GB payload | ~few requests | Hundreds |
Prev: Lesson 4.3 -- Practical Stream Patterns Next: Lesson 5.1 -- The Callback Pattern
This is Lesson 4.4 of the Node.js Interview Prep Course -- 10 chapters, 42 lessons.