Streams
Node.js's Answer to Big Data
LinkedIn Hook
"Your Node.js server crashed reading a 2GB log file. The fix was three lines."
Most developers reach for fs.readFile() without thinking. It works perfectly for a 10KB config file. It silently kills your process when someone uploads a 2GB video, dumps a giant CSV, or asks you to parse last month's access logs.
The reason is simple: readFile() loads the entire file into memory before giving you anything. A 2GB file becomes a 2GB Buffer in RAM. Multiply that by 50 concurrent users and your server is dead.
Streams solve this. Instead of loading everything at once, streams process data in chunks — typically 64KB at a time. You can read a 100GB file on a server with 512MB of RAM, because at any given moment only one chunk lives in memory.
Streams are the reason Node.js can pipe an HTTP upload directly into S3 without buffering, transcode video on the fly, and serve thousands of clients with tiny memory footprints. They are also the single most underused feature in the entire runtime.
In Lesson 4.2, I break down the four stream types, the two reading modes, backpressure, and pipe() — the one method that makes all of this work safely.
Read the full lesson -> [link]
#NodeJS #Streams #Backend #Performance #InterviewPrep #JavaScript
What You'll Learn
- Why streams exist and what problem they solve (the 2GB file disaster)
- The four stream types: Readable, Writable, Duplex, Transform
- The two Readable modes: flowing vs paused (and how data events flip the switch)
- The core stream events: data, end, error, finish, close
- The pipe() method and why it is almost always the right answer
- Backpressure — what happens when a fast producer meets a slow consumer
- Object mode streams for non-binary data
- Modern async iteration with for await ... of
The Firehose Analogy — Drink One Sip at a Time
Imagine you are thirsty and someone hands you a firehose. You cannot drink from a firehose — the pressure will knock you over and most of the water will be wasted on the floor. What you need is a cup. You fill the cup, drink it, fill it again, drink it again. Same total water, but now your throat (and the floor) survives.
That is the difference between fs.readFile() and fs.createReadStream(). The first one points the firehose at your face. The second one hands you a cup.
Now imagine an assembly line in a car factory. Nobody builds an entire car in one place. A door arrives at station 1, gets painted, slides to station 2, gets a window installed, slides to station 3, gets attached to a frame. At any moment, only the part being worked on is in front of any single worker. The line has thousands of cars in flight, but each station holds just one piece.
Streams are an assembly line for data. A chunk arrives, gets processed, moves on. The next chunk arrives, gets processed, moves on. You can compress, encrypt, parse, and ship gigabytes through a pipeline without ever holding more than 64KB in any single place.
+---------------------------------------------------------------+
| THE 2GB FILE PROBLEM |
+---------------------------------------------------------------+
| |
| WITHOUT STREAMS (fs.readFile): |
| |
| Disk [##############] 2GB |
| | |
| v (load EVERYTHING) |
| RAM [##############] 2GB <-- BOOM, out of memory |
| | |
| v |
| Your code finally runs |
| |
| WITH STREAMS (fs.createReadStream): |
| |
| Disk [##############] 2GB |
| | |
| v (read 64KB chunk) |
| RAM [#] 64KB --> process --> discard |
| | |
| v (next 64KB chunk) |
| RAM [#] 64KB --> process --> discard |
| | |
| v ... (repeat ~32,000 times) ... |
| |
| Peak memory: 64KB. Server is happy. |
| |
+---------------------------------------------------------------+
The Four Stream Types
Every stream in Node.js is one of four flavors. Once you understand these four, you understand every stream-based API in the runtime — file system, HTTP, crypto, zlib, child processes, sockets, everything.
+---------------------------------------------------------------+
| THE FOUR STREAM TYPES |
+---------------------------------------------------------------+
| |
| 1. READABLE (data flows OUT of it) |
| +--------+ |
| | source | ---> chunk ---> chunk ---> chunk ---> |
| +--------+ |
| Examples: fs.createReadStream, http.IncomingMessage, |
| process.stdin, net.Socket (read side) |
| |
| 2. WRITABLE (data flows IN to it) |
| |
| ---> chunk ---> chunk ---> chunk ---> +-------+ |
| | sink | |
| +-------+ |
| Examples: fs.createWriteStream, http.ServerResponse, |
| process.stdout, net.Socket (write side) |
| |
| 3. DUPLEX (read AND write, independent channels) |
| +----------+ |
| | duplex | <==> two-way pipe |
| +----------+ |
| Examples: net.Socket, TLS sockets |
| |
| 4. TRANSFORM (Duplex where output is computed from input) |
| +-------------+ |
| --->| in -> [fn] -> out |---> |
| +-------------+ |
| Examples: zlib.createGzip, crypto.createCipheriv, |
| any "translate as it flows" stream |
| |
+---------------------------------------------------------------+
A useful mental model: Readable is a tap, Writable is a drain, Duplex is a phone line (two independent channels in one object), and Transform is a water filter (whatever goes in comes out modified).
Readable Modes — Flowing vs Paused
Readable streams have two modes, and the mode determines who is in charge of pulling data.
Paused Mode (the default)
The consumer must explicitly call read() to pull a chunk. Nothing happens until you ask. Think of it as a vending machine — the snacks are there, but you have to push the button.
Flowing Mode
The stream pushes chunks at you as fast as it can produce them. You attach a data listener and chunks arrive automatically. Think of it as a conveyor belt that starts the moment you stand in front of it.
The switch: Adding a data event listener (or calling .resume(), or piping to a writable) flips the stream from paused to flowing. Calling .pause() flips it back. This is the single most surprising thing about streams for newcomers — attaching a listener has a side effect.
+---------------------------------------------------------------+
| PAUSED vs FLOWING MODE |
+---------------------------------------------------------------+
| |
| PAUSED (default after creation): |
| Stream: "I have data. Ask me for it." |
| You: chunk = stream.read() |
| Stream: "Here. Ask again when you want more." |
| |
| FLOWING (after .on('data') / .resume() / .pipe()): |
| Stream: "Catch! ... Catch! ... Catch! ..." |
| You: .on('data', chunk => process(chunk)) |
| Stream: keeps throwing chunks until empty or .pause() |
| |
| Mode transitions: |
| paused --[.on('data') | .resume() | .pipe()]--> flowing |
| flowing --[.pause()]----------------------------> paused |
| |
+---------------------------------------------------------------+
The Core Events
Every Readable emits a small, predictable set of events:
- data — a new chunk is available. Fires repeatedly in flowing mode.
- end — no more data will be produced. Fires once, after the last chunk.
- error — something went wrong. Always attach a handler, or your process will crash.
- close — the underlying resource (file descriptor, socket) has been released.
Writable streams emit a slightly different set: drain (the buffer has emptied, you can write again), finish (all writes are flushed), error, and close.
Code Example 1 — Creating a Custom Readable Stream
Let's build a Readable stream from scratch that emits the numbers 1 to 5. This is the simplest possible stream, and it shows the contract: implement _read(), push chunks with this.push(chunk), push null when done.
// custom-readable.js
const { Readable } = require('node:stream');
// Subclass Readable and implement _read()
class CountStream extends Readable {
constructor(max) {
// objectMode: false means we push Buffers/strings
super();
this.current = 1;
this.max = max;
}
// _read is called by the stream machinery when it wants more data
_read() {
if (this.current > this.max) {
// Pushing null signals end-of-stream
this.push(null);
return;
}
// push() puts a chunk into the internal buffer for consumers
const chunk = `Number: ${this.current}\n`;
this.push(chunk);
this.current += 1;
}
}
const counter = new CountStream(5);
// Attaching 'data' flips the stream into flowing mode
counter.on('data', (chunk) => {
// chunk is a Buffer by default (use .toString() for text)
process.stdout.write(chunk);
});
counter.on('end', () => {
console.log('Stream finished. No more chunks coming.');
});
counter.on('error', (err) => {
console.error('Stream error:', err);
});
Output:
Number: 1
Number: 2
Number: 3
Number: 4
Number: 5
Stream finished. No more chunks coming.
The framework calls _read() whenever its internal buffer needs more data. You push chunks; you push null when finished. That is the entire Readable contract.
Code Example 2 — Copying a File with pipe()
This is the canonical example. Copy a 5GB file from one place to another, using essentially constant memory. Without pipe(), you would need to manually wire data, end, error, and drain events together. With pipe(), it is one line.
// file-copy-pipe.js
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');
async function copyFile(src, dest) {
// pipeline() is the modern, error-safe replacement for .pipe()
// It handles errors in EVERY stage and cleans up file descriptors
try {
await pipeline(
fs.createReadStream(src), // Readable source
fs.createWriteStream(dest) // Writable destination
);
console.log(`Copied ${src} -> ${dest}`);
} catch (err) {
// Any error in the source OR destination ends up here
console.error('Copy failed:', err);
}
}
// Even if huge.bin is 50GB, peak memory stays around 64KB per chunk
copyFile('huge.bin', 'huge-copy.bin');
Why pipeline instead of .pipe()? The classic src.pipe(dest) does not propagate errors — if the source errors, the destination is not closed, leaking file descriptors. pipeline() handles errors and cleanup automatically. In modern code, always prefer it.
Under the hood, pipe (and pipeline) wires up backpressure for you: when the destination's internal buffer fills up, the source is paused. When the destination drains, the source is resumed. You never write that logic by hand.
Code Example 3 — Async Iteration with for-await-of
Since Node 10, Readable streams are async iterables. This is the cleanest, most modern way to consume a stream — it looks like a normal for loop and handles errors via try/catch.
// async-iteration.js
const fs = require('node:fs');
const readline = require('node:readline');
async function countLinesInLog(path) {
// Create a readable stream of the file
const fileStream = fs.createReadStream(path, { encoding: 'utf8' });
// readline.createInterface gives us a stream of lines instead of bytes
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity, // Treat \r\n as a single line break
});
let lineCount = 0;
let errorLines = 0;
// for-await-of consumes the readable as an async iterable
// Each iteration awaits the next chunk (here, the next line)
for await (const line of rl) {
lineCount += 1;
if (line.includes('ERROR')) {
errorLines += 1;
}
}
// The loop exits cleanly when the stream emits 'end'
return { lineCount, errorLines };
}
(async () => {
try {
const stats = await countLinesInLog('access.log');
console.log(`Total lines: ${stats.lineCount}`);
console.log(`Error lines: ${stats.errorLines}`);
} catch (err) {
// Stream errors bubble out of the for-await loop
console.error('Failed to read log:', err);
}
})();
This works on a 100GB log file just as well as on a 1KB one. Memory stays flat. The for await ... of syntax also pauses the stream automatically between iterations — if your loop body is slow, the stream waits. That is implicit backpressure for free.
Backpressure — The Fast Producer Problem
Backpressure is the most important concept in streams, and the one most developers ignore until production breaks.
The setup: A producer generates chunks fast. A consumer processes them slowly. If you naively forward every chunk from producer to consumer, the chunks pile up in memory between them. The producer keeps cranking, the buffer grows, and eventually you are out of RAM — even though you "used streams."
+---------------------------------------------------------------+
| BACKPRESSURE — WHY IT EXISTS |
+---------------------------------------------------------------+
| |
| WITHOUT BACKPRESSURE (broken): |
| |
| [Fast Producer] ---> [######### buffer #########] ---> [Slow Consumer]
| 100 MB/s keeps growing forever 1 MB/s
| |
| Buffer grows by 99 MB every second --> OOM crash |
| |
| WITH BACKPRESSURE (correct): |
| |
| [Fast Producer] -[paused]-> [#] -[full!]-> [Slow Consumer] |
| waits buffer stays small 1 MB/s |
| |
| write() returns false --> producer pauses |
| 'drain' event fires --> producer resumes |
| |
+---------------------------------------------------------------+
The mechanism: Every Writable has an internal buffer with a highWaterMark (default 16KB for binary, 16 objects for object mode). When you call writable.write(chunk):
- If the buffer has room, it returns true. Keep writing.
- If the buffer is full, it returns false. Stop writing and wait for the drain event.
pipe() and pipeline() honor this automatically. If you write a custom consumer, you must check the return value of write().
Code Example 4 — Backpressure Demo
This example deliberately shows what backpressure looks like and how to respect it. We pump random bytes into a slow Writable.
// backpressure-demo.js
const { Writable } = require('node:stream');
const crypto = require('node:crypto');
// A deliberately slow Writable that takes 100ms per chunk
class SlowSink extends Writable {
// _write is called for each chunk
// The callback() must be invoked when this chunk is "done"
// Until you call callback(), the stream knows you are still busy
_write(chunk, encoding, callback) {
setTimeout(() => {
console.log(`Sink processed ${chunk.length} bytes`);
callback();
}, 100);
}
}
const sink = new SlowSink({ highWaterMark: 1024 }); // 1KB buffer
let written = 0;
function pump() {
let canContinue = true;
// Keep writing until the buffer says "stop"
while (canContinue && written < 10) {
const chunk = crypto.randomBytes(512);
written += 1;
// .write() returns false when the internal buffer is full
canContinue = sink.write(chunk);
console.log(`Wrote chunk #${written}, buffer accepted: ${canContinue}`);
}
if (written < 10) {
// Buffer is full -- wait for it to drain before writing more
console.log('Buffer full. Waiting for drain...');
sink.once('drain', () => {
console.log('Drained. Resuming pump.');
pump();
});
} else {
// All done -- end the stream
sink.end(() => console.log('All chunks flushed. Done.'));
}
}
pump();
Run it and you will see the producer pause when the buffer fills, then resume after the consumer catches up. This is exactly what pipe() does for you behind the scenes — but seeing it manually drives home what "honoring backpressure" really means.
Object Mode Streams
By default, streams move Buffer and string chunks. Object mode flips this: each chunk can be any JavaScript value — an object, an array, a number, anything except null (which still means end-of-stream).
const { Readable } = require('node:stream');
// Pass objectMode: true to send arbitrary JS values
const userStream = new Readable({
objectMode: true,
read() {
// push() now accepts plain objects instead of Buffers
this.push({ id: 1, name: 'Alice' });
this.push({ id: 2, name: 'Bob' });
this.push(null); // end-of-stream marker
},
});
userStream.on('data', (user) => {
// user is the actual object, not a Buffer
console.log(`User ${user.id}: ${user.name}`);
});
Object mode is how libraries like csv-parse, JSONStream, and most database drivers expose query results — one row per chunk, fully decoded, ready to use. The highWaterMark for object mode counts objects, not bytes (default: 16).
Common Mistakes
1. Using fs.readFile for large files.
The most common stream-related bug is not using streams at all. fs.readFile loads the entire file into memory. For anything that might exceed a few megabytes, use fs.createReadStream (or pipeline with it). The fix is usually three lines.
2. Forgetting the error event.
Streams emit errors as events, not exceptions. If you do not attach an error listener, an error becomes an uncaughtException and crashes the process. Always handle errors — or use pipeline(), which handles them for you.
3. Mixing data listeners with read() calls.
Once you attach a data listener, the stream is in flowing mode. Calling .read() on a flowing stream returns null and confuses everyone. Pick one mode and stick with it (or just use for await ... of, which sidesteps the issue entirely).
4. Ignoring backpressure when writing manually.
If you write a loop that calls writable.write(chunk) without checking the return value, you have built a memory bomb. Either honor the false return and wait for drain, or use pipe() / pipeline() and let Node handle it.
5. Using .pipe() instead of pipeline() in new code.
Classic .pipe() does not propagate errors. If the source stream errors, the destination is not cleaned up, leaking file descriptors. Always prefer stream.pipeline() (callback) or stream/promises.pipeline() (async/await) in modern code.
Interview Questions
1. "Why do streams exist in Node.js? What problem do they solve?"
Streams exist because loading entire datasets into memory does not scale. fs.readFile reads a 2GB file by allocating a 2GB Buffer — that is fine for small files but fatal for large ones, and impossible if you have many concurrent users. Streams process data in small chunks (default 64KB for files), so peak memory stays roughly constant regardless of total data size. They also enable composition: you can pipe a file read into a gzip transform into an HTTP response, and data flows through the pipeline without ever fully materializing in memory. Streams are how Node.js handles multi-gigabyte files, video transcoding, and high-throughput proxies on commodity hardware.
2. "What are the four types of streams and how do they differ?"
Readable streams produce data that you consume — examples are fs.createReadStream and http.IncomingMessage. Writable streams accept data that you produce — examples are fs.createWriteStream and http.ServerResponse. Duplex streams are both Readable and Writable but the two channels are independent — a TCP socket reads from the network and writes to the network, but the bytes you write are unrelated to the bytes you read. Transform streams are a special kind of Duplex where the output is computed from the input — zlib.createGzip reads raw bytes and writes compressed bytes, with a one-to-one relationship between input and output. The mental model: Readable = tap, Writable = drain, Duplex = two-way phone line, Transform = water filter.
3. "Explain backpressure and how Node.js handles it."
Backpressure happens when a producer generates data faster than a consumer can process it. Without a mechanism to slow the producer down, chunks accumulate in memory between them and eventually exhaust RAM — even though you are "using streams." Node handles this through the return value of writable.write(): when the writable's internal buffer reaches its highWaterMark, write() returns false, signaling "stop." The producer should wait for the drain event before writing more. The pipe() method and stream.pipeline() handle this automatically — they pause the source when the destination is full and resume it when it drains. If you write a custom consumer with a manual write() loop, you are responsible for honoring the return value yourself.
4. "What is the difference between flowing and paused mode in a Readable stream?"
A Readable stream starts in paused mode. In paused mode, nothing happens until you explicitly call .read() to pull a chunk — the consumer is in control. In flowing mode, the stream actively pushes chunks at you via data events as fast as it can produce them. The mode switches to flowing the moment you attach a data listener, call .resume(), or .pipe() the stream somewhere. It switches back to paused when you call .pause() or, in the case of pipe, when the destination signals backpressure. The surprise for newcomers is that simply attaching a listener has a side effect — adding .on('data', ...) does not just observe data, it causes data to start flowing. Modern code mostly avoids this distinction by using pipeline() or for await ... of, which manage the mode automatically.
5. "When would you use pipe() versus pipeline() versus for await ... of?"
pipeline() is the default modern choice for moving data from a source to a destination — it handles errors at every stage, cleans up file descriptors, and supports promises via node:stream/promises. The classic .pipe() method is older, does not propagate errors, and leaks resources on failure; you should only see it in legacy code. for await ... of is the right choice when you want to consume a stream as an async iterable — when you need to inspect each chunk in your own logic rather than just forwarding it to a writable. A typical rule of thumb: if you are connecting a source to a sink with no inspection, use pipeline(); if you need to look at each chunk and make decisions, use for await ... of; almost never write a manual .on('data') loop in new code.
Quick Reference — Streams Cheat Sheet
+---------------------------------------------------------------+
| THE FOUR STREAM TYPES |
+---------------------------------------------------------------+
| |
| TYPE DIRECTION CORE METHOD EXAMPLE |
| -------- ---------- ----------- -------------- |
| Readable out _read() fs.createReadStream |
| Writable in _write() fs.createWriteStream |
| Duplex both (indep) _read+_write net.Socket |
| Transform in -> out _transform() zlib.createGzip |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| READABLE EVENTS |
+---------------------------------------------------------------+
| |
| data -- new chunk available (fires in flowing mode) |
| end -- no more data (fires once) |
| error -- something failed (always handle this!) |
| close -- underlying resource released |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| WRITABLE EVENTS |
+---------------------------------------------------------------+
| |
| drain -- buffer is empty, safe to write again |
| finish -- end() was called and all data is flushed |
| error -- something failed |
| close -- underlying resource released |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| KEY RULES |
+---------------------------------------------------------------+
| |
| 1. Never use fs.readFile for files that might be large |
| 2. Always handle the 'error' event (or use pipeline) |
| 3. Prefer pipeline() over .pipe() in modern code |
| 4. Honor backpressure: check write() return value |
| 5. push(null) signals end-of-stream in custom Readables |
| 6. for-await-of is the cleanest way to consume a stream |
| 7. Object mode counts objects, binary mode counts bytes |
| |
+---------------------------------------------------------------+
| Feature | fs.readFile | fs.createReadStream |
|---|---|---|
| Memory usage | Entire file in RAM | One chunk (~64KB) |
| Time to first byte | After full read | Immediate |
| Max practical file size | A few hundred MB | Unlimited |
| Composability | None | Pipes into anything |
| Error handling | Single callback | error event / pipeline |
| Best for | Tiny config files | Everything else |
| API | When to use |
|---|---|
| pipeline() | Source -> sink, no inspection needed (default modern choice) |
| for await ... of | Need to inspect each chunk, want try/catch errors |
| .pipe() | Legacy code only — does not propagate errors |
| Manual data events | Almost never in new code |
Prev: Lesson 4.1 -- Buffers in Node.js Next: Lesson 4.3 -- Practical Stream Patterns
This is Lesson 4.2 of the Node.js Interview Prep Course -- 10 chapters, 42 lessons.