I was working on an ETL process with NodeJS that ran on Heroku, which limits you to 500MB of memory on a standard dyno. The ETL was quickly running out of memory, even though each document that it processed was less than 1MB.
This ETL was made up of a Readable stream from a MongoDB Cursor, then two Transform streams, one using event-stream and one implemented as a subclass of Stream.Transform. The last stream was a Writable stream that validated and wrote documents back to MongoDB.
Clearly the readable stream was reading documents out of Mongo way too quickly for the
transform streams to do their work and the writable stream to write the documents back to
MongoDB. But wasn't NodeJS supposed to manage this automatically, assuming the highWaterMark was set to something reasonable?
It turns out that, yes, NodeJS does manage this correctly, as long as every stream in your pipe is a 0.10.x stream or greater. NodeJS implemented backpressure, governed by the highWaterMark option, in Node 0.10.x (the "streams2" API).
The culprit in my case was event-stream, which is an old library that only makes 0.8.x-style streams.
Moral of the story: if you are having memory issues with streams that backpressure seems like it should be solving, make sure all of the streams in your pipe are 0.10.x streams or greater!