Streams and spawning in Node.js

I spent some time today streamlining the image processing plugin for wanderer and figuring out more about how streams work in Node.js.

What the heck is a stream?

I am honestly not too sure. I’ve been implementing them for a while now, and I can’t easily describe them. All I know is that:

  1. you can read from them or write into them, and some of them can only do one or the other
  2. everything about them is asynchronous
  3. sadly, they don’t quite play nice with Promises or async / await in JavaScript.
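Points 1 and 3 are easier to see in code. This is a minimal sketch, assuming Node 15 or later for the stream/promises module. Readable.from() builds a stream you can only read from, a custom Writable is one you can only write into, and there are two bridges to async / await: for await over a Readable, and the Promise-returning pipeline(). The helper names (numberStream, readAll, collector, copyAll) are made up for illustration.

```javascript
const { Readable, Writable } = require('stream')
const { pipeline } = require('stream/promises')

// A stream you can only read from: Readable.from() turns an iterable into a stream,
// emitting each array element as its own chunk
function numberStream() {
  return Readable.from(['1', '2', '3'])
}

// Readable streams are async iterables, so `for await` consumes them chunk by chunk
async function readAll(readable) {
  const chunks = []
  for await (const chunk of readable) {
    chunks.push(chunk)
  }
  return chunks
}

// A stream you can only write into; it records every chunk it receives in `sink`
function collector(sink) {
  return new Writable({
    write(chunk, encoding, callback) {
      sink.push(chunk.toString())
      callback()
    }
  })
}

// pipeline() from 'stream/promises' returns a Promise that resolves once the whole
// chain has finished, or rejects if any stream in it errors
async function copyAll() {
  const received = []
  await pipeline(numberStream(), collector(received))
  return received
}
```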

So instead, I’ll link to an article that tries to explain what they are!

https://www.sitepoint.com/basics-node-js-streams/ - this calls them ‘simple’, but I found streams to be anything but simple.

For wanderer, I mostly used them for file manipulation and for piping data through the child processes I spawn for exiftool and pngquant.

Like how?

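const fs = require('fs')
const stream = require('stream')
const { spawn } = require('child_process')

// inputFilePath, sharpTransformer (a sharp() instance, which is itself a duplex stream)
// and outputStream are all set up elsewhere in the plugin

// opens a new Readable stream from the input file
const bufferStream = fs.createReadStream(inputFilePath)

// creates a PassThrough stream - one that can be both written to and read from, and just forwards data unchanged to the next stream
const exifProcessedStream = new stream.PassThrough()

// here, we spawn a new process to run exiftool ('-all=' clears the metadata tags, '-' reads the image from stdin,
// '-o -' writes the result to stdout), and pipe the data from our Readable file stream into its stdin
const exifProcess = spawn('exiftool', ['-all=', '-', '-o', '-'])
bufferStream.pipe(exifProcess.stdin)

// we then pipe exiftool's stdout into our passthrough stream...
exifProcess.stdout.pipe(exifProcessedStream)

// pngquant also reads from stdin when '-' is given as the input file, and writes the result to stdout
const pngQuant = spawn('pngquant', ['-v', '-f', '-s10', '-'])

// ...which gets piped through sharp and then into pngquant's stdin...
exifProcessedStream
    .pipe(sharpTransformer)
    .pipe(pngQuant.stdin)

// ...and finally, the result from pngquant is sent to outputStream, which would usually be a Writable file stream
pngQuant.stdout.pipe(outputStream)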
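As written, the chain above is fire-and-forget: nothing tells you when outputStream is done, or whether exiftool or pngquant exited with an error. Here is a minimal sketch of one way to wrap a single spawned step in a Promise, assuming Node 15+ for stream/promises. runFilter is a hypothetical helper, not part of wanderer, and stderr is inherited so that pngquant's -v output still shows up in the terminal.

```javascript
const { spawn } = require('child_process')
const { pipeline } = require('stream/promises')

// Hypothetical helper: runs `command args...` as a stdin/stdout filter, feeding `input`
// into the child's stdin and piping its stdout into `output`.
// Resolves once both pipelines are done and the process exited with code 0.
async function runFilter(command, args, input, output) {
  // stderr is inherited rather than piped, so an unread stderr pipe can't fill up and stall the child
  const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] })

  // Without an exit-status check, a failed exiftool/pngquant run would go unnoticed
  const exited = new Promise((resolve, reject) => {
    child.on('error', reject) // e.g. the binary isn't installed
    child.on('close', (code) => {
      if (code === 0) resolve()
      else reject(new Error(`${command} exited with code ${code}`))
    })
  })

  await Promise.all([
    pipeline(input, child.stdin),
    pipeline(child.stdout, output),
    exited
  ])
}
```

With that in place, the exiftool step could be awaited as something like await runFilter('exiftool', ['-all=', '-', '-o', '-'], fs.createReadStream(inputFilePath), outputStream).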

The main thing I had trouble understanding was the concept of pipe. It’s similar to the | character in Unix shell commands, where the output of one command is sent as the input of the next. Calling .pipe(destination) on a Readable stream returns the destination stream, which is what makes chaining possible. But do note that you have to pipe from a Readable stream into a Writable stream - for longer chained pipes, you’ll need to make sure that the streams in the middle are duplex (both readable and writable).
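To make the "duplex in the middle" rule concrete, here is a small self-contained chain with a custom Transform (a kind of duplex stream) between a Readable and a Writable. The names upperCaser and shout are just for illustration.

```javascript
const { Readable, Transform, Writable } = require('stream')

// A Transform is a duplex stream whose output is computed from its input,
// so it can sit in the middle of a pipe chain (the way sharpTransformer does)
function upperCaser() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase())
    }
  })
}

// .pipe(dest) returns dest, so source.pipe(a).pipe(b) means "source -> a, then a -> b";
// that is why `a` has to be both writable and readable
function shout(text, onDone) {
  let result = ''
  const sink = new Writable({
    write(chunk, encoding, callback) {
      result += chunk.toString()
      callback()
    }
  })
  // 'finish' fires once the Writable has received and processed every chunk
  sink.on('finish', () => onDone(result))
  Readable.from([text]).pipe(upperCaser()).pipe(sink)
}
```

One caveat worth knowing: .pipe() does not forward errors down the chain, so each stream needs its own 'error' listener; stream.pipeline() takes care of that for you.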

new stream.PassThrough() creates a duplex stream that passes data through unchanged, which makes it handy as a junction between separate pipe chains. I use it in the current iteration of wanderer’s image processing stack because JPEGs only go through the metadata removal step, while PNGs also get an extra layer of compression.
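A stripped-down sketch of that branching, with stand-ins for the real steps: source plays the role of exifProcess.stdout, and makeCompressor is a hypothetical factory standing in for the sharp + pngquant leg. The function name routeImage is also made up for illustration.

```javascript
const { PassThrough } = require('stream')

// Hypothetical router: `source` stands in for exiftool's stdout, and `makeCompressor`
// returns a duplex stream standing in for the sharp + pngquant steps
function routeImage(source, output, isPng, makeCompressor) {
  // the PassThrough is the junction: code below doesn't need to know what produced the data
  const exifProcessedStream = new PassThrough()
  source.pipe(exifProcessedStream)

  if (isPng) {
    // PNGs get an extra compression step before reaching the output
    exifProcessedStream.pipe(makeCompressor()).pipe(output)
  } else {
    // JPEGs go straight to the output once the metadata is gone
    exifProcessedStream.pipe(output)
  }
  return output
}
```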

Some searches that may have been relevant to all this

https://stackoverflow.com/questions/13156243/event-associated-with-fs-createwritestream-in-node-js - how to tell when a file is finished streaming
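For the record, a Writable emits 'finish' once end() has been called and every chunk has been flushed, so that is the event to wait on before treating a file as complete. Below is a minimal sketch of an event-based version and a stream/promises version (Node 15+). Both function names are hypothetical.

```javascript
const fs = require('fs')
const { finished } = require('stream/promises')

// Event-based: resolve only after 'finish', i.e. once all data has been flushed to the file
function writeFileViaStream(filePath, text) {
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(filePath)
    out.on('error', reject)
    out.on('finish', () => resolve(filePath))
    out.end(text)
  })
}

// Promise-based: finished() settles when the stream is done, or rejects if it errors
async function writeFileViaStreamAsync(filePath, text) {
  const out = fs.createWriteStream(filePath)
  out.end(text)
  await finished(out)
  return filePath
}
```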

exiftool

I take a lot of photos using my iPhone, which adds a sometimes disturbing amount of metadata to each picture. So all this effort was in service of figuring out a way to automatically remove that metadata.

I was using piexifjs for a little bit, but that only had support for JPEGs, so I wanted something that worked with a broader range of image formats. I landed on exiftool to do that:

https://www.shellhacks.com/remove-exif-data-images-photos-linux/

exiftool lets you pipe an image in via stdin by replacing the FILE parameter with - (and lets you write the result to stdout by adding -o -, which sets the output file to -), so it fit neatly into the streams workflow I’d already created to compress images and work with sharp.
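Because exiftool reads from stdin and writes to stdout when given - and -o -, a spawned exiftool process behaves like any other duplex step in a pipe chain. One way to make that explicit is to wrap the child's stdin and stdout in a single Duplex with stream.Duplex.from(), which accepts a { writable, readable } pair in Node 16.8 and later. This is a minimal sketch. spawnAsDuplex is a hypothetical helper, not something wanderer or exiftool provides, and unlike a Promise-based wrapper it does not check the exit code.

```javascript
const { spawn } = require('child_process')
const { Duplex } = require('stream')

// Hypothetical helper: wraps a stdin/stdout filter (such as `exiftool -all= - -o -`)
// as one Duplex. Writes go to the child's stdin, and reads come from its stdout.
// Note: this sketch does not check the child's exit code.
function spawnAsDuplex(command, args) {
  const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] })
  const duplex = Duplex.from({ writable: child.stdin, readable: child.stdout })
  // surface spawn failures (e.g. exiftool not installed) as errors on the duplex
  child.on('error', (err) => duplex.destroy(err))
  return duplex
}

// Hypothetical usage in a chain like the one above:
// fs.createReadStream(inputFilePath)
//   .pipe(spawnAsDuplex('exiftool', ['-all=', '-', '-o', '-']))
//   .pipe(outputStream)
```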

(See https://exiftool.org/forum/index.php?topic=3118.0 for source)