Export DB Data as CSV with Node Streams and AWS Lambda
In web apps, it's pretty common to be asked for an export-as-CSV feature.
Let's be honest: it's not the most exciting feature to write, so it would be nice to get it done right away.
I will share a quick way to do it by:
- using JavaScript (possibly running on AWS Lambda)
- using Node.js streams
- using AWS S3 to store the file
Why Node.js Streams?
When we export db data, we can easily run out of memory if we keep all the results in memory before saving them to a file.
This is where Node.js streams come in handy.
Streams are a powerful way of piping data through as it comes in, rather than all at once. You can read more in substack's stream handbook.
See the official docs here.
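As a tiny illustration of the pattern (the file names here are hypothetical), this reads a file, compresses it, and writes it back out without ever buffering the whole file in memory:

```js
const fs = require('fs');
const zlib = require('zlib');

// Each chunk flows through as it is read; memory stays flat
// no matter how large input.csv is.
fs.createReadStream('input.csv')
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('input.csv.gz'));
```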
DB Query
For building and executing the db query, we will use Knex.js, a pretty simple but powerful SQL query builder. It comes with built-in stream support.
Let's write the query:
```js
// Select all users in the organization and return the results
// as a readable stream instead of loading them into an array.
const getQueryStream = (organizationId) =>
  db
    .select('*')
    .from('users')
    .where('organization_id', organizationId)
    .stream();
```
`db` here is the initialized knex instance. See more here.
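If you have not set it up before, `db` could be created like this (a minimal sketch; the PostgreSQL client and connection values are assumptions, adjust them for your database):

```js
const knex = require('knex');

// Hypothetical connection settings; replace with your own.
const db = knex({
  client: 'pg',
  connection: {
    host: 'localhost',
    user: 'postgres',
    password: 'secret',
    database: 'myapp',
  },
  pool: { min: 0, max: 5 }, // keep the pool small on Lambda
});

module.exports = db;
```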
S3 Upload
For uploading to AWS S3, we will use the AWS SDK for Node.js:
```js
const stream = require('stream');
const S3 = require('aws-sdk/clients/s3');

const uploadToS3 = (outputFile) => {
  const passT = new stream.PassThrough();
  const s3 = new S3({ apiVersion: '2006-03-01' });
  const params = {
    Body: passT,
    Bucket: YOUR_BUCKET, // put your bucket here
    Key: `exports/${outputFile}`, // saving file inside exports folder
    ACL: 'public-read',
    ContentType: 'text/csv',
    ContentEncoding: 'gzip',
  };
  return {
    promise: s3.upload(params).promise(),
    stream: passT,
  };
};
```
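The trick is the PassThrough stream: `s3.upload` accepts any readable stream as `Body`, so we hand S3 one end and pipe our data into the other. Consuming the returned pair looks roughly like this (the source stream here is hypothetical):

```js
const fs = require('fs');

const { stream: writable, promise } = uploadToS3('users.csv.gz');

// Hypothetical source; any readable stream works here.
fs.createReadStream('./users.csv.gz').pipe(writable);

// Resolves with the S3 response once the last byte is flushed.
promise.then(({ Location }) => console.log(`Uploaded to ${Location}`));
```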
From DB to CSV Row Transform
We need to take each query result row and transform it into CSV format. We can also let the caller choose which fields should be included in the result file:
```js
const stream = require('stream');

// Turns each row object into a single CSV line containing
// only the requested fields.
const toCSVTransform = (fields) => new stream.Transform({
  objectMode: true,
  transform: (row, encoding, callback) => {
    const rowAsArr = [];
    for (let i = 0; i < fields.length; i++) {
      rowAsArr.push(row[fields[i]]);
    }
    callback(null, `${rowAsArr.join(',')}\n`);
  },
});
```
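One caveat: the transform above produces broken rows if a value contains a comma, quote, or newline. A slightly safer variant could escape fields like this (a minimal sketch; for anything serious, a dedicated library such as csv-stringify is a better fit):

```js
// Quote fields containing commas, quotes, or newlines, doubling
// any embedded quotes, per the usual CSV convention.
const escapeField = (value) => {
  const str = value == null ? '' : String(value);
  return /[",\n]/.test(str) ? `"${str.replace(/"/g, '""')}"` : str;
};

const toCSVTransform = (fields) => new stream.Transform({
  objectMode: true,
  transform: (row, encoding, callback) => {
    callback(null, `${fields.map((f) => escapeField(row[f])).join(',')}\n`);
  },
});
```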
Putting things together
We are almost there.
Since our result is text, we will use Gzip to produce smaller files. Learn more.
Also, we will use the uuid/v1 library to generate a unique filename every time our function is executed. Learn more.
```js
const zlib = require('zlib');
const uuidv1 = require('uuid/v1');

const getExportFileLink = (organizationId) => {
  const gzip = zlib.createGzip();
  const dbStream = getQueryStream(organizationId);
  const {
    stream: s3WritableStream,
    promise: s3Upload,
  } = uploadToS3(`${uuidv1()}.csv`);

  // db rows -> CSV lines -> gzip -> S3
  dbStream
    .pipe(toCSVTransform(['first_name', 'last_name', 'email']))
    .pipe(gzip)
    .pipe(s3WritableStream);

  return s3Upload.then(({ Location: link }) => link);
};
```
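One thing this sketch glosses over is stream error handling: with plain .pipe() chains, an error in the db stream or in gzip will not reject the upload promise. If you are on Node 10+, the tail of getExportFileLink could use stream.pipeline instead (a sketch reusing the variables defined above):

```js
const { pipeline } = require('stream');

// stream.pipeline forwards errors from any stage and tears down
// the others, which plain .pipe() chains do not.
const exportDone = new Promise((resolve, reject) => {
  pipeline(
    dbStream,
    toCSVTransform(['first_name', 'last_name', 'email']),
    gzip,
    s3WritableStream,
    (err) => (err ? reject(err) : resolve())
  );
});

// Wait for both the local pipeline and the S3 upload to finish.
return Promise.all([exportDone, s3Upload])
  .then(([, { Location: link }]) => link);
```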
Final comments
- AWS Lambda is great for running code like this, but be careful with concurrency: you do not want to exhaust your database's maximum connection pool size. A minimal handler sketch follows this list.
- In my production code, I keep a queue of export task definitions, and a Lambda consumes and executes items from that queue.
- Also, my code finishes by sending an email with the link (obviously!).
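For reference, a Lambda entry point around getExportFileLink might look like this (a minimal sketch; the event shape and the API Gateway-style response are assumptions):

```js
// Hypothetical Lambda entry point; the event shape and the
// response format are assumptions, adjust them to your trigger.
exports.handler = async (event) => {
  const { organizationId } = JSON.parse(event.body || '{}');
  const link = await getExportFileLink(organizationId);
  return {
    statusCode: 200,
    body: JSON.stringify({ link }),
  };
};
```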