Use MySQL Load Files Data in AdonisJS

Use MySQL load data function to load data from files within within AdonisJS services.

You can perform batch data and file operations efficiently using database utilities. The logic will likely be quicker, lighter on resources, and overall, more suited for batch jobs.

Loading data from files is not as sought-after as in the good-ol’ days, but is quite common in enterprise applications. One or more of the following operations may be required to load data from files -

  • Read from file
  • Find the delta that needs to go into tables
  • Insert or update data
  • Delete data to sync table with file

It is perfectly all right doing all this in server-side Javascript.

  • Use file read/write services in NodeJS
  • Query and insert/update/delete data from DB based on data in files

There were two potential performance issues -

  1. File read-write is not exactly fast
  2. Processing each file and looking up in database can be slow as well

Compounding the problem was the code that had to be written to do all that and be resilient to errors (retries, running same files and so on).

Although I did see an “acceptable” performance, I had to maintain this logic over a couple of years when there could be potentially hundreds or thousands of files waiting to dump data in database.

So, it was that I chose to go with native MySQL utilities. LOAD DATA can read from files, do interim data manipulations, and write data to tables. The syntax is simple and straight-forward as all SQLs are (yes, I lead an exciting life).

In its base form, LOAD DATA can be represented as:

LOAD DATA
    INFILE 'file_name'
    INTO TABLE tbl_name
    FIELDS TERMINATED BY ','
    (file_name, account_number, account_name)

Ref: MySQL docs

LOAD DATA could read from files, perform operations directly on tables to insert, update or ignore data based on table keys. This saved many many lines of code.

We could potentially run LOAD DATA as an independent DB utility on the database, or on our server. But we had scheduled tasks in AdonisJS and had to do some pre and post processing to the above routine. It is really convenient to handle all batch jobs and scheduled tasks in one place. And of course, it is easier to test!

Using LOAD DATA is as easy as using any other SQL in AdonisJS.

"use strict";

const Database = use("Database");

class ImportFileDatabase {
  async doImport(modifiedfiles) {
    let accountType = "External";

    for (let i = 0; i < modifiedfiles.length; i++) {
      const inFile = modifiedfiles[i]["name"];
      try {
        await Database.raw(
          `LOAD DATA LOCAL INFILE '${inFile}' IGNORE INTO TABLE accounts
          FIELDS TERMINATED BY ','
          LINES STARTING BY '#'
          TERMINATED BY '$'
            (account_name,@account_number,@total_sale)
          SET
            account_number = concat('EXT',@account_number)
            type_cd = ${accountType}
            total_sale =  CAST(@total_sale) AS UNSIGNED)
           ;`
        );
      } catch (e) {
        console.error("Error processing file " + inFile);
      }
    }
  }
}
module.exports = new ImportFileDatabase();

There are some interesting things happening here -

  • We take modified files one by one and feed it to LOAD DATA
  • Files have three fields delimited by ,. Each line will have precisely these three fields, and the lines begin with ‘\$’ and end with ‘#’. Note that the line delimiters are not common, but preferable if you are dealing with different servers (& therefore potentially different encoding)
  • IGNORE will tell MySQL to ignore duplicates. Duplicates are determined by the key on the table. For e.g. we can have a unique key based on account_name - accounts with same names as those existing in the table are ignored, while rest of the records are processed
  • While account_name is directly picked from file, rest not so. account_number is prefixed with ‘EXT’ before update, type_cd comes from the program, and total_sale is a cast value.
  • Errors are local to file. Even if one file fails, the batch will go on with the next file

I typically write all this logic in a service, and schedule a task to run the service using adonis-scheduler.

comments powered by Disqus