Convert XML to JSON in Node

Here’s a quick demo of how you can convert XML to JSON in NodeJS.

Problem

You are given a slew of XML files and you have to convert them to JSON.

But why JSON?

  • It takes less space
  • It is faster and easier to work with

What are we dealing with here?

Convert this input -

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <string name="msg">Message</string>
    <string name="hello">Hello</string>
</root>

.. to something that we can relate to in JSON -

{
  "root": {
    "string": [
      { "attr": { "name": "msg" }, "text": "Message" },
      { "attr": { "name": "hello" }, "text": "Hello" }
    ]
  }
}

Planning

We scoff at people who write their own script for converting anything to anything - we work in JavaScript after all.

So.. we head over to npm (or, more precisely, we Google for a package that can solve our problem).

There are probably hundreds of packages that address every complexity of converting XML to JSON. We just want one that solves our need and is popular and performant - hopefully all three in one place.

Some of the popular parsers -

  1. xml2js
  2. xml-js
  3. xml2json
  4. fast-xml-parser

Solution

I don’t deal with XML much nowadays - thank God - but here was a problem that I could not shy away from. The problem statement was simple, and I was using NodeJS for scripting - something that I don’t do that often.

xml2json

The first try was xml2json because - well, that was the first search result. I could not get it to install - it depends on native C libraries and has to compile them. And node-gyp was not happy that it couldn’t locate a compiler, although I have VS runtimes dating back to 2001.

I don’t waste time solving OS problems - we just jump over to the next package.

fast-xml-parser

fast-xml-parser does not depend on native libraries - it’s a pure JS implementation. The parser has some impressive performance numbers to back it up. Since we care about massive performance in a throwaway script for five input files - I just had to use this library.

The documentation is kind of odd since it assumes that you know everything you need to know. Not the friendliest place, but the developers seem awesome.

I play around with the arguments and write some simple code to read XML and convert it to JSON -

const fs = require("fs");

const xmlData = fs.readFileSync(`./infile.xml`, {
  encoding: "utf-8",
});

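// NOTE: this is the fast-xml-parser v3 API; v4 and later expose an XMLParser class instead.
const parser = require("fast-xml-parser");

const jsonData = parser.parse(
  xmlData,
  {
    attrNodeName: "#attr", // group attributes under this key
    textNodeName: "#text", // key for an element's text content
    attributeNamePrefix: "", // no prefix on attribute names
    arrayMode: false, // a boolean, not the string "false"
    ignoreAttributes: false,
    parseAttributeValue: true,
  },
  true // validate the XML before parsing
);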

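// JSON.stringify prints the full tree; a plain console.log collapses deeper levels to [Object]
console.log("jsonData", JSON.stringify(jsonData, null, 2));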

I end up with the JSON below -

{
  "root": {
    "string": [
      { "#attr": { "name": "msg" }, "#text": "Message" },
      { "#attr": { "name": "hello" }, "#text": "Hello" }
    ]
  }
}

I could live with this - all I had to do was write some code and run a couple of loops. I also liked the arrayMode argument, which can force elements to be parsed as arrays even when they occur only once - should be useful at some point in the future.
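
The "couple of loops" could look something like the sketch below. It assumes the parsed shape shown above (root.string entries carrying "#attr" and "#text") and flattens it into a plain name-to-text map. flattenStrings is a made-up helper name for illustration, not something fast-xml-parser provides.

```javascript
// Hypothetical helper: turn { "#attr": { name }, "#text" } entries into { name: text }.
function flattenStrings(jsonData) {
  const entries = jsonData.root.string;
  if (entries === undefined) return {};
  // A tag that occurs only once may come back as a single object rather than an array.
  const list = Array.isArray(entries) ? entries : [entries];
  const result = {};
  for (const entry of list) {
    result[entry["#attr"].name] = entry["#text"];
  }
  return result;
}

// Sample input copied from the parser output shown earlier.
const parsed = {
  root: {
    string: [
      { "#attr": { name: "msg" }, "#text": "Message" },
      { "#attr": { name: "hello" }, "#text": "Hello" },
    ],
  },
};

console.log(flattenStrings(parsed)); // { msg: 'Message', hello: 'Hello' }
```

The Array.isArray check is there because a file with a single <string> element would otherwise break the loop.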

Others

If you have an evening free, there is nothing better to do than try out other libraries. Both xml2js and xml-js are massively popular and appear to be more convenient to use.

In practice though I did not find much difference. Take this case of xml-js, which I chose because of this statement -

Maintain Order of Elements: Most libraries will convert <a/><b/><a/> to {a:[{},{}],b:{}} which merges any node of same name into an array. This library can create the following to preserve the order of elements: {"elements":[{"type":"element","name":"a"},{"type":"element","name":"b"},{"type":"element","name":"a"}]}.

(Now, which sane soul does not want order in their life?)
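
To make the quoted difference concrete, here is a small sketch that walks the order-preserving structure from the statement above and lists element names in document order. The input object is copied from the quote; elementNames is a hypothetical helper, not part of xml-js.

```javascript
// Hypothetical helper: collect element names in document order (depth-first).
function elementNames(node) {
  const names = [];
  for (const child of node.elements || []) {
    if (child.type === "element") {
      names.push(child.name);
      names.push(...elementNames(child)); // descend into nested elements
    }
  }
  return names;
}

// Structure quoted in the xml-js statement for <a/><b/><a/>.
const ordered = {
  elements: [
    { type: "element", name: "a" },
    { type: "element", name: "b" },
    { type: "element", name: "a" },
  ],
};

console.log(elementNames(ordered)); // [ 'a', 'b', 'a' ]
```

With the merged form {a:[{},{}],b:{}} there is no way to recover that b sat between the two a elements.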

Using the library is as simple as anything else -

// same file-reading code as the earlier example

// this example uses xml-js, not fast-xml-parser
const converter = require("xml-js");

// .. code

const jsonData = converter.xml2js(xmlData, {
  compact: true,
});
//..

I get the below result -

{
  "root": {
    "string": [
      { "_attributes": { "name": "msg" }, "_text": "Message" },
      { "_attributes": { "name": "hello" }, "_text": "Hello" }
    ]
  }
}

Same result, give or take the key names, and there are quite a few options to tinker with too. The output can get noisier for a more complex XML file.

The documentation is friendly and easy to understand.

Recommendation

Pick the best package for the job :).
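
Whichever package you pick, the "slew of XML files" part of the problem stays the same. Below is a minimal sketch that converts every .xml file in a folder, taking the converter as an argument. convertDirectory and the folder layout are assumptions for illustration, not something from either library.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write a .json file next to every .xml file in `dir`.
// `convert` is whatever XML-string-to-object function you settled on.
function convertDirectory(dir, convert) {
  const written = [];
  for (const file of fs.readdirSync(dir)) {
    const ext = path.extname(file);
    if (ext.toLowerCase() !== ".xml") continue; // skip anything that is not XML
    const xmlData = fs.readFileSync(path.join(dir, file), { encoding: "utf-8" });
    const outFile = path.join(dir, path.basename(file, ext) + ".json");
    fs.writeFileSync(outFile, JSON.stringify(convert(xmlData), null, 2));
    written.push(outFile);
  }
  return written; // paths of the JSON files that were created
}
```

With the fast-xml-parser example from earlier, the call could be something like convertDirectory("./input", (xml) => parser.parse(xml, options)), where "./input" and options are placeholders.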

And a note to self: do more automation in Go rather than Node.
