On-demand JSON: A better way to parse documents?
Several database systems such as CouchDB, RethinkDB, MongoDB, SimpleDB and JSON Tiles [2] use JSON as their primary exchange format. A long-standing benchmark of several JSON parsers (henceforth Kostya) starts with a 10 000-element array of coordinates in JSON and requires the parser to sum all "x" values, all "y" values and all "z" values.

Peltenburg et al. [14] convert JSON to the Arrow format at a rate of tens of gigabytes per second on an FPGA. Stehle and Jacobsen similarly achieve high speeds using a graphics processing unit [15]. It is also possible to accelerate JSON parsing with multicore parallelism.

The json_iterator also holds a reference to a pre-allocated string buffer, owned by the parser instance, where we may decode JSON strings: strings in JSON may contain escaped characters, and we provide the user with an unescaped version stored in our own buffer. Given a JSON value, the raw_json_token() method provides a direct std::string_view instance mapped to the original JSON string.

Considering the geometric mean of the best results, we find that On Demand is 70% faster than the conventional simdjson, over 2.5 times faster than yyjson, and nearly eight times faster than RapidJSON. On Demand is nearly 50 times faster than JSON for Modern C++.

TABLE 3.

When considering the geometric mean, On Demand uses 60% of the instructions of the conventional simdjson, half of yyjson's instructions, and nearly eight times fewer instructions than RapidJSON. Compared to JSON for Modern C++, On Demand requires over 40 times fewer instructions.
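
To make the string buffer described earlier more concrete, the following is a minimal sketch of decoding escaped JSON strings into pre-allocated, parser-owned storage and handing back views into it. It is not the simdjson implementation: the class `string_buffer` and its `unescape` method are hypothetical names introduced for illustration, the sketch handles only the single-character escapes, and it rejects `\uXXXX` sequences rather than decoding them.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of simdjson): owns a fixed-size buffer and
// decodes escaped JSON string contents into it.
class string_buffer {
public:
  explicit string_buffer(std::size_t capacity) : storage_(capacity), used_(0) {}

  // Decodes the contents of a JSON string (without the surrounding quotes).
  // Returns a view into the buffer, or std::nullopt on a malformed escape,
  // an unsupported \uXXXX escape, or insufficient remaining capacity.
  std::optional<std::string_view> unescape(std::string_view raw) {
    // With single-character escapes only, the output is never longer than
    // the input, so this conservative check is sufficient.
    if (raw.size() > storage_.size() - used_) return std::nullopt;
    char *start = storage_.data() + used_;
    char *out = start;
    for (std::size_t i = 0; i < raw.size(); ++i) {
      const char c = raw[i];
      if (c != '\\') { *out++ = c; continue; }
      if (++i == raw.size()) return std::nullopt;  // dangling backslash
      switch (raw[i]) {
        case '"':  *out++ = '"';  break;
        case '\\': *out++ = '\\'; break;
        case '/':  *out++ = '/';  break;
        case 'b':  *out++ = '\b'; break;
        case 'f':  *out++ = '\f'; break;
        case 'n':  *out++ = '\n'; break;
        case 'r':  *out++ = '\r'; break;
        case 't':  *out++ = '\t'; break;
        default:   return std::nullopt;  // \uXXXX or invalid: not handled here
      }
    }
    assert(out <= storage_.data() + storage_.size());
    const std::size_t length = static_cast<std::size_t>(out - start);
    used_ += length;  // commit only on success; a failed call leaves no trace
    return std::string_view(start, length);
  }

private:
  std::vector<char> storage_;  // never resized, so returned views stay valid
  std::size_t used_;
};
```

Under these assumptions, a caller would construct one `string_buffer` sized to the input document and call `unescape` for each string it actually needs; the returned views remain valid for as long as the buffer object lives. A string that contains no escapes could instead be served directly from the input, which is the kind of access the raw token view provides.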