nlohmann::json is my go-to JSON serialization and deserialization library for C++, even if it’s not the fastest out there right now.
One of the most convenient interfaces nlohmann::json provides is ADL (argument-dependent lookup) based serialization: you define a set of to_json(nlohmann::json &, const T &) free functions in the same namespace as the data model, and serialization of the entire model happens automatically.
In this post I will show how to implement co-existing, alternative JSON serialization logic for the same data model using nlohmann::json.
Problem
In one of my projects, which already had serialization defined for its underlying model using to_json(...) functions, it became necessary to serialize the same model into a different JSON schema, with the decision on which schema to use made at runtime.
While the ADL-based interface of nlohmann::json makes it easy to separate the model from the serialization logic, it also, by definition, prohibits the co-existence of multiple function overloads with the same signature.
Before diving further into the problem, however, let's first define a very simple data model to use for the discussion. The entire model will reside in a file model.h.
model.h:
#pragma once

#include <string>
#include <vector>

namespace ns::model {

struct A {
    int int_value;
    float float_value;
    std::string string_value;
};

struct B {
    A a_value;
};

struct C {
    std::vector<B> b_values;
};

} // namespace ns::model
Let's also assume that we already have a set of default serialization functions for this model in a file s11n.h:
#pragma once

#include "model.h"

#include <nlohmann/json.hpp>

namespace ns::model {

void to_json(nlohmann::json &j, const A &a)
{
    j["int_value"] = a.int_value;
    j["float_value"] = a.float_value;
    j["string_value"] = a.string_value;
}

void to_json(nlohmann::json &j, const B &b) { j["a_value"] = b.a_value; }
void to_json(nlohmann::json &j, const C &c) { j["b_values"] = c.b_values; }

} // namespace ns::model
Now, however, someone asks us to provide alternative serialization logic which, based on a runtime flag, will render the JSON object keys in camel case instead of snake case.
Let's create a test program for the serialization called main.cc. First we set the runtime flag somehow, then we fill a test instance of the ns::model::C class with some values, and then, depending on the snake_case flag, we serialize the model to JSON with the appropriate serialization logic and print the output.
main.cc:
#include <iostream>

#include "model.h"
#include "s11n.h"

auto main(int argc, char **argv) -> int
{
    if (argc != 2) {
        std::cout << "Usage: ./main <case>\n";
        return 1;
    }
    const bool snake_case{argv[1] == std::string{"snake"}};

    ns::model::C c;
    for (int i = 10; i < 12; i++) {
        ns::model::A a{i, static_cast<float>(i), std::to_string(i)};
        c.b_values.emplace_back(ns::model::B{std::move(a)});
    }

    nlohmann::json j;
    if (snake_case) {
        j = c;
    } else {
        // TODO: ??????
    }
    std::cout << j.dump(2) << '\n';
    return 0;
}
If snake_case is true, the solution is obvious: we can just assign the c object to an instance of nlohmann::json and the serialization of the entire model will happen automatically.
But what should we do when snake_case is false? Clearly, we need to define an alternative set of to_json(...) functions. Since we cannot redefine the ones in the ns::model namespace, we have to define them in a separate namespace. Let's create another file called s11n_camel.h.
s11n_camel.h:
#pragma once
#include "model.h"
#include <nlohmann/json.hpp>
namespace ns::s11n_camel {
// TODO: ?????
} // namespace ns::s11n_camel
At first, one might think that we can use alias declarations to import the model types into this new namespace and then define the new set of to_json(...) functions like this:
s11n_camel.h:
#pragma once

#include "model.h"

#include <nlohmann/json.hpp>

namespace ns::s11n_camel {

using A = ns::model::A;
using B = ns::model::B;
using C = ns::model::C;

void to_json(nlohmann::json &j, const A &a)
{
    // ...
}

void to_json(nlohmann::json &j, const B &b)
{
    // ...
}

void to_json(nlohmann::json &j, const C &c)
{
    // ...
}

} // namespace ns::s11n_camel
Unfortunately, this won't work. The C++ standard states that ADL is based on the actual underlying type, and type aliases (whether introduced with typedef or using) are not taken into account during argument-dependent lookup:
6.5.3 Argument-dependent name lookup [basic.lookup.argdep] 2 - For each argument type T in the function call, there is a set of zero or more associated namespaces and a set of zero or more associated entities (other than namespaces) to be considered. The sets of namespaces and entities are determined entirely by the types of the function arguments (and the namespace of any template template argument). Typedef names and using-declarations used to specify the types do not contribute to this set.
Solution
After a few failed experiments, I arrived at a solution that was satisfying enough. In a nutshell: since we cannot use the original types in the new set of to_json(...) functions, let's create a wrapper class template that forces ADL to find the right serialization functions.
The final solution is presented below:
s11n_camel.h:
#pragma once

#include "model.h"

#include <nlohmann/json.hpp>

namespace ns::s11n_camel {

template <typename T> class wrap {
public:
    constexpr explicit wrap(const T &v) noexcept
        : value_{v}
    {
    }

    constexpr const T &get() const noexcept { return value_; }

private:
    const T &value_;
};

using ns::model::A;
using ns::model::B;
using ns::model::C;

void to_json(nlohmann::json &j, const wrap<A> &a)
{
    j["intValue"] = a.get().int_value;
    j["floatValue"] = a.get().float_value;
    j["stringValue"] = a.get().string_value;
}

void to_json(nlohmann::json &j, const wrap<B> &b)
{
    j["aValue"] = wrap<A>(b.get().a_value);
}

void to_json(nlohmann::json &j, const wrap<C> &c)
{
    std::vector<wrap<B>> b_values_wrap{
        c.get().b_values.begin(), c.get().b_values.end()};
    j["bValues"] = std::move(b_values_wrap);
}

} // namespace ns::s11n_camel
Let's break it down. The wrapper itself is called wrap and is simply a container for a const reference to a value of some type (it can only be constructed from a const reference to T). In order not to be too fancy, we won't even overload any operators to access the stored value; a simple get() method will suffice.
namespace ns::s11n_camel {

template <typename T> class wrap {
public:
    constexpr explicit wrap(const T &v) noexcept
        : value_{v}
    {
    }

    constexpr const T &get() const noexcept { return value_; }

private:
    const T &value_;
};
Next, let’s import the model types into this namespace, since they won’t impact the ADL anyway:
using ns::model::A;
using ns::model::B;
using ns::model::C;
Serialization of the wrap<A> type is straightforward:
void to_json(nlohmann::json &j, const wrap<A> &a)
{
    j["intValue"] = a.get().int_value;
    j["floatValue"] = a.get().float_value;
    j["stringValue"] = a.get().string_value;
}
Instead of const A &a, the to_json function takes const wrap<A> &a, and the serialization logic creates the JSON object using camel-case keys.
Serializing a B object, however, requires that we explicitly tell the nlohmann::json library to use the to_json from this namespace to serialize A, not the one defined in ns::model. We can do it like this:
void to_json(nlohmann::json &j, const wrap<B> &b)
{
    j["aValue"] = wrap<A>(b.get().a_value);
}
Finally, to serialize the C class, we need to transform the entire C::b_values vector into a std::vector<wrap<B>>. Fortunately, it's not a lot of work:
void to_json(nlohmann::json &j, const wrap<C> &c)
{
    std::vector<wrap<B>> b_values_wrap{
        c.get().b_values.begin(), c.get().b_values.end()};
    j["bValues"] = std::move(b_values_wrap);
}
And that's it. Finally, we can complete our main.cc:
main.cc:
#include <iostream>

#include "model.h"
#include "s11n.h"
#include "s11n_camel.h"

auto main(int argc, char **argv) -> int
{
    if (argc != 2) {
        std::cout << "Usage: ./main <case>\n";
        return 1;
    }
    const bool snake_case{argv[1] == std::string{"snake"}};

    ns::model::C c;
    for (int i = 10; i < 12; i++) {
        ns::model::A a{i, static_cast<float>(i), std::to_string(i)};
        c.b_values.emplace_back(ns::model::B{std::move(a)});
    }

    nlohmann::json j;
    if (snake_case) {
        j = c;
    } else {
        j = ns::s11n_camel::wrap(c);
    }
    std::cout << j.dump(2) << '\n';
    return 0;
}
All we had to do was wrap the c object and assign it to an instance of nlohmann::json (class template argument deduction takes care of the template parameter):
j = ns::s11n_camel::wrap(c);
Let’s compare the output:
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -S . -B build
$ cmake --build build -t main
$ ./build/main snake
{
  "b_values": [
    {
      "a_value": {
        "float_value": 10.0,
        "int_value": 10,
        "string_value": "10"
      }
    },
    {
      "a_value": {
        "float_value": 11.0,
        "int_value": 11,
        "string_value": "11"
      }
    }
  ]
}
$ ./build/main camel
{
  "bValues": [
    {
      "aValue": {
        "floatValue": 10.0,
        "intValue": 10,
        "stringValue": "10"
      }
    },
    {
      "aValue": {
        "floatValue": 11.0,
        "intValue": 11,
        "stringValue": "11"
      }
    }
  ]
}
Nice.
Performance
One thing that may be concerning is that, when implementing the new serialization functions, we needed to convert the types from ns::model to the ns::s11n_camel::wrap<> template; in particular, we had to create an entire std::vector of wrappers to serialize the C class. To check the performance impact, we'll use a microbenchmark library called nanobench.
benchmark.cc:
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <cassert>

#include "model.h"
#include "s11n.h"
#include "s11n_camel.h"

auto build_model()
{
    ns::model::C m;
    for (int i = 0; i < 1000; i++) {
        ns::model::A a{i, static_cast<float>(i), std::to_string(i)};
        m.b_values.emplace_back(ns::model::B{std::move(a)});
    }
    return m;
}

int main()
{
    nlohmann::json j1, j2;
    const auto c = build_model();
    ankerl::nanobench::Bench().run("Default s11n", [&] {
        j1 = c;
        ankerl::nanobench::doNotOptimizeAway(j1);
    });
    ankerl::nanobench::Bench().run("Camel s11n", [&] {
        j2 = ns::s11n_camel::wrap(c);
        ankerl::nanobench::doNotOptimizeAway(j2);
    });
    assert(j1["b_values"][0]["a_value"]["string_value"] ==
           j2["bValues"][0]["aValue"]["stringValue"]);
}
Now let’s build it and check the result:
$ cmake --build build -t benchmark
$ ./build/benchmark
Warning, results might be unstable:
* CPU frequency scaling enabled: CPU 0 between 400.0 and 5,881.0 MHz
* CPU governor is 'powersave' but should be 'performance'
* Turbo is enabled, CPU frequency will fluctuate
Recommendations
* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 286,881.00 | 3,485.77 | 0.2% | 0.00 | `Default s11n`
| 289,917.00 | 3,449.26 | 0.2% | 0.00 | `Camel s11n`
Let’s make sure compiler optimizations were on:
$ cat build/compile_commands.json | jq '.[] | select(.file | endswith("benchmark.cc"))' | jq .command
"/usr/bin/c++ -I/home/bartek/devel/multiple-adl-serializers-nlohmann-json/build/_deps/json-src/include -I/home/bartek/devel/multiple-adl-serializers-nlohmann-json/build/_deps/nanobench-src/src/include -O3 -DNDEBUG -std=gnu++20 -o CMakeFiles/benchmark.dir/benchmark.cc.o -c /home/bartek/devel/multiple-adl-serializers-nlohmann-json/benchmark.cc"
$ /usr/bin/c++ --version
c++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Yep. The alternative serialization logic is only marginally slower than the default one, so clearly the compiler can optimize away the type wrapper effectively. And if you're concerned about raw JSON serialization performance, you should probably look at alternatives to nlohmann::json such as Glaze or RapidJSON.
Conclusions
Obviously, the presented data model is trivial, as is the alternative serialization logic. However, the solution is generic and, most importantly, it allows type-safe control over which serialization function is called in which context for which type, which can be very important with a large number of types. Using this approach, we can also selectively reuse the original serialization functions from the data model namespace where appropriate and only provide custom serialization for selected types.
The complete source code for this post can be found on my GitHub profile - if you know of a better way to handle this issue let me know here.