Introducing GVariant schemas

GLib supports a binary data format called GVariant, which is commonly used to store various forms of application data. For example, it is used to store the dconf database and as the on-disk data in OSTree repositories.

The GVariant serialization format is very interesting. It has a recursive type-system (based on the DBus types) and is very compact. At the same time it includes padding to correctly align types for direct CPU reads and has constant time element lookup for arrays and tuples. This make GVariant a very good format for efficient in-memory read-only access.

Unfortunately the APIs that GLib has for accessing variants are not always great. They are based on using type strings and accessing children via integer indexes. While this is very dynamic and flexible (especially when creating variants) it isn’t a great fit for the case where you have serialized data in a format that is known ahead of time.

Some negative aspects are:

Each g_variant_get_child() call allocates a new object.
There is a lot of unavoidable (atomic) refcounting.
It always uses generic codepaths even if the format is known.

If you look at some other binary formats, like Google protobuf, or Cap’n Proto they work by describing the types your program use in a schema, which is compiled into code that you use to work with the data.

For many use-cases this kind of setup makes a lot of sense, so why not do the same with the GVariant format?

With the new GVariant Schema Compiler you can!

It uses a interface definition language where you define the types, including extra information like field names and other attributes, from which it generates C code.

For example, given the following schema:

type Gadget {
  name: string;
  size: {
    width: int32;
    height: int32;
  };
  array: []int32;
  dict: [string]int32;
};

It generates (among other things) these accessors:

const char *    gadget_ref_get_name   (GadgetRef v);
GadgetSizeRef   gadget_ref_get_size   (GadgetRef v);
Arrayofint32Ref gadget_ref_get_array  (GadgetRef v);
const gint32 *  gadget_ref_peek_array (GadgetRef v,
                                       gsize    *len);
GadgetDictRef   gadget_ref_get_dict   (GadgetRef v);

gint32 gadget_size_ref_get_width  (GadgetSizeRef v);
gint32 gadget_size_ref_get_height (GadgetSizeRef v);

gsize  arrayofint32_ref_get_length (Arrayofint32Ref v);
gint32 arrayofint32_ref_get_at     (Arrayofint32Ref v,
                                    gsize           index);

gboolean gadget_dict_ref_lookup (GadgetDictRef v,
                                 const char   *key,
                                 gint32       *out);

Not only are these accessors easier to use and understand due to using C types and field names instead of type strings and integer indexes, they are also a lot faster.

I wrote a simple performance test that just decodes a structure over an over. Its clearly a very artificial test, but the generated code is over 600 times faster than the code using g_variant_get(), which I think still says something.

Additionally, the compiler has a lot of other useful features:

You can add a custom prefix to all generated symbols.
All fixed size types generate C struct types that match the binary format, which can be used directly instead of the accessor functions.
Dictionary keys can be declared sorted: [sorted string] { ... } which causes the generated lookup function to use binary search.
Fields can declare endianness: foo: bigendian int32 which will be automatically decoded when using the generated getters.
Typenames can be declared ahead of time and used like foo: []Foo, or declared inline: foo: [] 'Foo { ... }. If you don’t name the type it will be named based on the fieldname.
All types get generated format functions that are (mostly) compatible with g_variant_print().

推荐订阅源

Alexander Larsson