GSoC Report - Part 2

Edited: 30/08/17. Welcome to Part 2 of my GSoC final report! In this post I outline the approach Philip (mentor) Chimento and I decided on for introducing better and safer memory management in to GJS. In my last post I outlined why the piecemeal conversion to Rust was not the right approach to trying to make GJS more memory safe, it's not required reading for this part however.

Rust has many strengths, but the ones I will focus on are:

  • concrete data ownership model revolving around the very well named "borrows", which are much like pointers, but with ownership enforced by Rust - a borrow cannot be held after the data is freed.
  • automatic drops (free) of data once it goes out of scope.
  • moves:
    • basic types are copy by default - by the time you've copied the address to data, you may as well have copied the basic type. A pointer (address) in Rust is 32 bits, and most basic types are 32 bits or less (except for i62/u62/f64 etc)
    • non-basic types are move by default unless they implement copy traits. This means that rather than copy the data, Rust just copies the address to the type, leaving the previous var the address was assigned to as null.
  • Rust allocations are immutable by default; to pass mutable variables to functions, or modify them, vars need to be declared as mutable.

Does C++ offer anything like this?

unique_ptr in C++

Prior to this project, the last time I really seriously used C++ was nearly a decade ago, and that was mostly C++98. And more recently with a hardware paper I did last year that required C++, but was limited to a very basic subset. So I wasn't aware that C++ had smart pointers beyond auto_ptr (since depreciated) and had not had a need to delve further.

unique_ptr is one of three smart pointers, which themselves are class objects. This one owns another object through an internal pointer, and disposes of this object when the unique_ptr is destroyed. The other two are shared_ptr and weak_ptr which I'll ignore. It functions very much like the strengths of Rust I outlined in the introduction:

  • it owns the data it points to
  • automatic free of pointed to data when destroyed
  • non-copy by default
    • ownership can be transferred to another unique_ptr
      • if the other was non-empty, it frees its data
    • can only be transferred by value, meaning it is moved

Great! So how do we use it? Well, GJS is already using it, in the form of GjsAutoChar and GJSAutoJSChar - though the use of these is currently very limited. The first is simpler, so I'll focus on that one;

// a wrapper class for unique_ptr and a custom deleter
class GjsAutoChar : public std::unique_ptr<char, decltype(&g_free)> {
public:
    GjsAutoChar(char *str = nullptr) : unique_ptr(str, g_free) {}

    operator const char *() {
        return get();
    }

    // takes full ownership of the string if non-const
    void operator= (char *str) {
        reset(str);
    }

    // copies the string if it is const
    void operator= (const char *str) {
        reset(g_strdup(str));
    }
};

It's a nice clean wrapper class around unique_ptr, and if no initialization args are given it is initialized with a nullptr. It also does something else; provides a custom deleter, g_free. Typically everything with Gnome and as such, in GJS uses GLib functions and types - so when they need to be freed, they should be freed using the GLib functions for doing so, such as g_free. This will ensure they are freed correctly and safely, with the same memory allocator that they were created with.

GjsAutoChar is a rather nice way to initialize smart pointers that would otherwise look like the first line here in the comparison example;

    // std initialization
    std::unique_ptr<Foo, decltype(&g_free)> foo = std::unique_ptr("stringstuff", g_free());
    // versus
    GjsAutoChar bar; // init as nullptr
    bar = GjsAutoChar("fizz");
    // or
    GjsAutoChar life = g_strdup("42"); //preferred
    // or
    GjsAutoChar life(g_strdup("42"));

GJS uses a fair amount of Gnome library functions which return pointers to allocated objects, or takes pointers as an out parameter. Receiving a pointer is great, we can just stick this in the unique_ptr along with the correct function to free it, and we've got a nifty bit of safety - when the smart pointer object is destroyed, it takes down the data it contains with it via the deleter it was handed.

The second essential to change for additional safety is to convert functions in GJS to take references instead of pointers. In some cases where those references don't need to be mutable, take const references instead.

With all that in mind, the goal shifted to trying to make GJS much more memory safe via use of smart pointers and where possible, using references to show ownership.

The Process

I quickly found that the best (and only) way to proceed was by first shifting all internal variables of a function over to a unique_ptr wrapper. The reason being that this then makes it easier to change the function signatures if everything being passed to them already matches.

But wait! Can you convert the inner variables to smart pointers and pass them to functions that still take pointers? Yes, the smart pointer object for all intents and purposes works like a regular pointer, but with ownership and other restrictions. The other reason it's good to do the inners first is that the code will still compile and run as you go - but if you do the function signatures first then you've got a very deep rabbit hole to follow for everything using that function, and converting any variables being passed.

An example;

static char *
strip_uri_scheme(const char *potential_uri)
{
    char *uri_header = g_uri_parse_scheme(potential_uri);

    if (uri_header) {
        gsize offset = strlen(uri_header);
        g_free(uri_header);
        return g_strdup(potential_uri + offset + 4);
    }

    return NULL;
}

converted to;

GjsAutoChar
strip_uri_scheme(const GjsAutoChar &potential_uri)
{
    GjsAutoChar uri_header = g_uri_parse_scheme(potential_uri.get());

    if (uri_header) {
        gsize offset = strlen(uri_header);
        // uri_header is freed on return
        return GjsAutoChar(g_strdup(potential_uri.get() + offset + 4));
    }

    return nullptr; // uri_header is also freed here on return
}

Nice and safe. The caller of this function owns the GjsAutoChar passed to it, and will free it when it goes out of scope. The GjsAutoChar created in the body of this function also frees itself once it goes out of scope and the destructor is called. Notice also that potential_uri.get() is required to get the inner pointer. Lastly, this function returns a GjsAutoChar, the caller will own the returned pointer, and will free it when it goes out of scope.

There are also some caveats with using smart pointers:

  • to hand over ownership of the pointer to anything you will need to use some of the provided class functions such as;

    // full_path is a GjsAutoChar // hand over ownership to directories g_ptr_array_add(directories, full_path.release());

  • assigning one smart pointer to another - if the receiving pointer was holding something beforehand, then this will be deleted before accepting the new value;

    GjsAutoChar foo(strdup(name1)); GjsAutoChar bar(strdup(name2));

    foo = bar; // foo deletes the value returned from strdup // bar now contains a nullptr

  • you can use the .reset() class function to delete the held value early, and this can also take a new raw pointer as a parameter (old value is deleted as well).

    GjsAutoChar foo(strdup(name1)); foo.reset(); // now contains a nullptr foo.reset(strdup(name3)); // takes ownership of the returned value foo.reset(strdup(name4)); // frees strdup(name3) and takes new ownership

  • there is an implicit bool conversion so such comparisons as == nullptr are not needed - just use if (!this_pointer) { } or if (this_pointer) { }

Another example, this time using the GjsAutoError which uses g_error_free as the deleter;

static void
copy_source_file_to_coverage_output(GjsAutoUnref<GFile> &source_file,
                                    GjsAutoUnref<GFile> &destination_file)
{
     GError *p_error = nullptr;
     GjsAutoError error;

    /* We need to recursively make the directory we
     * want to copy to, as g_file_copy doesn't do that */
    GjsAutoUnref<GFile> destination_dir = g_file_get_parent(destination_file);
    if (!g_file_make_directory_with_parents(destination_dir, NULL, &p_error)) {
        error.reset(p_error);
        if (!g_error_matches(error, G_IO_ERROR, G_IO_ERROR_EXISTS))
            goto fail;
    }

    if (!g_file_copy(source_file,
                     destination_file,
                     G_FILE_COPY_OVERWRITE,
                     NULL,
                     NULL,
                     NULL,
                     &p_error)) {
        error.reset(p_error);
        goto fail;
    }

    return;

fail:
    GjsAutoChar source_uri = get_file_identifier(source_file);
    GjsAutoChar dest_uri = get_file_identifier(destination_file);
    g_critical("Failed to copy source file %s to destination %s: %s\n",
               source_uri.get(), dest_uri.get(), error->message);
}

The two g_ functions called here require a pointer for the out parameters. I'd originally written this with much more boiler-plate than needed until Philip called me on it and provided some pointers (ha!). It looked a little more like this to start with;

GjsAutoError error = nullptr;
/* We need to recursively make the directory we
 * want to copy to, as g_file_copy doesn't do that */
GError *p_error = error.release();
GjsAutoUnref<GFile> destination_dir = g_file_get_parent(destination_file);
if (!g_file_make_directory_with_parents(destination_dir, NULL, &p_error)) {
    error.reset(p_error);
    if (!g_error_matches(error, G_IO_ERROR, G_IO_ERROR_EXISTS))
        goto fail;
}

Of course since g_file_make_directory_with_parents took a pointer to start with, it may as well have been a plain pointer - but, right after that pointer is used, claim it! In longer function bodies this becomes really valuable; you don't need to worry about missing a free/delete, and the owner is very visible right where it needs to be. One more (concise) example of this;

GError *p_error = nullptr;
utf8 = g_locale_to_utf8(src, -1, NULL, NULL, &p_error);
// take ownership here, we now don't need to worry about a missing g_error_free
GjsAutoError error(p_error);

So that's it, really. Now it's just a matter of putting in the time to chase the threads, untangle the webs, fall down rabbit warrens, and shave a few yaks. The simple stuff is simple, quick and easy and painless - but as I found when changing a few function sigs in coverage.cpp, you can fall down some fairly deep holes and have to chase through other files (test/gjs-test-coverage.cpp in this case).

I am preparing to start submitting patches soon - mainly I'm being held back a little by university being back in action, and needing to sort out a process to get things in order. Hopefully this coming weekend. I will have things reorganized enough to make my life easier.

In closing

This was an extremely rewarding project for me, and I will be forever thankful to Philip Chemento for being willing mentor me and put up with my many hundreds of questions. I've made a few new friends along the way, and really gotten to love the Gnome community.

Bring on the code!

My GSoC Summary is available here: permanent link.