Shooting Your Foot in Rust

I've had a bit of difficulty getting this post done in a decent timeframe as I have 4 papers on the go this semester, one of which I was enrolled in 2.5 weeks late and had to scramble to catch up on. There were some other things I wanted to discuss here, but time constraints are pushing those to the next post. Nevertheless, onwards.

Before I can use Rust in GJS effectively, I need FFI bindings that let Rust call into C libraries such as GLib, GIRepository, and libffi. Creating these required the use of both bindgen and gtk-rs/gir.

In both cases, these tools produce unsafe Rust. This is code that does one of the following:

  1. Dereferencing a raw pointer
  2. Calling an unsafe function or method
  3. Accessing or modifying a mutable static variable
  4. Implementing an unsafe trait

For the bindings I am using, the unsafety generally comes from points 1, 2, and 3. An example of this is from the [libffi] bindings (generated using bindgen):

extern "C" {
    pub fn ffi_prep_cif(cif: *mut ffi_cif, abi: ffi_abi,
                        nargs: ::std::os::raw::c_uint, rtype: *mut ffi_type,
                        atypes: *mut *mut ffi_type) -> ffi_status;
}

extern "C" declares functions that are defined outside Rust, in this case in C, and that use the C calling convention. Rust cannot check what such a function does, so calling it is regarded as unsafe. The function prototype follows the marker. The function takes a variety of arguments; cif: *mut ffi_cif is a raw pointer to an ffi_cif, which has the layout of:

#[repr(C)]
#[derive(Debug, Copy)]
pub struct ffi_cif {
    pub abi: ffi_abi,
    pub nargs: ::std::os::raw::c_uint,
    pub arg_types: *mut *mut ffi_type,
    pub rtype: *mut ffi_type,
    pub bytes: ::std::os::raw::c_uint,
    pub flags: ::std::os::raw::c_uint,
}

#[repr(C)] marks this struct definition as having the same field order, size, and alignment as the equivalent definition in C. This is important for anything being passed over the FFI boundary between Rust and C. There are some restrictions here: Rust tuples and tagged unions (enums that carry data) don't exist in C and should never be passed over the FFI. Older compilers also added a hidden drop flag to any type that needed one, even with #[repr(C)]; a drop flag records whether a value still has to be dropped, and it is no longer stored inside the value.
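To make the layout guarantee a little more concrete, here is a minimal sketch using a made-up struct (Example is hypothetical and not part of any binding). The expected numbers assume a typical platform where u32 is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct for illustration only; not from the libffi bindings.
#[repr(C)]
pub struct Example {
    pub tag: u8,    // offset 0, followed by 3 bytes of padding
    pub value: u32, // offset 4
    pub extra: u16, // offset 8, followed by 2 bytes of tail padding
}

// Size and alignment as Rust sees them; with repr(C) these follow C's rules.
pub fn example_layout() -> (usize, usize) {
    (size_of::<Example>(), align_of::<Example>())
}
```

Without #[repr(C)] the compiler is free to reorder the fields, so the same struct could end up smaller and would no longer match what C expects.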

Raw pointers in Rust are safe to handle: you can copy, move, create, and borrow them. Dereferencing one, however, is classed as unsafe. Why? Rust can't guarantee that the data pointed to is actually valid - that is a task for the implementor.
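A small sketch of that split, using a hypothetical helper (bump_through_raw is only for illustration): creating and copying the pointer needs no unsafe, while the dereference does.

```rust
// Creating and copying a raw pointer is safe; only the dereference is not.
pub fn bump_through_raw(value: &mut i32) -> i32 {
    let ptr: *mut i32 = value; // safe: coercing a reference to a raw pointer
    let copy = ptr;            // safe: raw pointers are Copy
    unsafe {
        // Sound here because `copy` comes from a live, exclusive reference.
        *copy += 1;
        *copy
    }
}
```

The unsafe block is sound only because the pointer was derived from a valid reference a few lines earlier; a pointer handed back by a C library comes with no such guarantee.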

You will see ::std::os::raw::c_uint and other c_ types popping up. These are basic data types which are guaranteed to have the same size as their C counterparts.

The other oddity here is pub arg_types: *mut *mut ffi_type. On the C side arg_types is an array, and *mut *mut is a mutable raw pointer to the first element of that array, which is itself a mutable pointer (to an ffi_type). That is, a pointer to an array of pointers.
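As a rough sketch of how such a pointer-to-pointers can be built and walked from Rust: FakeType, total_size, and demo_total below are hypothetical stand-ins, not the real ffi_type or any libffi call.

```rust
// Hypothetical stand-in for ffi_type, for illustration only.
#[repr(C)]
pub struct FakeType {
    pub size: usize,
}

// Walks `count` pointers starting at `types`, as C code would with types[i].
// Safety: `types` must point to `count` valid, non-null *mut FakeType entries.
pub unsafe fn total_size(types: *mut *mut FakeType, count: usize) -> usize {
    let mut total = 0;
    for i in 0..count {
        unsafe {
            let entry: *mut FakeType = *types.add(i); // i-th pointer in the array
            total += (*entry).size;                   // follow it to the struct
        }
    }
    total
}

// Safe caller: builds the array of pointers from Rust-owned values.
pub fn demo_total() -> usize {
    let mut a = FakeType { size: 4 };
    let mut b = FakeType { size: 8 };
    let mut arg_types: Vec<*mut FakeType> =
        vec![&mut a as *mut FakeType, &mut b as *mut FakeType];
    // as_mut_ptr() yields the *mut *mut FakeType that a C API would expect.
    unsafe { total_size(arg_types.as_mut_ptr(), arg_types.len()) }
}
```

The Vec has to outlive every use of the pointer taken from it, which is one of the contracts the compiler cannot check once raw pointers are involved.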

If you're curious, ffi_type is defined as:

#[repr(C)]
#[derive(Debug, Copy)]
pub struct _ffi_type {
    pub size: usize,
    pub alignment: ::std::os::raw::c_ushort,
    pub type_: ::std::os::raw::c_ushort,
    pub elements: *mut *mut _ffi_type,
}
pub type ffi_type = _ffi_type;

Making unsafe Safe

Rust is supposed to be a safe language, right? It still is, even when you use unsafe code. Using unsafe code doesn't disable all the safety checks; it only enables some extra features which are unsafe (see points 1-4). The caveat is that the unsafe code must be contained within an unsafe { } block to enable these features, and it is up to the programmer to validate these blocks and make sure they actually are safe. If you do end up with memory-safety problems, e.g. a use-after-free or a crash from an invalid pointer, then you can be sure that the cause lies within an unsafe block.

An example is from my gi-girffi wrapper (this is a translation of the functions in girepository/girffi.c):

pub fn g_callable_info_get_ffi_return_type(callable_info: &mut GICallableInfo) -> Option<ffi_type> {
    // Note: `.unwrap_or(return None)` would evaluate `return None` eagerly,
    // so the function would always return None; `?` only returns on None.
    let return_type = unsafe {
        g_callable_info_get_return_type(callable_info)
            .as_mut() // make the raw pointer a mutable reference (None if null)
    }?;
    Some(g_type_info_get_ffi_type(return_type))
}

You will see that g_callable_info_get_return_type() is the only FFI function called within the unsafe block; it is a GIRepository function with the signature:

pub fn g_callable_info_get_return_type(info: *mut GICallableInfo) -> *mut GITypeInfo;

It takes a raw mutable pointer to a GICallableInfo (which is a type alias for GIBaseInfo) and returns a raw mutable pointer to a newly allocated GITypeInfo. Since this is a new allocation of memory, and the only reference to it is this pointer, I convert it to a Rust mutable reference using as_mut(). This conversion also checks whether the pointer is null and returns None if so. It doesn't check that the data is valid, however...
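A minimal sketch of that null check in isolation. read_checked and the two demo functions are hypothetical helpers; as_mut() itself is the standard method on raw pointers.

```rust
use std::ptr;

// Returns the pointed-to value, or None when the pointer is null.
// Safety: a non-null `ptr` must point to a valid, properly aligned i32 that
// nothing else is accessing; as_mut() cannot check any of that.
pub unsafe fn read_checked(ptr: *mut i32) -> Option<i32> {
    let reference: &mut i32 = unsafe { ptr.as_mut() }?; // None if null
    Some(*reference)
}

pub fn demo_null() -> Option<i32> {
    unsafe { read_checked(ptr::null_mut()) }
}

pub fn demo_valid() -> Option<i32> {
    let mut value = 7;
    unsafe { read_checked(&mut value) }
}
```

Note that as_mut() is itself an unsafe method, so it has to sit inside the unsafe block alongside the FFI call.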

return_type is passed on to g_type_info_get_ffi_type(), which I've written in similar fashion as a safe function. It takes ownership of return_type (a moved value) and drops it once done with it. We then return the result of that call wrapped in an Option.

Regarding g_type_info_get_ffi_type: within this function I've added a manual call to drop return_type via g_free. At some point in the near future I may wrap things like this in a type with a manual Drop implementation so that the data is correctly freed when dropped.
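One way such a Drop wrapper could look, as a sketch under stated assumptions: OwnedInfo, fake_c_alloc, and fake_c_free are hypothetical, and Box stands in for the C allocator so the example is self-contained. In real bindings the release call would be whichever C function matches the allocation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts releases so the sketch can show that Drop really ran.
pub static RELEASED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-ins for a C allocate/free pair; Box plays the role of C.
fn fake_c_alloc() -> *mut u32 {
    Box::into_raw(Box::new(0u32))
}

// Safety: `ptr` must come from fake_c_alloc and be freed only once.
unsafe fn fake_c_free(ptr: *mut u32) {
    drop(unsafe { Box::from_raw(ptr) });
    RELEASED.fetch_add(1, Ordering::SeqCst);
}

// Owning wrapper: holds the only copy of the pointer and frees it on drop.
pub struct OwnedInfo {
    ptr: *mut u32,
}

impl OwnedInfo {
    pub fn new() -> Option<OwnedInfo> {
        let ptr = fake_c_alloc();
        if ptr.is_null() {
            None
        } else {
            Some(OwnedInfo { ptr })
        }
    }
}

impl Drop for OwnedInfo {
    fn drop(&mut self) {
        // Sound because the wrapper never hands out or copies `ptr`.
        unsafe { fake_c_free(self.ptr) }
    }
}

// Creates a wrapper, lets it go out of scope, and reports how many
// releases happened in between.
pub fn demo_drop() -> usize {
    let before = RELEASED.load(Ordering::SeqCst);
    {
        let _info = OwnedInfo::new();
    } // `_info` is dropped here, which calls fake_c_free
    RELEASED.load(Ordering::SeqCst) - before
}
```

With a wrapper like this, forgetting the manual free is no longer possible in safe code, because the release is tied to the value going out of scope.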

But that's a lot of unsafety

Yes, it is. But there are several aspects to all this:

  • we want to restrict all unsafe features/functions/operations to be within unsafe blocks - this means that if there are issues anywhere, then we have a good idea of where to start
  • we want to create a safe API over these unsafe aspects so that we can guarantee that the use of this API is safe throughout safe Rust use
  • and we want to ensure that contracts between unsafe and safe are fulfilled so that safe Rust continues to be safe - for example wrapping a binding in a safe function which guarantees that the contract of the unsafe function is fulfilled.
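A sketch of the last point, with a stand-in for a generated binding. sum_bytes_raw and sum_bytes are hypothetical; the raw function is defined in Rust so the example is self-contained, but it has the C ABI and raw-pointer contract a real binding would have.

```rust
// Stand-in for a generated binding (hypothetical, defined here in Rust).
// Safety: `data` must be valid for reads of `len` bytes.
pub unsafe extern "C" fn sum_bytes_raw(data: *const u8, len: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..len {
        total += u32::from(unsafe { *data.add(i) });
    }
    total
}

// Safe wrapper: a slice always carries a valid pointer and a matching length,
// so the contract above is fulfilled for every possible caller.
pub fn sum_bytes(data: &[u8]) -> u32 {
    unsafe { sum_bytes_raw(data.as_ptr(), data.len()) }
}
```

Callers of sum_bytes cannot pass a mismatched pointer and length, so the only place left to audit is the single unsafe block inside the wrapper.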

Having said all that, can you shoot your foot in Rust? Absolutely! That is why unsafe code is boxed in with unsafe keywords. If something is hinky, then you know where to start looking. It is up to the implementor to honour the contracts with Rust when producing a safe function that contains unsafe code.

Pain Points

Unions

Note: I'm referencing stable rustc here

The biggest headache I've had so far is purely with deciding how to represent a C union in Rust. It looks like bindgen produces Rust code for unions by using a Rust struct and ::std::marker::PhantomData<T>. PhantomData is used as a marker of sorts in many instances, and in this case it is used to indicate ownership of this data. I don't really understand it very well at this point; more info is available here and here.

~~Untagged Union support is coming to stable rustc soon, and is available in nightly rust (RFC). I hope it lands in rustc 1.19 so that I can incorporate it into the gir-ffi bindings, and the gtk-rs project can add support to the gir->rust binding producer.~~ I worked on enabling proper Union bindings in gir and the PR was accepted - ~~it is only usable in nightly Rust as of yet, and is behind a feature-gate~~.

Update: I am actively working to move the gir union support to default since untagged unions are now stable in Rust as of 1.19
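For reference, a minimal sketch of what a stable untagged union looks like. IntOrFloat and float_bits are made up for illustration and are not output from gir or bindgen.

```rust
// Hypothetical union for illustration; both fields share the same 4 bytes.
#[repr(C)]
pub union IntOrFloat {
    pub int: u32,
    pub float: f32,
}

// Writing a field is safe; reading one is unsafe because Rust cannot know
// which field was stored last.
pub fn float_bits(value: f32) -> u32 {
    let u = IntOrFloat { float: value };
    unsafe { u.int }
}
```

Reading a union field falls under the same rule as the other unsafe operations: the compiler trusts the programmer to know which field is currently valid.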

Bit-fields

Bit-fields feature in a few structs within the GNOME libs, and likely in a fair few other libraries. In particular there are a few structs that have mixed data, and these are the ones which don't really have an ideal solution yet. An example of this is:

struct _GHookList
{
  gulong	    seq_id;
  guint		    hook_size : 16;
  guint		    is_setup : 1;
  GHook		   *hooks;
  gpointer	    dummy3;
  GHookFinalizeFunc finalize_hook;
  gpointer	    dummy[2];
};

where hook_size and is_setup are the bit-fields, and as such they change the size and alignment of the struct. For now the gir to Rust binding generator replaces the first instance of a bit-field with _truncated_record_marker: c_void and comments out the following fields. The programmer is expected to acknowledge which structs are truncated and write their code accordingly. A bit of extra work.
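Until bit-fields are supported directly, one possible workaround is to store them in a single integer and provide accessors by hand. The sketch below is hypothetical (HookBits is not what gir generates) and assumes hook_size occupies the low 16 bits and is_setup bit 16, which is how GCC lays them out on little-endian targets; C leaves the exact layout to the compiler.

```rust
// Hypothetical hand-written storage for `hook_size : 16` and `is_setup : 1`.
// ASSUMPTION: hook_size is in the low 16 bits and is_setup in bit 16.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct HookBits {
    bits: u32, // one guint-sized storage unit holding both bit-fields
}

impl HookBits {
    pub fn hook_size(&self) -> u16 {
        (self.bits & 0xFFFF) as u16
    }

    pub fn set_hook_size(&mut self, size: u16) {
        // Clear the low 16 bits, then store the new size there.
        self.bits = (self.bits & !0xFFFF) | u32::from(size);
    }

    pub fn is_setup(&self) -> bool {
        (self.bits >> 16) & 1 == 1
    }

    pub fn set_is_setup(&mut self, on: bool) {
        // Clear bit 16, then set it again if `on` is true.
        self.bits = (self.bits & !(1 << 16)) | (u32::from(on) << 16);
    }
}
```

The obvious downside is that the layout assumption has to be verified per compiler and target, which is exactly the kind of detail a language-level feature would take care of.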

We now need a Rust RFC to be finished off to get C-style bit-fields into Rust (I will be taking this on as soon as I get the time).

Next post?

My next post will be about what I've learned from this project, and may span a few posts as I try and clarify things for myself enough to write about them. In particular I want to highlight the pros/cons of this project, and I want to try and translate what I've learned in Rust back to the C++ codebase - this means analysing use of pointers, switching to unique_ptr and the ownership model it presents, references instead of pointers, and a few other things.

So far this has been an incredibly rewarding project for me, and I really hope I can share this knowledge in a way that is adequate for others to follow.