From C++ to the borrow checker

 2023-03-12

A common complaint about Rust is that its learning curve is pretty steep. The borrow checker will give you quite the fight until you get used to writing code in a way that satisfies its expectations.

In my case, I started learning Rust after a few years of doing C++. This was an interesting experience because I very quickly realized the constraints that the borrow checker was applying seemed familiar. Almost too familiar. I always had rules in my mind every time I wrote C++ code that now the borrow checker was enforcing like:

This function returns a reference to an object’s member so I need to make sure I don’t modify the object in a way that invalidates that reference.
I’m storing this reference somewhere so I need to make sure the thing it points to is alive for the duration of it.

If you’re using modern C++, it’s easy to avoid violating these rules in some contexts by using utilities like unique_ptr and shared_ptr. Unfortunately, given references are a fundamental part of the C++ type system and smart pointers are no replacement for them, it’s very easy to break them.

Let’s look at an example of how these rules come into play:

class Cache {
public:
  // Set a key to a value.
  void set(int key, string value) {
    mappings.insert_or_assign(key, move(value));
  }

  // Delete the value for a key.
  void remove(int key) {
    mappings.erase(key);
  }

  // Get the value for a key.
  optional<reference_wrapper<const string>> get(int key) const {
    if (auto iter = mappings.find(key); iter != mappings.end()) {
      return make_optional(cref(iter->second));
    } else {
      return nullopt;
    }
  }

private:
  unordered_map<int, string> mappings;
};

This should be pretty self-explanatory; keep in mind Cache::get returns a reference to the value. The implicit problem with this code, a problem that you’re aware of every time you write C++, is that you always need to be mindful of what you do while you hold a reference returned by Cache::get. You simply cannot perform any operation on the map that could invalidate it.

For example, the following code will cause Undefined Behavior (UB):

void foo(Cache& cache) {
  auto value = cache.get(42);
  // Make sure we found it.
  if (!value) {
    return;
  }

  // ...
  cache.remove(42);

  // oops
  cout << value->get() << endl;
}

This shouldn’t come as a surprise if you’ve used C or C++ before: you know this and you (try to) avoid it. Rust, on the other hand, will enforce this invariant: the borrow checker ensures that you can’t modify a value if a reference to it already exists:

// Same thing as above.
struct Cache {
  mappings: HashMap<i32, String>,
}

impl Cache {
  fn set(&mut self, key: i32, value: String) {
    self.mappings.insert(key, value);
  }

  fn remove(&mut self, key: i32) {
    self.mappings.remove(&key);
  }

  fn get(&self, key: i32) -> Option<&String> {
    self.mappings.get(&key)
  }
}

fn foo(cache: &mut Cache) {
  let value = cache.get(42);

  // Nope! `cache` is borrowed.
  cache.remove(42);

  // Also nope!
  cache.set(42, "my new value".to_string());

  // We're still using it down here!
  println!("The answer is {value:?}")
}

This isn’t doing anything new: we knew we couldn’t do that, there just wasn’t anything enforcing it. While this is now preventing us from shooting ourselves in the foot, it also has some other side effects:

fn foo(cache: &mut Cache) {
  let value = cache.get(42);

  // Also doesn't work, argh!
  cache.set(45, "another value".to_string());
}

In this example, we’re trying to write an entirely different key but the compiler still complains. Depending on the internals of the Cache (or more specifically, the HashMap type), you may technically be able to do that but the compiler is preventing it. This, however, is a good thing. How many times have you been writing C++ and you happen to want to do something with references over a standard container and you have to go to cppreference and read the fine print for it? Here are some of them:

vector::push_back may invalidate references/pointers and iterators if there’s a reallocation.
deque::push_back invalidates iterators but not references/pointers.
unordered_map::insert invalidates iterators but not references/pointers.
map::insert does not invalidate references/pointers or iterators.
When overwriting an existing key, both unordered_map::insert and map::insert will invalidate references/pointers to the previous version of that key.

This puts a lot of burden on the programmer: we need to carefully choose the container we use and the read/write patterns we use to access it to avoid introducing issues. The Cache type here is a thin wrapper over a map, so these are not contrived examples but instead the type of thing you find on a normal piece of code.

Rust’s borrow checker enforces these rules automatically for you and prevents you from messing up. At the same time, it also takes away some flexibility when writing code as you need to organize it in a way that satisfies it. However, I personally don’t really mind this and prefer knowing that if my program compiles, then my use of references is well formed.

Using this to our advantage

Besides the fact that the borrow checker prevents a whole lot of problems, it also lets us write optimizations that would otherwise be pretty scary in C++. Let’s say we had a list of persons and we wanted to build an index on top of them temporarily just to be used in some particular function. We don’t want to create a copy of every person just for the sake of calling that function, so instead we use pointers:

struct Person {
  int id;
  string name;
};

// Builds an index where the value is a pointer
// to the existing person, thus avoiding copies.
unordered_map<int, const Person*> index_persons(const vector<Person>& persons) {
  // iterate and build the map, keying by id...
}

void foo(const vector<Person>& persons) {
  const auto index = index_persons(persons);
  do_indexed_stuff(index);
}

It’s perfectly fine to use the created index as long as you never modify the “persons” vector while it’s in use. This is scary, as breaking that rule will introduce UB. If you did this in Rust, however, you’d get the same advantage (reusing the Persons in the index), but with the guarantee that you’re not breaking it:

fn index_persons(persons: &[Person]) -> HashMap<i32, &Person> {
  // same thing...
}

We’re writing pretty much the same code that in C++ could blow in our faces, except safely by using the borrow checker to our advantage. This use of references in Rust is satisfying: you can avoid copying your types around and instead use references as long as the compiler validates no mutations will invalidate them.

Reference/pointer semantics

Now imagine if, in the example above, the id was a string. Changing the returned type in the C++ version to have a const string* as a key would not work by default as it would prevent you from doing lookups: you can only lookup based on pointer equality (e.g. a pointer to that string):

unordered_map<const string*, int> mappings;
const string key = "foo";
mappings.emplace(&key, 42);

const string other_key = "foo";
mappings.count(&other_key); // 0, it's a different pointer!

For this to work you’d need to create a custom hasher and equality comparator types that hash and compare the thing behind the pointer rather than the pointer itself. This is not terrible but it’s more work, and it’s easy to miss. The same thing goes for reference_wrapper which doesn’t implement std::hash and operator== even though it would probably make a lot of sense for it to do so.

On the other hand, in Rust, you can safely and trivially create a HashMap<&String, &Person> without any caveats. This builds on top of the safety guarantees the borrow checker gives us and it lets us create the type of performance improvements that would be scary/error-prone in C++ without any additional work.

Conclusion

While Rust’s borrow checker takes some time to get used to, I think people with a C++ background have a strong foundation that will make the transition a lot smoother than people coming from other languages. That is not to say you’ll hit the ground running, but I feel like the concepts and constraints you need to get used to are easier to grasp given you already have them in your mind when you write C++. Moreover, the borrow checker lets us perform the same type of optimizations we would write in C++ but without the caveats and the feeling of danger.