Rust Book chapter 13.1, wrapping a HashMap with generic parameters-CodePudding

I have successfully implemented the Cacher with a HashMap<u32, u32> and I wanted to do it with fully generic types as per the Book's suggestion. Currently my code looks like this but it doesn't compile and I'm out of ideas. The compilation errors have to do with moving v in the value() method and if I change the whole code to accept &u32 instead of u32 the issue persists.

use std::{thread, time::Duration, collections::HashMap, hash::Hash};

struct Cacher<T, K, V>
where
    T: Fn(K) -> V,
    K: Eq   Hash
{
    calculation: T,
    values: HashMap<K, V>,
}

impl<T, K, V> Cacher<T, K, V>
where
    T: Fn(K) -> V,
    K: Eq   Hash
{
    fn new(calculation: T) -> Self {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: K) -> V {
        if let Some(v) = self.values.get(&arg) {
            *v
        } else {
            let v = (self.calculation)(arg);
            self.values.insert(arg, v);
            v
        }        
    }
}

// rest of the code
fn main() {
    let simulated_user_specified_value = 10;
    let simulated_random_number = 7;
    generate_workout(simulated_user_specified_value, simulated_random_number);
}

fn generate_workout(intensity: u32, random_number: u32) {
    let mut expensive_closure: Cacher<_, u32, u32> = Cacher::new(|num: u32| {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(2));
        num
    });

    if intensity < 25 {
        println!("Today do {} pushups!", expensive_closure.value(intensity));
        println!("Next, do {} situps!", expensive_closure.value(intensity));
    } else {
        if random_number == 3 {
            println!("Take a break today! Remember to stay hydrated!");
        } else {
            println!("Today run for {} minutes!", expensive_closure.value(intensity));
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn call_with_different_values() {
        let mut c = Cacher::new(|a| a);
        let v1 = c.value(1);
        let v2 = c.value(2);

        assert_eq!(v1, 1);
        assert_eq!(v2, 2);
    }
}

Edit: final version of the code with mistakes fixed:

use std::{thread, time::Duration, collections::HashMap, hash::Hash};

struct Cacher<T, K, V>
where
    T: Fn(&K) -> V,
    K: Eq   Hash
{
    calculation: T,
    values: HashMap<K, V>,
}

impl<T, K, V> Cacher<T, K, V>
where
    T: Fn(&K) -> V,
    K: Eq   Hash
{
    fn new(calculation: T) -> Self {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: K) -> &V {
       self.values.entry(arg).or_insert_with_key(&self.calculation)       
    }
}

// rest of the code
fn main() {
    let simulated_user_specified_value = 10;
    let simulated_random_number = 7;
    generate_workout(simulated_user_specified_value, simulated_random_number);
}

fn generate_workout(intensity: u32, random_number: u32) {
    let mut expensive_closure: Cacher<_, u32, u32> = Cacher::new(|num: &u32| {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(2));
        *num
    });

    if intensity < 25 {
        println!("Today do {} pushups!", expensive_closure.value(intensity));
        println!("Next, do {} situps!", expensive_closure.value(intensity));
    } else {
        if random_number == 3 {
            println!("Take a break today! Remember to stay hydrated!");
        } else {
            println!("Today run for {} minutes!", expensive_closure.value(intensity));
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn call_with_different_values() {
        let mut c = Cacher::new(|a| *a);

        let v1 = c.value(1);
        assert_eq!(*v1, 1);

        let v2 = c.value(2);
        assert_eq!(*v2, 2);
    }
}

CodePudding user response：

In Rust, by default, objects are not copyable.

Suppose we want to cache values of type Vec<i32>. Inside Cacher::value(), we take arg: Vec<i32>. Now suppose the key is already in the cache map. We can get a reference to it, but if we want to return type V, like it is now, that is, Vec<i32>, we have to get an owned version of the reference. Now suppose what would happen if your code that does *v to do that would compile. When the caller will destroy (drop) the value, the Vec's storage will be freed, and we will have a freed Vec in the cache! Next time we will look for the same key, we will get an invalid Vec that any operation we will try to perform with will be Undefined Behavior! This is terrible, exactly the kind of bug Rust was designed to prevent.

And similar thing happens with the key. Suppose our argument is of type Vec<i32>. When we will pass it to the calculator at (self.calculation)(arg), it will free it, and when we will try in the next line to store it in the cache (self.values.insert(arg, v)), we'll store an invalid Vec!

There are two ways to tackle this, the correct one depends on the specific details of your problem.

The first way is to just not let them (the consumer and the calculator) to destroy the value. In other words, we will not let them change the value, only inspect it (technically, we can let them change but not destroy, but this will lead to different problems). Or, in Rust jargon, we will give them a reference instead of an owned value. This will have the obvious disadvantage that they will not be able to change it: in some cases they may be required to, and then the second option may be preferable.

Trying to model that in code will give us some errors:

T: Fn(K) -> V,
// Replace with
T: Fn(&K) -> V,

// And
fn value(&mut self, arg: K) -> &V {
    if let Some(v) = self.values.get(&arg) {
        v
    } else {
        let v = (self.calculation)(&arg);
        self.values.insert(arg, v);
        &v
    }        
}

Playground.

error[E0502]: cannot borrow `self.values` as mutable because it is also borrowed as immutable
  --> src/lib.rs:29:13
   |
24 |     fn value(&mut self, arg: K) -> &V {
   |              - let's call the lifetime of this reference `'1`
25 |         if let Some(v) = self.values.get(&arg) {
   |                          --------------------- immutable borrow occurs here
26 |             v
   |             - returning this value requires that `self.values` is borrowed for `'1`
...
29 |             self.values.insert(arg, v);
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

error[E0515]: cannot return reference to local variable `v`
  --> src/lib.rs:30:13
   |
30 |             &v
   |             ^^ returns a reference to data owned by the current function

error[E0382]: borrow of moved value: `v`
  --> src/lib.rs:30:13
   |
28 |             let v = (self.calculation)(&arg);
   |                 - move occurs because `v` has type `V`, which does not implement the `Copy` trait
29 |             self.values.insert(arg, v);
   |                                     - value moved here
30 |             &v
   |             ^^ value borrowed here after move

This is a bit overwhelming!

The first error is actually a current limitation of the Rust borrow checker: the code is free from the problem the compiler think it has, but the compiler is unable to prove that. Future projects are going to help the compiler figure it, but in the meantime, there is something we can do: Rust's hashmap has a construct designed exactly for this situation of insert-if-not-already-there: the entry API. Using it, our code looks like (playground):

fn value(&mut self, arg: K) -> &V {
    // `&self.calculation` is really equal here to `|arg| (self.calculation)(arg)`, just shorter.
    self.values.entry(arg).or_insert_with_key(&self.calculation)
}

And magically, we also got rid of the two other problems! That is good, but I'll still explain what was the problem for knowledge sake.

We moved v into the hashmap, then used it. This is the problem. Consider again the situation where v has the type Vec<i32>. If we can use it, we need to destroy it after - but it is inside the map, we cannot destory it! For this reason, the compiler rejected our code with the third error.

As for the second error, we return a reference to v. But v is a local variable, and is destroyed at the end of the function - so we would returning a reference to invalid data!

Fixing them is non-trivial (or even impossible), and so I prefer to not explain it here.

The second way to solve is by first thinking "why it worked with the i32, non-generic version"?

The answer is simple: it worked because no harm can be caused by "destroying" an i32. More formally, i32 isn't holding some resource, like Vec that holds an allocation: it is just plain data, and copying it - just a bitwise, simple copy - is always fine. Or, in Rust jargon, i32 implements Copy.

This raises the question: can we express this rule in the generic version? i.e. say "we want to only allow types that are always valid to copy, i.e. implement Copy?" And it turns out the answer is, yes! We just need to add a where Type: Copy bound (playground):

where
    T: Fn(K) -> V,
    K: Eq   Hash   Copy,
    V: Copy,

There is a way to generalize this: currently, this allows only bitwise copy - that is, only types where the exact same bit pattern can be copied freely. But we can also allow custom-copy code, and that allow things like Vec, that while holding a resource, can be copied by allocating new memory and duplicating the contents. The trait is called Clone, but it requires us to be a little more explicit when we want to copy the value (note that every Copy is also Clone, so it will also work with types like i32):

where
    T: Fn(K) -> V,
    K: Eq   Hash   Clone,
    V: Clone,

fn value(&mut self, arg: K) -> V {
    if let Some(v) = self.values.get(&arg) {
        v.clone()
    } else {
        let v = (self.calculation)(arg.clone());
        self.values.insert(arg, v.clone());
        v
    }
}

Playground.

If you are in doubt what option to choose, or you don't know what the clients of your code need, choose the first option. It is more general, because it allows any type (not just those that implements Clone), and if they need it, they can always .clone() the value then mutate it, even if it is a little inconvenient.