Home > Back-end >  Storing different types in a single Hashmap
Storing different types in a single Hashmap

Time:02-13

A bit of context to prevent xy:

I want to build a cache for my system. There are Type pairs consisting of IdType and EndpointType in a 1:1 relationship. Every IdType only ever refers to one EndpointType and vice versa. The Id trait (see code below) is not required, it only exists in my current iteration of trying to make this work.

There are way too many different types of endpoints - and usually an application will only use a small subset of them - to justify statically building a cache per endpoint and holding that in memory. Additionally, I want to be able to add new endpoints without touching any code related to caching or the client itself.

The Endpoint trait objects are not object safe and there is no way to make them object safe due to associated consts, Sized and methods not using self to make use of compile time optimizations.

I came up with the idea of storing them as kind of Any type. Now I like type safety, so I tried to restrict it a bit more. After failing to find a satisfying solution, I still have several issues with my idea:

  1. How do I make this approach work?
  2. Is there a better solution?
  3. Is this sound? Why?
  4. Is there a way to make it sound?
  5. Can I achieve this without unsafe?

Link to the rust playground

use std::sync::Mutex;
use std::collections::HashMap;

pub trait Endpoint: Sized {
    type Id;
}

pub trait Id {
    type Endpoint;
}

pub struct Client {
    cache: Mutex<Cache>,
}

impl Client {
    fn get<T: Endpoint>(&self, id: T::Id) -> T {
        if let Some(result) = self.cache.lock().unwrap().get(id) {
            return result;
        }
        todo!()
    }
}

pub struct Cache {
    map: HashMap<Box<dyn Id>, Box<dyn Endpoint>>,
}

impl Cache {
    fn get<T: Id>(&self, id: T) -> Option<T::Endpoint> {
        if let Some(endpoint) = self.map.get(Box::new(id)) {
            let endpoint: Box<T::Endpoint> = unsafe { std::mem::transmute(endpoint) };
            Some(endpoint)
        } else {
            None
        }
    }
}

CodePudding user response:

I'd strongly recommend going with Box<dyn Any> over std::mem::transmute. A fairly common pattern is to have a HashMap<TypeId, Box<dyn Any>>. TypeId is a type that can be used to distinguish between other types, and it's Eq and Hash impls work how you'd expect: if the types are the same, the type ids are equal. So you could have something roughly like:

struct Cache {
  map: HashMap<TypeId, Box<dyn Any>>,
}

impl Cache {
  fn get<T>(&self) -> Option<&T> {
    let type_id = TypeId::of::<T>();
    let any = self.map.get(&type_id)?;
    any.downcast_ref()
  }
}

This can act like a rough lookup table that associates a type with a single value of that type. If there is a Box<dyn Any> that is associated with the type id for T, but points to a value of some other type, you won't get UB, you'll just get None.

This sort of primitive can be used to build a more complex cache around it. For example, you could have a struct that wraps this that provides access, but it only supports keys of type (T, <T as Id>::Endpoint).

The key point is to use Any and downcast_* methods to avoid unsafe.

std::mem::transmute is one of the more dangerous unsafe functions, and should largely be seen as a last resort. From the docs:

transmute is incredibly unsafe. There are a vast number of ways to cause undefined behaviour with this function. transmute should be the absolute last resort

I'd also add, this pattern is common enough that there's probably a crate that provides a type safe interface, a quick search yielded this: https://crates.io/crates/anymap, though I can't speak for this crate's quality in particular

Edit:

If you also want to be able to distinguish between multiple instances of the same Id/Endpoint pair type, you could modify it to store a hashmap with the keys having a type (TypeId, u64), where the u64 is the result of hashing the original key (a bit like a SQL composite key):

struct Cache {
  map: HashMap<(TypeId, u64), Box<dyn Any>>,
}

impl Cache {
    fn insert<T>(&mut self, id: T::Id, endpoint: T)
    where
        T: Endpoint   'static,
        T::Id: Hash,
    {
        let type_id = TypeId::of::<T>();
        let hash = {
            let mut hasher = DefaultHasher::new();
            id.hash(&mut hasher);
            hasher.finish()
        };

        self.map.insert((type_id, hash), Box::new(endpoint));
    }

    fn get<T>(&self, id: T::Id) -> Option<&T>
    where
        T: Endpoint   'static,
        T::Id: Hash,
    {
        let type_id = ...;
        let hash = ...;

        let option_any = self.map.get(&(type_id, hash));
        option_any.and_then(|any| any.downcast_ref())
    }
}

This lets you have multiple FooEndpoints (with FooIds), as well as BarEndpoints (and BarIds), while avoiding transmute/unsafe, all with a single map lookup. Hopefully I read your question more accurately this time ;p

P.S. I have no idea if this particular way of obtaining a u64 hash is "good", I've never actually had to do this before (I've only ever used the Hash trait with std::collections::HashMap/HashSet). Might be worth doing some googling to make sure this isn't doing something horrible

  • Related