Efficient reuse of previous hashmap entries (insert or modify if key exists)

Time:07-08

I've written a bit of code that creates CSV files, but when an identical file already exists, I'd like to delete the older copy. I decided to do that using a hashmap, but I'm having a problem with the way Rust's HashMap handles an entry that already exists versus one that doesn't. Namely, I'm trying to avoid hashing the key once to check whether the entry already exists and then hashing it again to either retrieve and modify the existing entry or insert a new one.
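
For comparison, here is a minimal sketch of the two-lookup approach the question wants to avoid. It assumes a u128 content hash as the key and uses a hypothetical, deliberately non-Clone Record type as a stand-in for DirEntry. The borrow checker accepts moving new in both branches here because the if let/else makes the two paths explicit; the cost is that get_mut hashes the key and, when the key is absent, insert hashes it a second time.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for std::fs::DirEntry: deliberately not Clone.
#[derive(Debug)]
struct Record {
    name: String,
    modified: u64,
}

// Keeps the newer record for `key` using two separate lookups.
// Returns the record that lost (None if the key was vacant).
fn keep_newer_two_lookups(
    map: &mut HashMap<u128, Record>,
    key: u128,
    new: Record,
) -> Option<Record> {
    // First lookup: hashes `key`.
    if let Some(existing) = map.get_mut(&key) {
        if existing.modified > new.modified {
            // Existing record is newer: hand back the new one.
            Some(new)
        } else {
            // New record wins: swap it in and return the old one.
            Some(std::mem::replace(existing, new))
        }
    } else {
        // Second lookup: hashes `key` again in order to insert.
        map.insert(key, new);
        None
    }
}
```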

Rust has a built-in way of doing this naturally (the entry API), but in at least some cases it doesn't work. Here's one of them:

    use std::collections::HashMap;
    use std::fs::File;
    use std::io::{BufReader, Read};
    use meowhash::MeowHasher; // from the meowhash crate

    let cwd = std::env::current_dir().unwrap();
    let mut files = HashMap::with_capacity(5);
    for dir_entry in cwd.read_dir()?.flatten() {
        let fname = dir_entry.file_name();
        let fntext = fname.to_string_lossy();
        let md = dir_entry.metadata()?;
        if md.is_file() && fntext.starts_with("test") && fntext.ends_with(".csv") {
            let mut data = Vec::with_capacity(500_000);
            let f = File::open(dir_entry.path())?;
            let mut br = BufReader::new(f);
            br.read_to_end(&mut data)?;
            let hash = MeowHasher::hash(data.as_slice());
            files.entry(hash.as_u128()).and_modify(|f: &mut std::fs::DirEntry| {
                let md2 = f.metadata().unwrap();
                if md2.modified().unwrap() > md.modified().unwrap() {
                    std::fs::remove_file(dir_entry.path()).unwrap();
                } else {
                    std::fs::remove_file(f.path()).unwrap();
                    *f = dir_entry; // makes the closure capture dir_entry by value
                }
            }).or_insert(dir_entry); // error[E0382]: use of moved value: `dir_entry`
        }
    }

The problem here is that the DirEntry struct doesn't implement Clone. In real-world usage the clone isn't even needed: the closure only moves dir_entry when the entry already exists, and in that case or_insert has nothing to insert. So logically only one of the two moves is ever needed; nonetheless, this will not compile as is.
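
Part of the reason is evaluation order: in a method chain, the argument to or_insert is evaluated before or_insert is called, so the value is moved into the call even when the key is already present, and or_insert then simply drops it. A minimal sketch with a hypothetical non-Clone Token type that counts its own drops makes this visible:

```rust
use std::cell::Cell;
use std::collections::HashMap;

thread_local! {
    // Counts how many Token values have been dropped on this thread.
    static DROPS: Cell<usize> = Cell::new(0);
}

// Hypothetical non-Clone value type that records when it is dropped.
struct Token(&'static str);

impl Drop for Token {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

fn drops() -> usize {
    DROPS.with(|d| d.get())
}

// Returns (tokens dropped by the or_insert call, name of the stored token).
fn demo() -> (usize, &'static str) {
    let mut map: HashMap<u32, Token> = HashMap::new();
    map.insert(1, Token("first"));

    let before = drops();
    // Token("second") is built and moved into or_insert even though
    // key 1 is already occupied; or_insert drops it and keeps "first".
    let kept = map.entry(1).or_insert(Token("second")).0;
    (drops() - before, kept)
}
```

The same applies to or_insert_with(|| dir_entry): that closure is only called for a vacant entry, but creating it already moves dir_entry, so pairing it with an and_modify closure that also moves dir_entry is still a double move.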

I know several other ways to accomplish this, but that isn't the point of this question. The point is to figure out how to do an "insert, or modify if the key exists" operation on a Rust HashMap when the modify step replaces the existing value outright, without having to clone the replacement just to satisfy the borrow checker.

Note that in this particular case, the dir_entry values for files that weren't deleted will need to be reused later, so the solution can't discard them.

Answer:

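Match on hash_map::Entry directly. In the chained version, dir_entry is moved into the and_modify closure as soon as that closure is created, and is then used again as the argument to or_insert. The borrow checker cannot see that only one of those paths actually needs the value, because that control flow happens inside the called functions. It can, however, reason about the control flow of a match, where each arm is a separate path and dir_entry is moved at most once: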

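    use std::collections::hash_map; // for hash_map::Entry

    // Inside the loop, in place of the entry(...).and_modify(...).or_insert(...) chain:
    match files.entry(hash.as_u128()) {
        hash_map::Entry::Occupied(mut entry) => {
            // Key already present: keep whichever file is newer.
            // The type annotation replaces the one the closure used to carry.
            let f: &mut std::fs::DirEntry = entry.get_mut();
            let md2 = f.metadata().unwrap();
            if md2.modified().unwrap() > md.modified().unwrap() {
                std::fs::remove_file(dir_entry.path()).unwrap();
            } else {
                std::fs::remove_file(f.path()).unwrap();
                *f = dir_entry; // the only move of dir_entry on this path
            }
        }
        hash_map::Entry::Vacant(entry) => {
            // Key absent: the only move of dir_entry on this path.
            entry.insert(dir_entry);
        }
    }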
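
A self-contained sketch of the same match-based pattern, runnable without touching the filesystem: a hypothetical non-Clone Record (name plus modification time) stands in for DirEntry, and a plain u128 stands in for the MeowHash key. Instead of deleting files, the function returns whichever record lost, so nothing is silently discarded. OccupiedEntry::insert replaces the stored value and hands back the old one, which covers the "replace outright" case with a single lookup.

```rust
use std::collections::hash_map::{Entry, HashMap};

// Hypothetical stand-in for std::fs::DirEntry: deliberately not Clone.
#[derive(Debug)]
struct Record {
    name: String,
    modified: u64,
}

// Inserts `new` under `key`, or, if the key is already present, keeps
// whichever record is newer. Hashes the key exactly once.
// Returns the record that lost (None if the key was vacant).
fn keep_newer(map: &mut HashMap<u128, Record>, key: u128, new: Record) -> Option<Record> {
    match map.entry(key) {
        Entry::Occupied(mut entry) => {
            if entry.get().modified > new.modified {
                // Existing record is newer: hand back the new one.
                Some(new)
            } else {
                // New record wins: replace in place, return the old one.
                Some(entry.insert(new))
            }
        }
        Entry::Vacant(entry) => {
            entry.insert(new);
            None
        }
    }
}
```

In the question's loop, the returned loser is where the corresponding file would be deleted, or pushed onto a list if it still needs to be reused.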