I am pulling my hair out (and I don't have much left) trying to figure out how to use Rust.
Here is what I am trying to do
let mut map = HashMap::new();
let input = File::open(filename).unwrap();
let reader = BufReader::new(input);
let re = Regex::new(r"(. )\)(. )").unwrap();
for line in reader.lines() {
if let Some(captures) = re.captures(&line.unwrap()) {
let key = captures.get(1).map_or("", |m| m.as_str());
let value = captures.get(2).map_or("", |m| m.as_str());
println!("{} {}", key, value);
map.entry(key).or_insert(Vec::new()).push(value);
}
}
println!("{:?}", map);
I get this compiler error:
error[E0716]: temporary value dropped while borrowed
--> src/main.rs:22:46
|
22 | if let Some(captures) = re.captures(&line.unwrap()) {
| ^^^^^^^^^^^^^ creates a temporary which is freed while still in use
...
30 | }
| - temporary value is freed at the end of this statement
31 | println!("{:?}", map);
| --- borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
It is already looking really ugly and the compiler is unhappy. Any suggestions on how to do this in a more "rusty" way that will work?
CodePudding user response:
I know how you feel... seemingly simple things like this can get complicated in Rust quickly. It's definitely not a succinct language, and that verbosity is tough for me to deal with as well, but it forces you "go through the motions" in order to end up with a provably-correct application.
To break this down, let's start with line
- The
reader
is reading the file line by line, with the happy path creating aString
for line in
takes ownership of saidString
in theline
variableline
falls out of scope at the end of each loop
// line is "owned by" this variable
for line in reader.lines() {
// so no matter what is in here
} // `line` gets delete from memory here
Next up, all of the "to_string
stuff". Much of Rust's speed and memory efficiency is based on "zero cost abstractions", a.k.a "avoiding lots of copies of things". This kind of operation can be really expensive in other languages. In addition to the actual work of checking the pattern against the string, if it matches, in total you would have:
- a
String
for the whole line - another
String
for matching part (capture[0]) - 2 more
String
for each of the capture groups
That's triple the amount of "stuff" in memory, it's not free to allocate it for each loop, and the CPU had to do the work of copying it.
Instead, for many of these types of operations where you see &str
, it's a "slice". Instead of a new copy, it's basically just some indexes. E.g. if you have a string such as:
this is a test
A slice for is a
would actually be something like:
{
referenceToString: <some_memory_address>,
start: 5, // number of characters to the beginning
length: 4,
}
All of the things in the captures
result are based on these. Remember that line
is removed from memory after the loop. Any of those slices referring to <some_memory_address>
would no longer be valid, so you can't put them into your HashMap
. You need to call to_string
to copy the actual text into a new value. This is the cause of the majority of these kinds of errors. In c , you could hold on to such a construct, but the stuff at that memory location would be replaced with something else. Your map would have got_knows_what
in both keys and values, and they would constantly change as new stuff was written to memory.
Next note that line
is a Result<String, Error>
. Any number of things could go wrong in the middle of reading the file. Thus, line
is either the next line as a String
, or an Error
that you need to handle. It's like an "inverted try/catch", if you will. You can "kick the can" with a trailing ?
, but then the current function will have to return a Result<Whatever, Error>
, making it the caller's problem.
Using unwrap
is always a "deal with the devil". If you're truly, absolutely, 100% sure that there can't really be an error there, you can use it, but there could always be an error during I/O, so it's never a safe assumption there. unwrap
bypasses the error handling mechanism of the language, so if you unwrap
an error, the program crashes, effectively saying "I tried to warn ya..."
Finally, as @Pitaj mentioned, reader.lines
creates a new String
for each line. There's some overhead in it having to find empty memory over and over, so you can be more efficient -- particularly for larger files -- with reader.read_line
.
So first, let's make your mess a lot worse, with comments inline
// notice we return a `Result`, because we're not going to deal with errors here
fn get_matches(filename: &str) -> io::Result<HashMap<String, Vec<String>>> {
let re = Regex::new(r"(. )\)(. )").unwrap();
let mut map = HashMap::new();
// `?` to "kick the can" on an error opening the file
let file = File::open(filename)?;
// the reader is `mut` because... well... because the compiler
// told me it should be. ;-)
let mut reader = BufReader::new(file);
// the memory buffer we're going to reuse for each loop
let mut buffer = String::new();
// `read_line` returns the size read, which is 0 when we're finished
// `?` to "kick the can" on an error reading the line
while reader.read_line(&mut buffer)? > 0 {
// if the regex matches
if let Some(captures) = re.captures(&buffer) {
// convenience closure to unpack
let capture = |index| {
// the capture group will always be a `Some` here, so you *could*
// use `unwrap` here. But it would not be if your regex used a
// `?` (optional) capture, so safer to "go through the motions"
if let Some(m) = captures.get(index) {
// the match is a `Match` instance, so convert
// to `&str` with `as_str`
let str_slice = m.as_str();
// we can't keep a `&str` after the loop, so
// create a new `String` with its contents
let string = str_slice.to_string();
// the result is an `Option`
return Some(string)
}
// if the match were `None`, this `capture` function
// wouldn't have a `String` to return
None
};
// if both captures exist, `k` and `v` are `String`
if let (Some(k), Some(v)) = (capture(1), capture(2)) {
// and now `map` owns both of these `String` values
map.entry(k).or_insert(Vec::new()).push(v);
}
}
// empty out the buffer, but keep the memory it allocated
buffer.clear();
}
Ok(map)
}
After all of that, this doesn't seem so bad. ;-)
fn get_matches(filename: &str) -> io::Result<HashMap<String, Vec<String>>> {
let re = Regex::new(r"(. )\)(. )").unwrap();
let mut map = HashMap::new();
let file = File::open(filename)?;
let mut reader = BufReader::new(file);
let mut buffer = String::new();
while reader.read_line(&mut buffer)? > 0 {
if let Some(captures) = re.captures(&buffer) {
let capture = |index| {
captures.get(index).map(|c| c.as_str().to_string())
};
if let (Some(k), Some(v)) = (capture(1), capture(2)) {
map.entry(k).or_insert(Vec::new()).push(v);
}
}
buffer.clear();
}
Ok(map)
}