Converting a text file to a hashmap dictionary

I already posted a question about this code, but this appears to be a different problem. I will try to describe it as clearly as possible.

Goal that I want to achieve

Create a hashmap with contents from a text file (in this case /proc/meminfo), similar to this format:

{   
"MemTotal":       "16256760 kB",
"MemFree":          "462276 kB",
"MemAvailable":   "10108672 kB",
"Buffers":         "1432356 kB",
"Cached":          "7138456 kB",
... etc
}

The raw output from the text file looks like this:

cat /proc/meminfo
MemTotal:       16256760 kB
MemFree:         2685512 kB
MemAvailable:   11105752 kB
Buffers:          774076 kB
Cached:          7722948 kB
... etc

The number of entries should depend on how many lines the file contains.

Current situation

The code I currently have creates a hashmap with:

key: "EntryN" (Entry1 to Entry5, hard-coded).
value: an entire line from the text file.

Code:

use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

fn read_file_to_hashmap(path: impl AsRef<Path>) -> HashMap<String, String> {
    let reader = BufReader::new(File::open(path).expect("error here"));

    // Five hard-coded keys, each paired with one whole line of the file.
    let hashmap = [
        "Entry1".to_string(),
        "Entry2".to_string(),
        "Entry3".to_string(),
        "Entry4".to_string(),
        "Entry5".to_string(),
    ]
    .into_iter()
    .zip(reader.lines().map(Result::unwrap))
    .collect::<HashMap<_, _>>();
    return hashmap;
}

println! output:

{
"Entry1": "MemTotal:       16256760 kB",
"Entry5": "Cached:          7138456 kB",
"Entry2": "MemFree:          462276 kB",
"Entry4": "Buffers:         1432356 kB",
"Entry3": "MemAvailable:   10108672 kB",
}
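The output always has exactly five entries, however long the file is, because Iterator::zip stops as soon as either iterator runs out; and each value is the whole line because nothing splits it into key and value. The sketch below reproduces that behaviour on an in-memory string. The function zip_fixed_keys and the sample text are hypothetical, made up only for illustration:

```rust
use std::collections::HashMap;

// Hypothetical helper reproducing the question's approach on a string:
// five fixed keys zipped with the input lines. zip stops when the shorter
// iterator (here, the five keys) runs out, so extra lines are dropped.
fn zip_fixed_keys(text: &str) -> HashMap<String, String> {
    ["Entry1", "Entry2", "Entry3", "Entry4", "Entry5"]
        .into_iter()
        .map(String::from)
        .zip(text.lines().map(String::from))
        .collect()
}

fn main() {
    // Seven made-up lines in /proc/meminfo format, but only five keys.
    let sample = "MemTotal: 1 kB\nMemFree: 2 kB\nMemAvailable: 3 kB\nBuffers: 4 kB\nCached: 5 kB\nSwapCached: 6 kB\nActive: 7 kB";
    let map = zip_fixed_keys(sample);
    println!("{} entries", map.len()); // prints "5 entries"
    // Each value is the whole line, key included.
    println!("{}", map["Entry1"]); // prints "MemTotal: 1 kB"
}
```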

How can I change the code to achieve the desired goal?

Answer:

There are many ways to solve this problem, but I think the easiest in this case is a regular expression.

In your case, this regular expression should do what you want:

^(\S+):\s+(.+)$

^ matches the beginning of a line (with multi_line enabled; otherwise only the beginning of the input)
(\S+) matches one or more non-whitespace characters (the key)
:\s+ matches : followed by at least one whitespace character
(.+) matches everything left over on the line (the value)
$ matches the end of a line (with multi_line enabled; otherwise only the end of the input)

As a complete program, it looks like this:

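use std::collections::HashMap;

// Requires the `regex` crate as a dependency in Cargo.toml.
use regex::RegexBuilder;

fn main() {
    // Read the whole file into one String.
    let meminfo_str = std::fs::read_to_string("/proc/meminfo").unwrap();

    // multi_line(true) makes ^ and $ match at the start and end of every line,
    // not only at the start and end of the whole string.
    let re = RegexBuilder::new(r"^(\S+):\s+(.+)$")
        .multi_line(true)
        .build()
        .unwrap();

    // Every match yields group 1 (the key) and group 2 (the value).
    let meminfo = re
        .captures_iter(&meminfo_str)
        .map(|cap| {
            (
                cap.get(1).unwrap().as_str().to_string(),
                cap.get(2).unwrap().as_str().to_string(),
            )
        })
        .collect::<HashMap<_, _>>();

    println!("{:#?}", meminfo);
}

Output: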

{
    "SwapTotal": "3145728 kB",
    "Slab": "133492 kB",
    "SwapFree": "3145728 kB",
    "Writeback": "0 kB",
    "Mapped": "120212 kB",
    "SReclaimable": "99628 kB",
    "KernelStack": "7616 kB",
    "WritebackTmp": "0 kB",
    "VmallocChunk": "0 kB",
    "AnonHugePages": "1914880 kB",
    "Active(file)": "1187412 kB",
    "Mlocked": "0 kB",
    "ShmemHugePages": "0 kB",
    "FileHugePages": "0 kB",
    "MemTotal": "10129324 kB",
    "Buffers": "143812 kB",
    "HugePages_Total": "0",
    "HugePages_Surp": "0",
    "DirectMap4k": "68608 kB",
    "Inactive(file)": "1245252 kB",
    "Inactive": "3928204 kB",
    "CommitLimit": "8210388 kB",
    "Unevictable": "0 kB",
    "AnonPages": "2637672 kB",
    "Active(anon)": "448 kB",
    "SUnreclaim": "33864 kB",
    "MemAvailable": "6891192 kB",
    "KReclaimable": "99628 kB",
    "Committed_AS": "2777120 kB",
    "Hugepagesize": "2048 kB",
    "VmallocUsed": "27996 kB",
    "SwapCached": "0 kB",
    "Active": "1187860 kB",
    "HugePages_Rsvd": "0",
    "ShmemPmdMapped": "0 kB",
    "FilePmdMapped": "0 kB",
    "Cached": "2306060 kB",
    "VmallocTotal": "34359738367 kB",
    "DirectMap2M": "7206912 kB",
    "Hugetlb": "0 kB",
    "Inactive(anon)": "2682952 kB",
    "Dirty": "19460 kB",
    "PageTables": "24792 kB",
    "Bounce": "0 kB",
    "Percpu": "1392 kB",
    "Shmem": "17328 kB",
    "MemFree": "4633364 kB",
    "HugePages_Free": "0",
    "DirectMap1G": "12582912 kB",
    "NFS_Unstable": "0 kB",
}
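The keys above come out in arbitrary order because HashMap does not preserve any ordering. If sorted output is preferable, one option is to collect into a std::collections::BTreeMap instead (for example `.collect::<BTreeMap<_, _>>()` in place of `.collect::<HashMap<_, _>>()`), which keeps its entries sorted by key. A small sketch, assuming a hand-written map as a stand-in for the parsed data; the `sorted` helper is hypothetical:

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical helper: moves the entries of a HashMap into a BTreeMap,
// whose iteration order (and {:#?} output) is sorted by key.
fn sorted(map: HashMap<String, String>) -> BTreeMap<String, String> {
    map.into_iter().collect()
}

fn main() {
    // Hand-written stand-in for the map built from /proc/meminfo.
    let meminfo: HashMap<String, String> = [
        ("MemTotal", "10129324 kB"),
        ("MemFree", "4633364 kB"),
        ("Buffers", "143812 kB"),
    ]
    .into_iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect();

    let sorted_meminfo = sorted(meminfo);
    // Prints the keys in lexicographic order: Buffers, MemFree, MemTotal.
    for (key, value) in &sorted_meminfo {
        println!("{key}: {value}");
    }
}
```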

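If you want to be more robust and parse correctly even when a line has leading or trailing whitespace, you could use this regex instead:

^\s*(\S+):\s+(\S(?:.*\S)?)\s*$

The value group starts and ends on a non-whitespace character, so surrounding whitespace stays out of the capture, while the optional (?:.*\S)? part still lets single-character values such as the 0 of HugePages_Total match.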
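For comparison, a regex-free alternative is to split each line at its first ':' with str::split_once and trim both halves, which also copes with surrounding whitespace and one-character values. This sketch keeps the question's function name and path parameter, but returns io::Result instead of panicking (my own choice, not something the question asks for), and it assumes the key itself never contains ':':

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

// Regex-free sketch: split each line at its first ':' and trim both halves.
// I/O errors are returned to the caller instead of panicking.
fn read_file_to_hashmap(path: impl AsRef<Path>) -> io::Result<HashMap<String, String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut map = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        // "MemTotal:       16256760 kB" -> ("MemTotal", "       16256760 kB")
        if let Some((key, value)) = line.split_once(':') {
            // trim() drops the alignment padding and any whitespace at either end.
            map.insert(key.trim().to_string(), value.trim().to_string());
        }
        // Lines without a ':' are skipped.
    }
    Ok(map)
}

fn main() {
    match read_file_to_hashmap("/proc/meminfo") {
        Ok(meminfo) => println!("{:#?}", meminfo),
        Err(e) => eprintln!("could not read /proc/meminfo: {e}"),
    }
}
```

Because the loop visits every line, the map gets one entry per "Key: value" line, however long the file is, which was the requirement in the question.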