Home > Mobile >  Parallelising file processing using rayon
Parallelising file processing using rayon

Time:04-10

I've got 7 CSV files (55 MB each) in my local source folder which I want to convert into JSON format and store into local folder. My OS is MacOS (Quad-Core Intel i5). Basically, it is a simple Rust program which is run from a console as

./target/release/convert <source-folder> <target-folder>

My multithreading approach using Rust threads is ass following

fn main() -> Result<()> {
    let source_dir = PathBuf::from(get_first_arg()?);
    let target_dir = PathBuf::from(get_second_arg()?);

    let paths = get_file_paths(&source_dir)?;

    let mut handles = vec![];
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)?;

        handles.push(thread::spawn(move || {
            let _ = convert(&source_path, &target_path);
        }));
    }

    for h in handles {
        let _ = h.join();
    }

    Ok(())
}

I run it using time to measure CPU utilisation which gives

2.93s user 0.55s system 316% cpu 1.098 total

Then I try to implement the same task using the rayon (threadpool) crate:

fn main() -> Result<()> {
    let source_dir = PathBuf::from(get_first_arg()?);
    let target_dir = PathBuf::from(get_second_arg()?);

    let paths = get_file_paths(&source_dir)?;
    let pool = rayon::ThreadPoolBuilder::new().num_threads(15).build()?;

    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)?;

        pool.install(|| {
            let _ = convert(&source_path, &target_path);
        });
    }

    Ok(())
}

I run it using time to measure CPU utilisation which gives

2.97s user 0.53s system 98% cpu 3.561 total

I can't see any improvements when I use rayon. I probably use rayon the wrong way. Does anyone have an idea what is wrong with it?

Update

After some time of fight with the Rust checker

  • Related