Home > Software engineering >  Parallel Recursion Fix
Parallel Recursion Fix

Time:10-26

Quite new to Rust and trying to tackle toy problems. Trying to write a directory traversal with only Rayon.

struct Node {
    path: PathBuf,
    files: Vec<PathBuf>,
    hashes: Vec<String>,
    folders: Vec<Box<Node>>,
}

impl Node {
    pub fn new(path: PathBuf) -> Self {
        Node {
            path: path,
            files: Vec::new(),
            hashes: Vec::new(),
            folders: Vec::new(),
        }
    }
    
    pub fn burrow(&mut self) {
        let mut contents: Vec<PathBuf> = ls_dir(&self.path);

        contents.par_iter().for_each(|item| 
                                if item.is_file() {
                                    self.files.push(*item);
                                } else if item.is_dir() {
                                    let mut new_folder = Node::new(*item);
                                    new_folder.burrow();
                                    self.folders.push(Box::new(new_folder));
                                });
        
    }
}

The errors I am receiving are

error[E0596]: cannot borrow `*self.files` as mutable, as it is a captured variable in a `Fn` closure
  --> src/main.rs:40:37
   |
40 | ...                   self.files.push(*item);
   |                       ^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable

error[E0507]: cannot move out of `*item` which is behind a shared reference
  --> src/main.rs:40:53
   |
40 | ...                   self.files.push(*item);
   |                                       ^^^^^ move occurs because `*item` has type `PathBuf`, which does not implement the `Copy` trait

error[E0507]: cannot move out of `*item` which is behind a shared reference
  --> src/main.rs:42:68
   |
42 | ...                   let mut new_folder = Node::new(*item);
   |                                                      ^^^^^ move occurs because `*item` has type `PathBuf`, which does not implement the `Copy` trait

error[E0596]: cannot borrow `*self.folders` as mutable, as it is a captured variable in a `Fn` closure
  --> src/main.rs:44:37
   |
44 | ...                   self.folders.push(Box::new(new_folder));
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable

The errors are clear in that they are preventing different threads from accessing mutable memory, but I'm just not sure how to start to address the errors.

Below is the original (non-parallel) version of burrow

pub fn burrow(&mut self) {
    let mut contents: Vec<PathBuf> = ls_dir(&self.path);

    for item in contents {
        if item.is_file() {
            self.files.push(item);
        } else if item.is_dir() {
            let mut new_folder = Node::new(item);
            new_folder.burrow();
            self.folders.push(Box::new(new_folder));
        }
    }
}

CodePudding user response:

The best option in this case is to use ParallelIterator::partition_map() which allows you to turn a parallel iterator into two different collections according to some condition, which is exactly what you need to do.

Example program:

use rayon::iter::{Either, IntoParallelIterator, ParallelIterator};

fn main() {
    let input = vec!["a", "bb", "c", "dd"];

    let (chars, strings): (Vec<char>, Vec<&str>) =
        input.into_par_iter().partition_map(|s| {
            if s.len() == 1 {
                Either::Left(s.chars().next().unwrap())
            } else {
                Either::Right(s)
            }
        });

    dbg!(chars, strings);
}

If you had three different outputs, unfortunately Rayon does not support that. I haven't looked at whether it'd be possible to build using Rayon's traits, but what I would suggest as a more general (though not quite as efficient) solution is to use channels. A channel like std::sync::mpsc allows any number of threads to insert items while another thread removes them — in your case, to move them into a collection. This would not be quite as efficient as parallel collection, but in an IO-dominated problem like yours, it would not be significant.

CodePudding user response:

I'm going to skip the separation of files and folders, ignore the structure, and demonstrate a simple recursive approach that gets all the files in a directory recursively:

fn burrow(dir: &Path) -> Vec<PathBuf> {
    let mut contents = vec![];

    for entry in std::fs::read_dir(dir).unwrap() {
        let entry = entry.unwrap().path();
        if entry.is_dir() {
            contents.extend(burrow(&entry));
        } else {
            contents.push(entry);
        }
    }

    contents
}

The first step if you want to use the parallel iterators from rayon, is to convert this loop into a non-parallel iterator chain. The best way to do that is with .flat_map() to flatten results that yield more than one element:

fn burrow(dir: &Path) -> Vec<PathBuf> {
    std::fs::read_dir(dir)
        .unwrap()
        .flat_map(|entry| {
            let entry = entry.unwrap().path();
            if entry.is_dir() {
                burrow(&entry)
            } else {
                vec![entry] // use a single-element Vec if not a directory
            }
        })
        .collect()
}

Then to use rayon to process this iteration in parallel is to use .par_bridge() to convert an iterator into a parallel iterator. And that's it actually:

use rayon::iter::{ParallelBridge, ParallelIterator};

fn burrow(dir: &Path) -> Vec<PathBuf> {
    std::fs::read_dir(dir)
        .unwrap()
        .par_bridge()
        .flat_map(|entry| {
            let entry = entry.unwrap().path();
            if entry.is_dir() {
                burrow(&entry)
            } else {
                vec![entry]
            }
        })
        .collect()
}

See it working on the playground. You can extend on this to collect more complex results (like folders and hashes and whatever else).

  • Related