Calling a function with a (pseudo-)mutable Map (and changing Map)

Time:09-17

I've played with Haskell, but only just. Working with true immutability is confusing to me.

Specifically, I have the following function (right now it's largely debugging code I've thrown in):

type BySize = Map Int [Finfo]
-- ... other stuff ...
-- Walk directories and return map
walkDir :: String -> BySize -> IO ([BySize])
walkDir rootdir bySize = do
    let !bySizeHist = [bySize]
    pathWalk rootdir (\root dirs files -> do
        forM_ files $ (\file -> do
            let !latest = head bySizeHist
            finfo <- do processPath (joinPath [root, file]) latest
            let !new = addBySize (f_size finfo) finfo latest

            let latest_size = Map.keys latest
            let new_size = Map.keys new
            let error = if latest == new
                then
                    "Error, identical maps!"
                else
                    "Update of map is fine" ++ (show latest_size) ++ (show new_size)
            putStrLn error

            let !bySizeHist = [new] ++ bySizeHist
            putStrLn (fname finfo) ))
    return bySizeHist

Basically, my goal is to get a Map that has file sizes for keys and lists of Finfo (file info) data structures as values. I tried a lot of different variations; this is merely the latest one that does not work.

I know that Maps are immutable, so I was hoping to generate a list of versions, and then utilize the latest one downstream. But I think maybe I should be using the State monad instead. I don't actually care about the history of Map versions, I was merely trying that in a fumbling approach.

The function addBySize works by itself. That is, given size, and a new Finfo object, it correctly returns a new Map based on the old one, but with either a new key added or the list that the existing key maps to expanded with the new Finfo object.

The problem is that the attempt to "rebind" bySizeHist fails (I think because of falling out of scope within the loop). So whereas I'd like to keep echoing an expanding list of keys during each pass through the loop, instead I get something like:

% haskell/find-dups haskell
Update of map is fine[][6]
/home/dmertz/git/LanguagePractice/haskell/that
Update of map is fine[][3235]
/home/dmertz/git/LanguagePractice/haskell/sha1sum.hi
Update of map is fine[][8160]
/home/dmertz/git/LanguagePractice/haskell/sha1sum.o
Update of map is fine[][241]
/home/dmertz/git/LanguagePractice/haskell/sha1sum.hs
Update of map is fine[][6]

That is, latest is never really the latest version of the Map: each pass through the loop adds new, but always to the empty BySize Map.

The solution proposed below is amazingly helpful. However, I wish to exclude symbolic links.

I modified getAllFiles somewhat, to try to exclude symbolic links. But my approach fails to exclude directories that are symbolic links. I tried some variations that do not work. The version I have that only partially works:

-- Lazily return (normal) files from rootdir
getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = do
  nodes <- pathWalkLazy root
  -- get file paths from each node
  let files = [dir </> f | (dir, _, files) <- nodes, f <- files ]
  normalFiles <- filterM (liftM not . pathIsSymbolicLink) files
  return normalFiles

CodePudding user response:

I'll let someone else provide a direct answer to your question, but the right way to do this is probably not to do this. The program you want to write is:

getBySize :: FilePath -> IO BySize
getBySize root = do
  -- first, get all the files
  files <- getAllFiles root
  -- convert them all to finfos
  finfos <- mapM getFinfo files
  -- get a list of size/finfo pairs
  let pairs = [(f_size finfo, finfo) | finfo <- finfos]
  -- convert it to a map, allowing duplicate keys
  return $ fromListWithDuplicates pairs

This is a reasonable, functional way of accomplishing your goal. You grab all the filenames at once and apply some functional transformations (to Finfos, to pairs, to a Map). No need to fuss with mutability or state.
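For comparison, if the map really does need to be threaded through a loop one file at a time, a fold carries it explicitly from step to step. The following is only a sketch: the Finfo record and addBySize here are stand-ins that assume the behaviour described in the question, and collectBySize is a hypothetical name. It also shows why the rebinding in the question has no effect: a let inside the lambda introduces a new local binding (a recursive one, since Haskell's let is recursive) rather than updating the outer bySizeHist.

```haskell
import Control.Monad (foldM)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the question's Finfo record.
data Finfo = Finfo { f_path :: FilePath, f_size :: Int }
  deriving (Eq, Show)

type BySize = Map Int [Finfo]

-- Assumed behaviour of the question's addBySize: prepend the Finfo to the
-- list stored under its size, creating the key if it is missing.
addBySize :: Int -> Finfo -> BySize -> BySize
addBySize size finfo = Map.insertWith (++) size [finfo]

-- Thread the map through the loop: each step receives the previous map and
-- returns the next one, so nothing is rebound or mutated.
collectBySize :: [Finfo] -> IO BySize
collectBySize = foldM step Map.empty
  where
    step acc finfo = do
      let new = addBySize (f_size finfo) finfo acc
      -- Debug output: the keys grow as files are added.
      putStrLn (f_path finfo ++ " " ++ show (Map.keys new))
      return new
```

Calling collectBySize on a list of Finfo values prints a growing list of keys, which is the output the question was hoping to see. When no IO is needed inside the loop, a pure foldr or foldl' with addBySize does the same job.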

Writing fromListWithDuplicates is a little complicated, but it's standard. It gets rewritten so often that it, or something like it, should probably be part of Data.Map:

fromListWithDuplicates :: Ord k => [(k, v)] -> Map k [v]
fromListWithDuplicates pairs = Map.fromListWith (++) [(k, [v]) | (k, v) <- pairs]

The idea is that it takes the list of key-value pairs, converts all the values to singleton lists, and then uses fromListWith to build the map, concatenating those singletons whenever a key occurs more than once.
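A small worked example may make the grouping concrete. The sample sizes and file names below are made up for illustration, and samplePairs and grouped are hypothetical names.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

fromListWithDuplicates :: Ord k => [(k, v)] -> Map k [v]
fromListWithDuplicates pairs = Map.fromListWith (++) [(k, [v]) | (k, v) <- pairs]

-- Hypothetical sample data: (size in bytes, file name).
samplePairs :: [(Int, String)]
samplePairs = [(6, "that"), (3235, "sha1sum.hi"), (6, "other")]

-- Both files of size 6 end up under the same key.
grouped :: Map Int [String]
grouped = fromListWithDuplicates samplePairs
-- Map.toList grouped == [(6, ["other", "that"]), (3235, ["sha1sum.hi"])]
```

Note that fromListWith passes the newer value as the first argument of the combining function, so values sharing a key come out in reverse order of appearance. If order matters, reverse each list afterwards.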

You probably already have a getFinfo function, whatever your Finfo is. I used the following for testing:

data Finfo = Finfo { f_path :: FilePath, f_size :: Int }

getFinfo :: FilePath -> IO Finfo
getFinfo path = do
  sz <- getFileSize path
  return $ Finfo path (fromIntegral sz)

The only remaining function is getAllFiles, which gets a list of all files (as full path names, already joined with the parent directory). One way to write it is with pathWalkLazy from System.Directory.PathWalk:

getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = do
  nodes <- pathWalkLazy root
  -- get file paths from each node
  let files = [dir </> file | (dir, _, files) <- nodes, file <- files]
  return files
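On the follow-up about symbolic links: files reached through a symlinked directory are not links themselves, so filtering the file paths after the walk cannot remove them. One option is to check each entry before descending. The following is a minimal sketch that replaces the pathwalk traversal with a hand-written one built on System.Directory. The name getAllFilesNoLinks is hypothetical, the walk is eager rather than lazy, and errors such as unreadable directories are not handled.

```haskell
import Control.Monad (forM)
import System.Directory
import System.FilePath ((</>))

-- Recursively list non-directory entries under root, skipping every
-- symbolic link, whether it points to a file or to a directory.
getAllFilesNoLinks :: FilePath -> IO [FilePath]
getAllFilesNoLinks root = do
  names <- listDirectory root
  found <- forM names $ \name -> do
    let path = root </> name
    isLink <- pathIsSymbolicLink path
    if isLink
      then return []                       -- never follow or report links
      else do
        isDir <- doesDirectoryExist path
        if isDir
          then getAllFilesNoLinks path     -- descend into real directories
          else return [path]
  return (concat found)
```

Under these assumptions it can stand in for getAllFiles in getBySize without other changes, since both have type FilePath -> IO [FilePath].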

Here is a full sample program. It takes a single argument, the directory to process.

import System.Directory
import System.Directory.PathWalk
import System.Environment
import System.FilePath
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type BySize = Map Int [Finfo]

getBySize :: FilePath -> IO BySize
getBySize root = do
  -- first, get all the files
  files <- getAllFiles root
  -- convert them all to finfos
  finfos <- mapM getFinfo files
  -- get a list of size/finfo pairs
  let pairs = [(f_size finfo, finfo) | finfo <- finfos]
  -- convert it to a map, allowing duplicate keys
  return $ fromListWithDuplicates pairs

-- this is a little complicated, but standard
fromListWithDuplicates :: Ord k => [(k, v)] -> Map k [v]
fromListWithDuplicates pairs = Map.fromListWith (++) [(k, [v]) | (k, v) <- pairs]

getAllFiles :: FilePath -> IO [FilePath]
getAllFiles root = do
  nodes <- pathWalkLazy root
  -- get file paths from each node
  let files = [dir </> file | (dir, _, files) <- nodes, file <- files]
  return files

data Finfo = Finfo { f_path :: FilePath, f_size :: Int }
  deriving (Show)

getFinfo :: FilePath -> IO Finfo
getFinfo path = do
  sz <- getFileSize path
  return $ Finfo path (fromIntegral sz)

main = do
  [root] <- getArgs
  bs <- getBySize root
  print bs