Home > database >  How to write a search function to get the path of the desired file?
How to write a search function to get the path of the desired file?

Time:05-10

I have a directory structure that looks like the image. Folder Structure

This structure is organised in a form of objects. File objects are each in the form: {type:'file', name:'file a1'} for example. Folderobjects are like File objects, but they have an extra key children which is an array of sub folders/files.

Here's a console log of the rootFolder object from the image before:

screenshot of console.log(rootFolder)

The goal: What I want is to write a search function that returns the path for a specific file in the form of an array. For example, if I were to search for file c3, the result would be: ['folder a3', 'folder b4'].

I have tried many things, but recursion isn't my strongest suit.

The code could be in javascript or python; what I care about is the algorithm itself.

Here is the code used to generate rootFolder:

// defining constants
const TYPES = {FILE: 'file', FOLDER: 'folder'}
const FILE_NAMES = {
  FILE_A1:'file a1', 
  FILE_A2:'file a2', 
  FILE_A4:'file a4',
  FILE_B2:'file b2',
  FILE_B3:'file b3',
  FILE_C1:'file c1',
  FILE_C2:'file c2',
  FILE_C3:'file c3',
  FILE_C4:'file c4',
  FILE_C5:'file c5',
}
const FOLDER_NAMES = {
  ROOT_FOLDER:'root folder',
  FOLDER_A3:'folder a3',
  FOLDER_B1:'folder b1',
  FOLDER_B4:'folder b4'
}
// defining classes
class File {
  constructor(type, name) {
    this.name = name
    this.type = type
  }
}
class Folder {
  constructor (type, name, children) {
    this.name = name
    this.type = type
    this.children = children
  }
}
// defining file objects
const fileA1 = new File(TYPES.FILE, FILE_NAMES.FILE_A1)
const fileA2 = new File(TYPES.FILE, FILE_NAMES.FILE_A2)
const fileA4 = new File(TYPES.FILE, FILE_NAMES.FILE_A4)
const fileB2 = new File(TYPES.FILE, FILE_NAMES.FILE_B2)
const fileB3 = new File(TYPES.FILE, FILE_NAMES.FILE_B3)
const fileC1 = new File(TYPES.FILE, FILE_NAMES.FILE_C1)
const fileC2 = new File(TYPES.FILE, FILE_NAMES.FILE_C2)
const fileC3 = new File(TYPES.FILE, FILE_NAMES.FILE_C3)
const fileC4 = new File(TYPES.FILE, FILE_NAMES.FILE_C4)
const fileC5 = new File(TYPES.FILE, FILE_NAMES.FILE_C5)
// defining folder objects
const folderB1 = new Folder(TYPES.FOLDER, FOLDER_NAMES.FOLDER_B1, [fileC1, fileC2])
const folderB4 = new Folder(TYPES.FOLDER, FOLDER_NAMES.FOLDER_B4, [fileC3, fileC4, fileC5])
const folderA3 = new Folder(TYPES.FOLDER, FOLDER_NAMES.FOLDER_A3, [folderB1, fileB2, fileB3, folderB4])
const rootFolder = new Folder(TYPES.FOLDER, FOLDER_NAMES.ROOT_FOLDER, [fileA1, fileA2, folderA3, fileA4])

console.log(rootFolder);

CodePudding user response:

remove indrection

Before we get started, first remove the myriad of indirection caused by over 20 variables. We can use just two (2) classes and a single root -

class File {
  constructor(type, name) {
    this.name = name
    this.type = type
  }
}

class Folder {
  constructor(type, name, children) {
    this.name = name
    this.type = type
    this.children = children
  }
}

const root =
  new Folder("folder", "root folder", [
    new File("file", "file a1"),
    new File("file", "file a2"),
    new Folder("folder", "folder a3", [
      new Folder("folder", "folder b1", [
        new File("file", "file c1"),
        new File("file", "file c2")
      ]),
      new File("file", "file b2"),
      new File("file", "file b3"),
      new Folder("folder", "folder b4", [
        new File("file", "file c3"),
        new File("file", "file c4"),
        new File("file", "file c5")
      ])
    ]),
    new File("file", "file a4")
  ])

remove redundancy

TYPES.FILE and TYPES.FOLDER are unneeded if you have separate File and Folder classes. Each instance will already have an associated type -

class File {
  constructor(name) { // ✅ type not necessary
    this.name = name
  }
}

class Folder {
  constructor (name, children) { // ✅ type not necessary
    this.name = name
    this.children = children
  }
}

const root =
  new Folder("root folder", [
    new File("file a1"),
    new File("file a2"),
    new Folder("folder a3", [
      new Folder("folder b1", [
        new File("file c1"),
        new File("file c2")
      ]),
      new File("file b2"),
      new File("file b3"),
      new Folder("folder b4", [
        new File("file c3"),
        new File("file c4"),
        new File("file c5")
      ])
    ]),
    new File("file a4")
  ])

multiple return values

Because file systems support folders or files with the same name (at different hierarchies), it's important that our search algorithm can scan for all possible matches of the query. For a complete example, let's include file c3 in folder b1 and folder b4. When searching for this file, we expect to see both paths returned -

const root =
  new Folder("root folder", [
    new File("file a1"),
    new File("file a2"),
    new Folder("folder a3", [
      new Folder("folder b1", [
        new File("file c1"),
        new File("file c2"),
        new File("file c3")  // ✅ could have the same name
      ]),
      new File("file b2"),
      new File("file b3"),
      new Folder("folder b4", [
        new File("file c3"), // ✅ could have the same name
        new File("file c4"),
        new File("file c5")
      ])
    ]),
    new File("file a4")
  ])

the algorithm

Now that we've fixed the other issues with the code, let's write search(t, query). We can do this using inductive reasoning -

  1. If t is a File and its name matches the query, yield the singleton result [t]
  2. (inductive) t is not a File. If t is a Folder and its name matches the query input the singleton result [t]. And for all child of t.children, for all path of the recursive sub-problem search(child, query), prepend t to the path and yield.
  3. (inductive) t is neither a File nor a Folder. Throw a type error to notify the caller.
function* search(t, query) {
  switch(t?.constructor) {
    case File:
      if (t.name == query)
        yield [t]
      break
    case Folder:
      if (t.name == query)
        yield [t]
      for (const child of t.children)
        for (const path of search(child, query))
          yield [t, ...path]
      break
    default:
      throw Error(`cannot search unsupported type ${t}`)
  }
}

Here's some search examples -

for (const path of search(root, "root folder"))
  console.log([".", ...path.map(f => f.name)].join("/"))
./root folder
for (const path of search(root, "file a1"))
  console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/file a1

Multiple paths are returned when a file or folder is found multiple times -

for (const path of search(root, "file c3"))
  console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/folder a3/folder b1/file c3
./root folder/folder a3/folder b4/file c3

When a file or folder cannot be found, no output is given -

for (const path of search(root, "file Q"))
  console.log([".", ...path.map(f => f.name)].join("/"))
⚠️ no output

caller decides effect

By yielding a sequence of Folder and File objects, our search algorithm remains generic and reusable. Strings are not assembled by default and nothing is logged to the console. If the caller wishes to create a /-separated path, that is demonstrated above. Options for a different effect are left for the caller to decide.

only one result

Using a generator is perfect for search because the caller can choose to consume all matches or any number of them. In this example, we use a generic first to return only the first match. Note the generator is canceled immediately after a match is found and no additional work is done by the cpu -

function first(iter) {
  for (const v of iter)
    return v
}

const match = first(search(root, "file c3"))

if (match)
  console.log("found:", match.map(f => f.name).join("/"))
else
  console.error("no match found")
found: root folderfolder a3/folder b1/file c3

Make sure you do a proper null-check if using first in this way as it is possible that search yields nothing when no match is found -

const match = first(search(root, "ZZZ"))

if (match)
  console.log("found:", match.map(f => f.name).join("/"))
else
  console.error("no match found")
no match found

higher-order search

In the original algorithm matches are determined using ==. The algorithm could be more useful if we allow the caller to specify what constitutes a match -

function* search(t, query) {
  switch(t?.constructor) {
    case File:
      if (Boolean(query(t.name))) // ✅ query is a func
        yield [t]
      break
    case Folder:
      if (Boolean(query(t.name))) // ✅ query is a func
        yield [t]
      for (const child of t.children)
        for (const path of search(child, query))
          yield [t, ...path]
      break
    default:
      throw Error(`cannot search unsupported type ${t}`)
  }
}

Let's find all files that start with file c -

for (const path of search(root, filename => filename.startsWith("file c")))
  console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/folder a3/folder b1/file c1
./root folder/folder a3/folder b1/file c2
./root folder/folder a3/folder b1/file c3
./root folder/folder a3/folder b4/file c3
./root folder/folder a3/folder b4/file c4
./root folder/folder a3/folder b4/file c5

specialization

With a higher-order search function, you can easily implement things like searchByString. This is known as a specialization -

function searchByString(t, query) {
  return search(t, filename => filename == query)
}
for (const path of searchByString(root, "file c3"))
  console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/folder a3/folder b1/file c3
./root folder/folder a3/folder b4/file c3

code demo

The code above has been condensed so you can see it all in one place. Run the snippet and verify the output -

class File { constructor(name) { this.name = name } }

class Folder { constructor(name, children) { this.name = name;  this.children = children } }

function* search(t, query) {
  switch(t?.constructor) {
    case File: if (Boolean(query(t.name))) yield [t];  break
    case Folder: if (Boolean(query(t.name))) yield [t]; for (const child of t.children) for (const path of search(child, query)) yield [t, ...path]; break
    default: throw Error(`cannot search unsupported type ${t}`)
  }
}

function searchByString(t, query) {
  return search(t, filename => filename == query)
}

const relative = (path) => [".", ...path.map(f => f.name)].join("/")

const root = new Folder("root folder", [new File("file a1"), new File("file a2"), new Folder("folder a3", [new Folder("folder b1", [new File("file c1"), new File("file c2"), new File("file c3")]), new File("file b2"), new File("file b3"), new Folder("folder b4", [new File("file c3"), new File("file c4"), new File("file c5")])]), new File("file a4")])

console.log("== exact match ==")
for (const path of search(root, filename => filename == "root folder"))
  console.log(relative(path))

console.log("== starts with 'file c' ==")
for (const path of search(root, filename => filename.startsWith("file c")))
  console.log(relative(path))

console.log("== searchByString ==")
for (const path of searchByString(root, "file c3"))
  console.log(relative(path))
.as-console-wrapper { min-height: 100%; top: 0; }

== exact match ==
./root folder
== starts with 'file c' ==
./root folder/folder a3/folder b1/file c1
./root folder/folder a3/folder b1/file c2
./root folder/folder a3/folder b1/file c3
./root folder/folder a3/folder b4/file c3
./root folder/folder a3/folder b4/file c4
./root folder/folder a3/folder b4/file c5
== searchByString ==
./root folder/folder a3/folder b1/file c3
./root folder/folder a3/folder b4/file c3

python

search is written exactly the same in Python -

class file:
  def __init__(self, name):
    self.name = name

class folder:
  def __init__(self, name, children):
    self.name = name
    self.children = children

def search(t, query):
  if isinstance(t, file):
    if t.name == query:
      yield [t]
  elif isinstance(t, folder):
    if t.name == query:
      yield [t]
    for child in t.children:
      for path in search(child, query):
        yield [t, *path]
  else:
    raise TypeError("unsupported", t)

If you are using Python 3.10, there are some new features available to you. Adjust search to take advantage of the new structural match..case syntax -

from dataclasses import dataclass
from typing import Generator, Union

Ftree = Union["file", "folder"]

@dataclass
class file:
  name: str

@dataclass
class folder:
  name: str
  children: list[Ftree]

def search(t: Ftree, query: str) -> Generator[list[Ftree], None, None]:
  match t:
    case file(name):
      if name == query: yield [t]
    case folder(name, children):
      if name == query: yield [t]
      for child in t.children:
        for path in search(child, query):
          yield [t, *path]
    case _:
      raise TypeError("unsupported", t)

Given a tree -

root = \
  folder("root folder", [
    file("file a1"),
    file("file a2"),
    folder("folder a3", [
      folder("folder b1", [
        file("file c1"),
        file("file c2"),
        file("file c3")
      ]),
      file("file b2"),
      file("file b3"),
      folder("folder b4", [
        file("file c3"),
        file("file c4"),
        file("file c5")
      ])
    ]),
    file("file a4")
  ])

Both implementations behave identically -

for path in search(root, "root folder"):
  print("/".join([".", *map(lambda f: f.name, path)]))
./root folder
for path in search(root, "file c3"):
  print("/".join([".", *map(lambda f: f.name, path)]))
./root folder/folder a3/folder b1/file c3
./root folder/folder a3/folder b4/file c3
  • Related