Getting directories and subdirectories using Python os and outputting a CSV file

I want to write Python code that walks a directory and its subdirectories and creates a CSV file listing the files found, but only those located under the moo subdirectory.

.
├── a.txt
├── b.txt
├── foo
│   ├── w.txt
│   └── a.txt
└── moo
    ├── cool.csv
    ├── bad.csv
    └── more
        └── wow.csv

Expected Pandas dataframe:

FilePath
S:\Test\moo\cool.csv
S:\Test\moo\bad.csv
S:\Test\moo\more\wow.csv

How can I do this in Python?

So far I have the following code, but I'm not sure how to complete it:

import os
import pandas as pd

# Raw string so the backslash is not treated as an escape character
root = r'S:\Test'
for path, subdirs, files in os.walk(root):
    print(path)

CodePudding user response：

import os
import csv

# Set the base directory
base_dir = 'S:\\Test'

# Initialize an empty list to store the file paths
file_paths = []

# Iterate through the directories and subdirectories
for root, dirs, files in os.walk(base_dir):
    # Iterate through the files
    for file in files:
        # Get the full path of the file
        file_path = os.path.join(root, file)
        # Add the file path to the list
        file_paths.append(file_path)

# Open a CSV file for writing
with open('file_paths.csv', 'w', newline='') as csvfile:
    # Create a CSV writer
    writer = csv.writer(csvfile)
    # Write the file paths to the CSV file
    writer.writerows([[file_path] for file_path in file_paths])

This code will iterate through the directories and subdirectories of the base directory, collect the file paths in a list, and then write the list to a CSV file called file_paths.csv.
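
The loop above collects every file under the base directory. To keep only the files that sit in or below a folder named moo, one option is to check the components of each directory's path before collecting its files. The following is a minimal sketch rather than a definitive implementation; collect_under and its parameters are hypothetical names introduced here for illustration:

```python
import os


def collect_under(base_dir, folder_name="moo"):
    """Return paths of files located in or below a directory named folder_name."""
    matches = []
    for root, dirs, files in os.walk(base_dir):
        # Split the current directory's path, relative to base_dir, into components
        parts = os.path.relpath(root, base_dir).split(os.sep)
        # Keep the files only if one of those components is the wanted folder
        if folder_name in parts:
            for name in files:
                matches.append(os.path.join(root, name))
    return matches
```

The returned list can then be passed to the csv.writer code above in place of file_paths.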

To load the file paths into a pandas dataframe, read the CSV file back in after writing it:

import pandas as pd

# Read the CSV file into a dataframe
df = pd.read_csv('file_paths.csv', header=None, names=['FilePath'])

CodePudding user response：

To create a CSV file based on information from a directory and its subdirectories, you can use the following approach:

1. Use the os.walk function to traverse the directory tree and get the names of all the files in the tree.

2. Use the os.path.join function to combine each directory path yielded by os.walk with each file name in it to get the full path of each file.

3. Create a list of dictionaries, where each dictionary represents a row in the CSV file and contains the full path of the file as a key-value pair.

4. Use the pandas.DataFrame.from_records function to create a Pandas dataframe from the list of dictionaries.

5. Use the pandas.DataFrame.to_csv function to write the dataframe to a CSV file.

Here's the Python code that puts these steps together:

import os
import pandas as pd

# Set the base path of the directory tree (raw string keeps the backslash literal)
root = r'S:\Test'

# Initialize an empty list to store the file paths
file_paths = []

# Traverse the directory tree and collect the files located in or below 'moo'
for path, subdirs, files in os.walk(root):
    # Split the path relative to root into its components so that
    # nested directories such as moo\more are matched as well
    if 'moo' in os.path.relpath(path, root).split(os.sep):
        for file in files:
            # Join the directory path with the file name
            file_path = os.path.join(path, file)
            # Add the file path to the list
            file_paths.append(file_path)

# Create a list of dictionaries, where each dictionary represents a row in the CSV file
data = [{'FilePath': file_path} for file_path in file_paths]

# Create a Pandas dataframe from the list of dictionaries
df = pd.DataFrame.from_records(data)

# Write the dataframe to a CSV file
df.to_csv('files.csv', index=False)

This code creates a CSV file named files.csv in the current working directory, containing a single column FilePath with the full paths of the files collected from the moo part of the directory tree rooted at S:\Test. The dataframe has one row for each collected file.

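EDIT: You changed the question. If you're only interested in a particular folder, filter on path with an if inside the os.walk loop. Note that a test such as os.path.basename(path) == 'moo' matches only the moo directory itself; to include nested directories like moo\more, check whether 'moo' is one of the components of path.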
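
For comparison, the same filter can be sketched with pathlib, which exposes the path components directly through Path.parts. This is not part of the answer above, only an illustration under stated assumptions: find_files and write_paths_csv are hypothetical helper names, and the FilePath header is taken from the question.

```python
import csv
from pathlib import Path


def find_files(base_dir, folder_name="moo"):
    """Return paths (as strings) of files in or below a directory named folder_name."""
    base = Path(base_dir)
    found = []
    # rglob("*") visits every entry below base, recursively
    for entry in sorted(base.rglob("*")):
        # Directory components below base, excluding the file name itself
        parent_parts = entry.relative_to(base).parts[:-1]
        if entry.is_file() and folder_name in parent_parts:
            found.append(str(entry))
    return found


def write_paths_csv(paths, csv_path):
    """Write one path per row under a single FilePath header."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["FilePath"])
        writer.writerows([p] for p in paths)
```

With the directory from the question, this would be called as write_paths_csv(find_files(r'S:\Test'), 'files.csv').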