I have a folder, let's say 'input_Folder', which contains a list of CSV files with data. I'm trying to write Python code that reads these CSV files from input_Folder and creates a master CSV file with two columns.
The columns in the master CSV file are 'Scenario' and 'Status'.
The column requirements are as follows: Scenario = name of the file from the directory, and Status = 'Pass' if the file has a value in the second row of the second column, else 'Fail'.
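For example, if input_Folder contained scenario_A.csv and scenario_B.csv (hypothetical names), the master file should look like this:

Scenario,Status
scenario_A,Pass
scenario_B,Fail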
Below is my code. After executing it I can see that the master CSV is created, but it contains empty lines. I'm quite new to Python, so could somebody help me out here please?
import os
import csv

path = (Input file path)

with open("SUMMARY.csv", 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['SCENARIO', 'STATUS'])
    for files in os.walk(path):
        for filename in files:
            with open("input_file.csv") as csv_file:  # checking if the code is working for
                                                      # one sample file
                all_rows = list(csv_file)
                line_count = 0
                for row in all_rows[1:2]:
                    if line_count == 1:
                        if row[1].value == None:
                            writer.writerow([os.path.basename(filename).split(".")[0], 'PASS'])
                        else:
                            writer.writerow([os.path.basename(filename).split(".")[0], 'FAIL'])
                    line_count = 1
CodePudding user response:
Your attempt has three problems. It uses os.walk, which traverses subdirectories (perhaps this is not a problem because your folder does not have subdirectories, but you should use the correct function for your use case regardless), and you are opening a file in the current directory instead of the one actually returned by os.walk. Finally, the input from csv.reader cannot be None; either the line contains fewer fields (in which case you cannot access the second field at all, and trying will get you an IndexError), or it contains an empty string. (More fundamentally, your indentation seems to be broken, but since you are not asking about a syntax error, I'm guessing your actual code doesn't have this problem.)
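To see concretely what csv.reader hands you, here is a minimal standalone sketch (not part of your code; it just demonstrates the behaviour described above):

import csv
import io

# A missing value parses as an empty string, never None.
row_with_empty = next(csv.reader(io.StringIO("a,\n")))
print(row_with_empty)           # ['a', '']
print(row_with_empty[1] == "")  # True

# A short row simply has fewer fields; indexing past the end raises IndexError.
short_row = next(csv.reader(io.StringIO("a\n")))
print(short_row)                # ['a']
try:
    short_row[1]
except IndexError as exc:
    print("IndexError:", exc)   # list index out of range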
Here's a quick refactoring to use glob.glob instead of os.walk, assuming that the input CSV files have an empty field where you were looking for None. (It would obviously not be hard to change the check to if len(line) < 2: if you wanted to, or to cover both conditions; see the small sketch after the code.)
import csv
import os
from glob import glob

path = "input_Folder"  # the folder containing the input CSV files

# newline='' keeps the csv module from writing blank lines on Windows
with open("SUMMARY.csv", 'w', newline='', encoding='utf-8') as output_file:
    writer = csv.writer(output_file)
    writer.writerow(['SCENARIO', 'STATUS'])
    for filename in glob(f"{path}/*.csv"):
        with open(filename, 'r', encoding='utf-8') as input_file:
            value = "FAIL"
            reader = csv.reader(input_file)
            for lineno, line in enumerate(reader, 1):
                if lineno != 2:
                    continue
                if line[1] != "":
                    value = "PASS"
                break
            writer.writerow([os.path.basename(filename).split(".")[0], value])
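For instance, a tiny helper covering both conditions (a missing second field and an empty one) could look like this; the name has_second_value is just an illustration:

def has_second_value(line):
    # True when the row has at least two fields and the second one is non-empty
    return len(line) > 1 and line[1] != ""

print(has_second_value(["a", "b"]))  # True
print(has_second_value(["a", ""]))   # False
print(has_second_value(["a"]))       # False

Inside the loop above, if has_second_value(line): value = "PASS" would then replace the bare line[1] check.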
Tangentially, notice also how I avoid having two variables with almost the same names, csvfile and csv_file.
The logic writes "FAIL" if there is only one input line, too. (Refactored in response to a comment.)
CodePudding user response:
Please review my code below. Create a folder where you'd like all your CSV files to live, and run the code below to join all the CSV files into a DataFrame. For this example, I named my folder "Folder".
The "*.csv" pattern finds all CSV files in "Folder".
import glob
import pandas as pd

# Create a folder where you want to dump files
all_files = glob.glob("Folder/*.csv")
print(all_files)

li = []

# join csv files
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

# df = combined csv files
df = pd.concat(li, axis=0, ignore_index=True)
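If you then want the SCENARIO/STATUS master file the question asks for, a possible continuation with pandas could look like this (a sketch, assuming each input file's first row is a header, so its second row is the first data row):

import os
import glob
import pandas as pd

rows = []
for filename in glob.glob("Folder/*.csv"):
    df = pd.read_csv(filename, header=0)
    # With the first row used as the header, the file's second row is df.iloc[0];
    # treat a missing column, an empty file, or a NaN cell as "no value".
    has_value = df.shape[1] > 1 and len(df) > 0 and pd.notna(df.iloc[0, 1])
    rows.append({"SCENARIO": os.path.splitext(os.path.basename(filename))[0],
                 "STATUS": "PASS" if has_value else "FAIL"})

summary = pd.DataFrame(rows, columns=["SCENARIO", "STATUS"])
summary.to_csv("SUMMARY.csv", index=False)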