How to loop through a folder of CSV files, read the header of each, and write the result to a folder?

Time:04-13

I'm new to Python and need help with this piece of code. I did a lot of searching to get to this stage but couldn't fix it on my own. Thanks in advance for your help.

I have to compare 100 CSV files in a folder, and not all of them have the same number of columns or the same column names. So I'm trying to use Python to read the header of each file and write the headers to a CSV file in an output folder.

I got to this point, but I'm not even sure I'm on the right path:

import pandas as pd
import glob

path = r'C:\Users\user1\Downloads\2016GAdata' # use your path
all_files = glob.glob(path + "/*.csv")

list1 = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    list1.append(df)

frame = pd.concat(list1, axis=0, ignore_index=True)

print(frame)

thanks for your help!

CodePudding user response:

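You can create a dictionary whose keys are the filenames and whose values are each dataframe's columns. Building a DataFrame from this dictionary gives one row per file, with the filename as the index and the column names as the row values. Files with fewer columns are padded with None.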

d = {}

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    d[filename] = df.columns

frame = pd.DataFrame.from_dict(d, orient='index')
print(frame)

           0     1     2       3
file1  Fruit  Date  Name  Number
file2  Fruit  Date  Name    None
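
The snippet above builds the table but does not write it anywhere, and it loads every file in full just to look at the column names. Below is a minimal sketch of one way to finish the job, assuming it is enough to read only the header row (`nrows=0`) and save the result with `to_csv`. The function name `collect_headers`, the `output_path` argument and the `file` index label are made up for illustration and are not part of the original answer.

```python
import glob
import os

import pandas as pd


def collect_headers(input_dir, output_path):
    """Read only the header row of every CSV in input_dir and save them to output_path."""
    headers = {}
    for filename in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        # nrows=0 parses just the header line, so the data rows are not loaded
        columns = pd.read_csv(filename, nrows=0).columns
        headers[os.path.basename(filename)] = list(columns)

    # One row per file; files with fewer columns are padded with missing values
    frame = pd.DataFrame.from_dict(headers, orient="index")

    # Create the output folder if it does not exist yet (hypothetical location)
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    frame.to_csv(output_path, index_label="file")
    return frame
```

You would call it with your own input folder and whatever output file you like, for example `collect_headers(path, os.path.join(path, "output", "headers.csv"))`. Writing into a subfolder keeps the result from being picked up by the `*.csv` pattern on the next run.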