I'm a newbie in python and need help with this piece of code. I did a lot of search to get to this stage but couldn't fix it on my own. Thanks in advance for your help.
What I'm trying to do is that I have to compare 100 csv files in a folder, and not all have the same number of columns or columns name. So I'm trying to use python to read the headers of each file and put in a csv file to output in a folder.
I got to this point but not sure if I'm on the right path even:
import pandas as pd
import glob
path = r'C:\Users\user1\Downloads\2016GAdata' # use your path
all_files = glob.glob(path "/*.csv")
list1 = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
list1.append(df)
frame = pd.concat(list1, axis=0, ignore_index=True)
print(frame)
thanks for your help!
CodePudding user response:
You can create a dictionary whose key is filename and value is dataframe columns. Using this dictionary to create dataframe results in filename as index and column names as column value.
d = {}
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
d[filename] = df.columns
frame = pd.DataFrame.from_dict(d, orient='index')
0 1 2 3
file1 Fruit Date Name Number
file2 Fruit Date Name None