I tried many answers but none of them working for me:
For example this: Import multiple CSV files into pandas and concatenate into one DataFrame
import pandas as pd
import glob
import os
path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(os.path.join(path , "/*.csv"))
li = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
I have only 2 csv files:
1.csv:
1,1
2,1
3,1
4,1
5,1
2.csv:
6,1
7,1
8,1
9,1
To be fair, this is my routine for merging:
files = glob.glob("data/*.csv")
df = []
for f in files:
csv = pd.read_csv(f, index_col=None, header=0)
df.append(csv)
df = pd.concat(df, axis=0, ignore_index=True)
df.to_csv("all.csv")
print(df);
This is the output (print(df)):
1 1.1 6
0 2 1.0 NaN
1 3 1.0 NaN
2 4 1.0 NaN
3 5 1.0 NaN
4 1 NaN 7.0
5 1 NaN 8.0
6 1 NaN 9.0
And this is the "all.csv":
,1,1.1,6
0,2,1.0,
1,3,1.0,
2,4,1.0,
3,5,1.0,
4,1,,7.0
5,1,,8.0
6,1,,9.0
Whereas I would need all.csv to be:
1,1
2,1
3,1
4,1
5,1
6,1
7,1
8,1
9,1
I'm using Python3.9 with PyCharm 2022.3.1.
Why is my all.csv look like that, and how can I simply read multiple csv into one dataframe for further processing?
CodePudding user response:
You have to swap header
and index_col
values:
data = []
for filename in all_files:
df = pd.read_csv(filename, index_col=0, header=None)
data.append(df)
df = pd.concat(data, axis=0)
df.to_csv('all.csv', header=False)
Output all.csv
:
1,1
2,1
3,1
4,1
5,1
6,1
7,1
8,1
9,1
CodePudding user response:
I have been successfully using this code to append several Excel files into one dataframe; it also forces certain columns to be string datatypes, which can be changed, of course:
fmask = 'C:/Users/USER/FILEPATH/*.xlsx'
# lst of column names which needs to be string
lst_str_cols = ['ItemID', 'StoreID']
# use dictionary comprehension to make dict of dtypes
dict_dtypes = {x : 'str' for x in lst_str_cols}
dfs = []
for f in glob.glob(fmask):
df = pd.read_excel(f, dtype=dict_dtypes)
dfs.append(df)
# concatenated
dfConcatenate = pd.concat(dfs, ignore_index=True)
Instead of Excel you can change it to CSV.
CodePudding user response:
import pandas as pd
df1 = pd.read_csv('csvfile1.csv', header=None)
df1 = pd.read_csv('csvfile2.csv', header=None)
df = pd.concat([df1, df2], ignore_index=True)
for index, row in df.iterrows():
print(row[0], row[1])