I have 1000 csv files with same columns names. i want to merge them respectively. I am using below code, but it merge all csv files randomly.
files = os.path.join(path_files, "*_a.csv")
files = glob.glob(files)
df = pd.concat(map(pd.read_csv, files), ignore_index=True)
for example it put first 1000_a.csv then 1_a.csv and etc. But i want to merge respectively and then remove first 100 of them.
like this as a dataframe or a single csv file:
1_a.csv, 2_a.csv, 3_a.csv, ..., 1000_a.csv
could you please let me know how it is possible?
CodePudding user response:
i hope it will be usefully for you
import pandas as pd
df = pd.DataFrame()
for csv_file in sorted(list_filenames):
temp_df = pd.read_csv(scv_file)
df = pd.concat([df, temp_df])
CodePudding user response:
You can sort filenames by integers before _
, or remove _a.csv
or last 6 characters:
files = os.path.join(path_files, "*_a.csv")
files = sorted(glob.glob(files), key=lambda x: int(x.split('_')[0]))
#alternative1
#files = sorted(glob.glob(files), key=lambda x: int(x.replace('_a.csv','')))
#alternative2
#files = sorted(glob.glob(files), key=lambda x: int(x[:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)
CodePudding user response:
You shoud re-order glob.glob() results like this:
files_path = os.path.join(base_path, "*_a.csv")
files = sorted(glob.glob(files_path), key=lambda name: int(name[0:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)
And there are similar questions about natural sort: Is there a built in function for string natural sort?
CodePudding user response:
This is an alternative solution.
os.chdir(path_files)
all_filenames = [i for i in sorted(glob.glob('*.{}'.format('csv')))]
df = pd.concat([pd.read_csv(f) for f in all_filenames ]).reset_index()