Home > Software engineering >  merge multiple CSV files continiously
merge multiple CSV files continiously

Time:09-28

I have 1000 csv files with same columns names. i want to merge them respectively. I am using below code, but it merge all csv files randomly.

files = os.path.join(path_files, "*_a.csv")
files = glob.glob(files)
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

for example it put first 1000_a.csv then 1_a.csv and etc. But i want to merge respectively and then remove first 100 of them.

like this as a dataframe or a single csv file:

1_a.csv, 2_a.csv, 3_a.csv, ..., 1000_a.csv

could you please let me know how it is possible?

CodePudding user response:

i hope it will be usefully for you

import pandas as pd
df = pd.DataFrame()
for csv_file in sorted(list_filenames):
  temp_df = pd.read_csv(scv_file)
  df = pd.concat([df, temp_df]) 

CodePudding user response:

You can sort filenames by integers before _, or remove _a.csv or last 6 characters:

files = os.path.join(path_files, "*_a.csv")

files = sorted(glob.glob(files), key=lambda x: int(x.split('_')[0]))
#alternative1
#files = sorted(glob.glob(files), key=lambda x: int(x.replace('_a.csv','')))
#alternative2
#files = sorted(glob.glob(files), key=lambda x: int(x[:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

CodePudding user response:

You shoud re-order glob.glob() results like this:

files_path = os.path.join(base_path, "*_a.csv")
files = sorted(glob.glob(files_path), key=lambda name: int(name[0:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

And there are similar questions about natural sort: Is there a built in function for string natural sort?

CodePudding user response:

This is an alternative solution.

os.chdir(path_files)
all_filenames = [i for i in sorted(glob.glob('*.{}'.format('csv')))]
df = pd.concat([pd.read_csv(f) for f in all_filenames ]).reset_index()
  • Related