Home > Software design >  Merging multiple CSV files into 1 in Python but files are appearing unsorted which affects data in d
Merging multiple CSV files into 1 in Python but files are appearing unsorted which affects data in d

Time:12-05

I'm trying to merge my 119 csv files into one file through a python code. The only issue I'm facing is that even though I've applied the sort method it isnt working and my files are not ordered , which is causing the date column to be un-ordered. Below is the code, when I run this and open my new csv file "call.sms.merged" it appears that after my 1st csv file, data is inserted or merged from the 10th csv then 100th csv till 109 csv & then it starts to begin from csv 11. I'm attaching an image for better understanding.

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path   f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []
for file in sorted(file_list):
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))
csv_merged = pd.concat(csv_list, ignore_index=True)
csv_merged.to_csv(file_path   'calls.sms.merged.csv', index=False)

UN-SORTED DATA

Incorrect order of csv

un-ordered

Python Code and Error : Python Code Screenshot

Error Screenshot

CodePudding user response:

You can extract the number of each call/file with pandas.Series.str.extract then use pandas.DataFrame.sort_values to make an ascending sort along this column/number.

Try this :

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path   f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []

for file in file_list:
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))
    
csv_merged = (
                pd.concat(csv_list, ignore_index=True)
                    .assign(num_call= lambda x: x["File_Name"].str.extract("(\d{1,})", expand=False).astype(int))
                    .sort_values(by="num_call", ignore_index=True)
                    .drop(columns= "num_call")
             )

csv_merged.to_csv(file_path   'calls.sms.merged.csv', index=False)
  • Related