Merge multiple CSV files into one new CSV file using Jupyter Notebook

I am having trouble merging CSV files with Python in a Jupyter notebook. I wrote the code below, but the columns do not line up: the second column starts where the first column ends, and so on. The CSV files contain the following columns: timestamp, load energy data, lighting data, and operative data. Any help would be appreciated.

import glob

import pandas as pd

path = "C:/Users"

file_list = glob.glob(path + "/*.csv")
print('File names:', file_list)

csv_list = []

for file in file_list:
    csv_list.append(pd.read_csv(file))
csv_merged = pd.DataFrame()

# Note: DataFrame.append was removed in pandas 2.0, so this loop only runs on older versions.
for csv_file in csv_list:
    csv_merged = csv_merged.append(csv_file, ignore_index=True)

csv_merged.to_csv('C:/Users.csv', index=False)

Can I also add details to this code, such as naming the columns or excluding some of them? If so, please let me know how.

CodePudding user response:

Try the pandas merge function instead of appending to a list, for example:

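import glob

import pandas as pd

path = "C:/Users"
file_list = glob.glob(path + "/*.csv")
print('File names:', file_list)

# merge data; glob already returns paths that include the directory
data_frame = pd.read_csv(file_list[0])

for file in file_list[1:]:
    df_to_merge = pd.read_csv(file)
    # merge returns a new DataFrame, so assign the result back;
    # with no "on" argument it joins on the columns the two frames share
    data_frame = data_frame.merge(df_to_merge)

data_frame.to_csv('C:/merge.csv')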
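
If every file carries the same timestamp column, it may be clearer to name the join key explicitly. The following is only a sketch under that assumption: the small in-memory frames and the column names "timestamp", "load_energy", "lighting" and "operative" are hypothetical stand-ins for the real files, which would be read with pd.read_csv.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for three of the CSV files; in practice each
# frame would come from pd.read_csv(file).
load = pd.DataFrame({"timestamp": ["2021-01-01 00:00", "2021-01-01 01:00"],
                     "load_energy": [1.5, 1.7]})
lighting = pd.DataFrame({"timestamp": ["2021-01-01 00:00", "2021-01-01 01:00"],
                         "lighting": [0.2, 0.3]})
operative = pd.DataFrame({"timestamp": ["2021-01-01 00:00", "2021-01-01 01:00"],
                          "operative": [21.0, 21.4]})

frames = [load, lighting, operative]

# Join the frames pairwise on the shared key; how="outer" keeps timestamps
# that are missing from some files (those cells become NaN).
merged = reduce(
    lambda left, right: left.merge(right, on="timestamp", how="outer"),
    frames,
)
print(merged)
```

Naming the key with on="timestamp" avoids accidentally joining on other columns that happen to share a name across files.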

CodePudding user response:

As Krishna mentions, it's not clear what's wrong with your code. Example files would have helped to better understand the issue.

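However, calling append in a for loop is inefficient for dataframes, because every call copies all the data accumulated so far (and DataFrame.append was removed in pandas 2.0). It's better to use pd.concat as follows.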

Code

import glob

import pandas as pd

path = "C:/Users"

file_list = glob.glob(path + "/*.csv")
print('File names:', file_list)

pd.concat(map(pd.read_csv, file_list),
          ignore_index=True).to_csv('C:/Users.csv', index=False)

Explanation:

We create the merged dataframe with:

pd.concat(map(pd.read_csv, file_list),
          ignore_index=True)

Create the output CSV file with:

to_csv('C:/Users.csv',index=False)
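
On the follow-up about naming and excluding columns, and on the staggered columns described in the question, here is a rough sketch rather than a definitive fix. It assumes each file has a timestamp column and a different data column; the file contents and the names "load", "light", "unused", "load_energy" and "lighting" are made up for illustration, with io.StringIO standing in for the files on disk.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for two CSV files on disk.
load_csv = io.StringIO(
    "timestamp,load,unused\n"
    "2021-01-01 00:00,1.5,x\n"
    "2021-01-01 01:00,1.7,y\n"
)
lighting_csv = io.StringIO(
    "timestamp,light\n"
    "2021-01-01 00:00,0.2\n"
    "2021-01-01 01:00,0.3\n"
)

# usecols drops the columns you do not need at read time;
# index_col makes the timestamp the row label used for alignment.
load = pd.read_csv(load_csv, usecols=["timestamp", "load"], index_col="timestamp")
lighting = pd.read_csv(lighting_csv, index_col="timestamp")

# rename gives the columns clearer names.
load = load.rename(columns={"load": "load_energy"})
lighting = lighting.rename(columns={"light": "lighting"})

# axis=1 places the frames side by side, aligned on the timestamp index.
side_by_side = pd.concat([load, lighting], axis=1)

# The default (axis=0) stacks them instead, leaving NaN where a file
# lacks a column, which matches the staggered layout in the question.
stacked = pd.concat([load.reset_index(), lighting.reset_index()], ignore_index=True)

print(side_by_side)
print(stacked)
```

If the files really do hold different columns for the same timestamps, the axis=1 form is probably what you want; the row-wise form is the right choice when all files have identical columns and you only want to stack their rows.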