I join 437 tables and I get 3 columns for state as my coworkers feel like giving it a different name each day, ("state", "state:" and "State"), is there a way that joins those 3 columns to just 1 column called "state"?.
*also my code uses append, I just saw its deprecated, will it work the same using concat? any way to make it give the same results as append?.
I tried:
excl_merged.rename(columns={"state:": "state", "State": "state"})
but it doesn't do anything.
The code I use:
# importing the required modules
import glob
import pandas as pd
# specifying the path to csv files
path = "X:/.../Admission_merge"
# csv files in the path
file_list = glob.glob(path "/*.xlsx")
# list of excel files we want to merge.
# pd.read_excel(file_path) reads the excel
# data into pandas dataframe.
excl_list = []
for file in file_list:
excl_list.append(pd.read_excel(file)) #use .concat will it give the columns in the same order?
# create a new dataframe to store the
# merged excel file.
excl_merged = pd.DataFrame()
for excl_file in excl_list:
# appends the data into the excl_merged
# dataframe.
excl_merged = excl_merged.append(
excl_file, ignore_index=True)
# exports the dataframe into excel file with
# specified name.
excl_merged.to_excel('X:/.../Admission_MERGED/total_admission_2021-2023.xlsx', index=False)
print("Merge finished")
Any suggestions how I can improve it? also is there a way to remove unnamed empty columns?.
Thanks a lot.
CodePudding user response:
You can use pd.concat
:
excl_list = ['state1.xlsx', 'state2.xlsx', 'state3.xlsx']
state_map = {'state:': 'state', 'State': 'state'}
data = []
for excl_file in excl_list:
df = pd.read_excel(excl_file).rename(columns=state_map)
data.append(df)
excl_merged = pd.concat(data, ignore_index=True)
print(excl_merged)
# Output
ID state
0 A a
1 B b
2 C c
3 D d
4 E e
5 F f
6 G g
7 H h
8 I i
file1.xlsx:
ID State
0 A a
1 B b
2 C c
file2.xlsx:
ID state
0 D d
1 E e
2 F f
file3.xlsx:
ID state:
0 G g
1 H h
2 I i
If you have empty columns, you can use data.append(df.dropna(how='all', axis=1))
before appending to data list.