I need to merge multi-excel files based on a specific column as every file has two columns id
and value
and I need to merge all values
from all files into one file next to each other. I tried this code but merged all the columns
cwd = os.path.abspath('/path/')
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
df = df.append(pd.read_excel('/path/' file), ignore_index=True)
df.head()
df.to_excel('/path/merged.xlsx')
but got all values into a single column like
1 504.0303111
2 1587.678968
3 1437.759643
4 1588.387983
5 1059.194416
1 642.4925851
2 459.3774304
3 1184.210851
4 1660.24336
5 1321.414708
and I need values stored like
1 504.0303111 1 670.9609316
2 1587.678968 2 459.3774304
3 1437.759643 3 1184.210851
4 1588.387983 4 1660.24336
5 1059.194416 5 1321.414708
CodePudding user response:
One way is to append the DataFrames to a list in loop and concatenate along the columns after the loop:
cwd = os.path.abspath('/path/')
files = os.listdir(cwd)
tmp = []
for i, file in enumerate(files[1:], 1):
if file.endswith('.xlsx'):
tmp.append(pd.read_excel('/path/' file))
df = pd.concat(tmp, axis=1)
df.to_excel('/path/merged.xlsx')
But I feel like the following code would work better for you since it doesn't duplicate the id
columns and only adds the value columns as new columns to a DataFrame df
in loop:
cwd = os.path.abspath('/path/')
files = [file for file in os.listdir(cwd) if file.endswith('.xlsx')]
df = pd.read_excel('/path/' files[0])
for i, file in enumerate(files[1:], 1):
df[f'value{i}'] = pd.read_excel('/path/' file).iloc[:, 1]
df.to_excel('/path/merged.xlsx')