Home > Enterprise >  merge excel files into one based on specific columns
merge excel files into one based on specific columns

Time:05-04

I need to merge multi-excel files based on a specific column as every file has two columns id and value and I need to merge all values from all files into one file next to each other. I tried this code but merged all the columns

cwd = os.path.abspath('/path/')     
files = os.listdir(cwd)      
df = pd.DataFrame()
for file in files:
   if file.endswith('.xlsx'):
       df = df.append(pd.read_excel('/path/'   file), ignore_index=True) 
df.head() 
df.to_excel('/path/merged.xlsx')

but got all values into a single column like

 1   504.0303111
 2  1587.678968
 3   1437.759643
 4   1588.387983 
 5   1059.194416 
 1   642.4925851
 2   459.3774304   
 3  1184.210851 
 4   1660.24336
 5   1321.414708 

and I need values stored like

  1  504.0303111  1 670.9609316     
  2  1587.678968  2 459.3774304     
  3  1437.759643  3 1184.210851     
  4  1588.387983  4 1660.24336      
  5  1059.194416  5 1321.414708  

CodePudding user response:

One way is to append the DataFrames to a list in loop and concatenate along the columns after the loop:

cwd = os.path.abspath('/path/')
files = os.listdir(cwd)

tmp = []
for i, file in enumerate(files[1:], 1):
    if file.endswith('.xlsx'):
        tmp.append(pd.read_excel('/path/'   file))
df = pd.concat(tmp, axis=1)
df.to_excel('/path/merged.xlsx')

But I feel like the following code would work better for you since it doesn't duplicate the id columns and only adds the value columns as new columns to a DataFrame df in loop:

cwd = os.path.abspath('/path/')
files = [file for file in os.listdir(cwd) if file.endswith('.xlsx')]
df = pd.read_excel('/path/'   files[0])

for i, file in enumerate(files[1:], 1):
    df[f'value{i}'] = pd.read_excel('/path/'   file).iloc[:, 1]

df.to_excel('/path/merged.xlsx')
  • Related