Home > database >  Pandas Add filename to a Excel combined file
Pandas Add filename to a Excel combined file

Time:10-28

I have a code that combines multiple excel files in 1 file, but I need to add a column with the name of the file used (filename).

Here is the code:

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd)  

## Code gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file), ignore_index=True) 
df.head() 
df.to_excel('Combined.xlsx')

How do I do to add a column with the filename for each file used?

Thanks

CodePudding user response:

Just add d["filename"] = file When you load Excel file in for loop:

import os
import pandas as pd
cwd = os.path.abspath('') 

files = os.listdir(cwd)  

## Code gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        d = pd.read_excel(file)
        d["filename"] = file
        df = df.append(d, ignore_index=True) 

df.head() 
df.to_excel('Combined.xlsx')

CodePudding user response:

Create a dict to collect your dataframes then combine them before exporting (and use pathlib instead of os module):

import pathlib
import pandas as pd

data = {}
for file in pathlib.Path().glob('*.xlsx'):
    data[file.name] = pd.read_excel(file)
pd.concat(data).to_excel('Combined.xlsx')

Note: if you want to get the filename without extension, use file.stem rather than file.name.

CodePudding user response:

Try this one.

import os
import pandas as pd
cwd = os.path.abspath('') 

files = os.listdir(cwd)  

## Code gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append([file]) # Here is the code to ADD filename 
        df = df.append(pd.read_excel(file), ignore_index=True) 
df.head() 
df.to_excel('Combined.xlsx')
  • Related