I have a code that combines multiple excel files in 1 file, but I need to add a column with the name of the file used (filename).
Here is the code:
import os
import pandas as pd
cwd = os.path.abspath('')
files = os.listdir(cwd)
## Code gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
df = df.append(pd.read_excel(file), ignore_index=True)
df.head()
df.to_excel('Combined.xlsx')
How do I do to add a column with the filename for each file used?
Thanks
CodePudding user response:
Just add d["filename"] = file
When you load Excel file in for loop:
import os
import pandas as pd
cwd = os.path.abspath('')
files = os.listdir(cwd)
## Code gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
d = pd.read_excel(file)
d["filename"] = file
df = df.append(d, ignore_index=True)
df.head()
df.to_excel('Combined.xlsx')
CodePudding user response:
Create a dict to collect your dataframes then combine them before exporting (and use pathlib
instead of os
module):
import pathlib
import pandas as pd
data = {}
for file in pathlib.Path().glob('*.xlsx'):
data[file.name] = pd.read_excel(file)
pd.concat(data).to_excel('Combined.xlsx')
Note: if you want to get the filename without extension, use file.stem
rather than file.name
.
CodePudding user response:
Try this one.
import os
import pandas as pd
cwd = os.path.abspath('')
files = os.listdir(cwd)
## Code gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
df = df.append([file]) # Here is the code to ADD filename
df = df.append(pd.read_excel(file), ignore_index=True)
df.head()
df.to_excel('Combined.xlsx')