I am trying to do an automation in python and I need to iterate through all excel files in a folder and look for specific range of text in each excel and add it to a dataframe as a seperate column. I think I'm able to read the excel files and extract the required data but I got the result in 3 seperate pandas class dataframes.
filesList = ["sample1.xlsx", "sample2.xlsx", "sample3.xlsx"]
for file in filesList:
wb = load_workbook(file)
ws = wb.active
data_rows = []
for row in ws['C6':'C9']:
data_cols = []
for cell in row:
data_cols.append(cell.value)
data_rows.append(data_rows)
Output:
0
0 apple
1 sold
2 No stock
0
0 orange
1 not sold
2 In stock
0
0 kiwi
1 not sold
2 In stock
Desired output:
0 1 2
apple sold No stock
orange not sold In stock
kiwi not sold In stock
CodePudding user response:
You probably want to use the DataFrame.from_records() function. Your code seems to have two issues:
- You're not distinguishing between columns and rows of the
DataFrame
. - You're resetting the
DataFrame
on every iteration throughfilesList
.
Revised code:
filesList = ["sample1.xlsx", "sample2.xlsx", "sample3.xlsx"]
data_rows = [] # Stores the data that will be loaded into the DF
column_name_dict = { # Map Excel columns to DF columns
'C6': '0',
'C7': '1',
'C8': '2',
}
for file in filesList:
wb = load_workbook(file)
ws = wb.active
for cell_ref, df_col in column_name_dict.items():
data_cols = {}
data_cols[df_col] = ws[cell_ref].value
data_rows.append(data_cols)
composed_df = DataFrame.from_records(data_rows)
print(composed_df)
CodePudding user response:
Using openpyxl:
files = ["sample1.xlsx", "sample2.xlsx", "sample3.xlsx"]
dfs = []
for file in files:
work_book = load_workbook(filename=file, read_only=True, data_only=True)
data = [[x.value for x in row] for row in work_book.active["C6:C9"]]
df = pd.DataFrame(data).transpose()
dfs.append(df)
df_mapping = {x: y for x, y in enumerate(dfs)}
final_df = pd.concat([pd.DataFrame(df_mapping.get(x)) for x in df_mapping]).reset_index(drop=True)
print(final_df)