I have a column in my excel spreadsheet that contains different types of numbers (i.e., Decimal, Currency, Percentage)
I need to read them into my DF in python and know which ones are which.
excel table looks like:
Group Q2_2022 Q3_2022 Q4_2022 Goal Comments
Team A 25 24 25 24 meets
Team B 18% 18% 19% 18% Q4 over
Team C $200 $225 $218 $220 Q4 under
df = pd.read_excel(file_one, Sheet One)
I need df['Goal] to include the symbol if it exists.
So I need to be able to tell which rows are tracking goals which way. I do not have any control over the source data. Is there anyway to do this when I read the data into the python dataframe?
Edited Based on solution by @Timeless below. Headed in the right direction but getting errors.
CodePudding user response:
You can approach this by using number_format
cell's attribute from openpyxl.
from openpyxl import load_workbook
from collections import defaultdict
import pandas as pd
wb = load_workbook("/tmp/file.xlsx")
ws = wb["Sheet1"]
data = defaultdict(list)
for row in ws.iter_rows(2):
for cell, col in zip(row, ws[1]):
fmt, v1, colname = cell.number_format, cell.value, col.value
v2 = f"${v1}" if "$" in fmt else f"{v1*100:g}%" if fmt == "0%" else v1
data[colname].append(v2)
df = pd.DataFrame(data)
Output :
print(df)
Group Q2_2022 Q3_2022 Q4_2022 Goal
0 1 25 24 25 24
1 2 18% 18% 19% 18%
2 3 $200 $225 $218 $220