Home > database >  Python read excel distinguish number type
Python read excel distinguish number type

Time:01-26

I have a column in my excel spreadsheet that contains different types of numbers (i.e., Decimal, Currency, Percentage)

I need to read them into my DF in python and know which ones are which.

excel table looks like:

Group       Q2_2022      Q3_2022     Q4_2022       Goal        Comments
Team A        25           24            25           24       meets 
Team B        18%         18%           19%          18%       Q4 over
Team C        $200        $225          $218         $220      Q4 under

df = pd.read_excel(file_one, Sheet One)

I need df['Goal] to include the symbol if it exists.

So I need to be able to tell which rows are tracking goals which way. I do not have any control over the source data. Is there anyway to do this when I read the data into the python dataframe?

Edited Based on solution by @Timeless below. Headed in the right direction but getting errors. Error On String

Values in filed

CodePudding user response:

You can approach this by using number_format cell's attribute from .

from openpyxl import load_workbook
from collections import defaultdict
import pandas as pd
​
wb = load_workbook("/tmp/file.xlsx")
ws = wb["Sheet1"]
​
data = defaultdict(list)
for row in ws.iter_rows(2):
    for cell, col in zip(row, ws[1]):
        fmt, v1, colname = cell.number_format, cell.value, col.value
        v2 = f"${v1}" if "$" in fmt else f"{v1*100:g}%" if fmt == "0%" else v1
        data[colname].append(v2)
​
df = pd.DataFrame(data)

Output :

​print(df)

   Group Q2_2022 Q3_2022 Q4_2022  Goal
0      1      25      24      25    24
1      2     18%     18%     19%   18%
2      3    $200    $225    $218  $220
  • Related