Pandas: How to convert series with an integer/fraction mix into a whole number

Time:04-22

So I'm iterating through Excel columns containing numbers and trying to round all the numbers using .apply(pd.to_numeric).round()

This has always worked for me but recently, some of the Excel files contain columns with numbers mixed with fractions (e.g. 27 3/8, 50 17/32). When my script runs, I get "Unable to parse string "50 17/32" at position 0"

Suppose this is my series:

0           250.25
1           32.75
2           64
3           50 17/32
4           16 3/8
Name: Qty, dtype: object

Desired result:

0           250
1           33
2           64
3           51
4           16
Name: Qty, dtype: object

I'm trying to split the columns based on the white space and somehow trying to add the 2 columns together, but I'm running into all sorts of issues. The code below sort of works, but my original 'Qty' column is returning a bunch of NaNs instead of the original numbers for rows where there is no delimiter character

df['Qty'] = df['Qty'].fillna(value=np.nan)
df[['Qty','Fraction']] = df['Qty'].str.split(' ', expand=True)

Here's my original ['Qty'] column:

(screenshot not preserved)

Here's the same rows after running that split code on it:

(screenshot not preserved)

Interestingly, it does properly split the rows with an integer-fraction mix, but it turns certain rows to NaN for reasons I don't understand, and that is throwing me off. Another thing I've tried is using lambda functions, but from what I can gather, those work best when it's just a traditional fraction like 3/8, without an integer in front of it. I've been researching for hours and I'm close to giving up, so if anyone has a clue how to go about this, I'd love to know.

Thanks

CodePudding user response:

Here is one approach using fractions.Fraction:

from fractions import Fraction
import pandas as pd

# group 0: integer or decimal part; group 1: optional fraction such as 17/32
df2 = df['Qty'].str.extract(r'(\d+(?:\.\d+)?)?\s*(\d+/\d+)?')

out = (pd.to_numeric(df2[0], errors='coerce')
       + df2[1].fillna(0).apply(lambda x: float(Fraction(x)))
      )
df['float'] = out
df['int'] = out.round().astype(int)

output:

        Qty      float  int
0    250.25  250.25000  250
1     32.75   32.75000   33
2        64   64.00000   64
3  50 17/32   50.53125   51
4    16 3/8   16.37500   16
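
If the regex feels opaque, the same idea can be sketched as a plain per-cell helper. This is only an illustration: parse_mixed is a hypothetical name, and the sample series is rebuilt by hand from the question. It relies on Fraction accepting strings such as '50', '250.25' and '17/32', so the whitespace-separated parts can simply be summed. Missing values (NaN) are not handled here and would need to be dropped or filled first.

```python
from fractions import Fraction

import pandas as pd


def parse_mixed(value):
    """Convert '50 17/32', '250.25' or 64 to a float (hypothetical helper)."""
    # str() lets numeric cells (int/float read from Excel) take the same path
    parts = str(value).split()
    # Each part is a valid Fraction literal, so the parts can be added up
    return float(sum(Fraction(p) for p in parts))


# Sample data rebuilt from the question; 64 is left as a real integer on purpose
qty = pd.Series(["250.25", "32.75", 64, "50 17/32", "16 3/8"], name="Qty")

as_float = qty.map(parse_mixed)
as_int = as_float.round().astype(int)
print(as_int.tolist())  # [250, 33, 64, 51, 16]
```

A per-cell function like this is slower than the vectorized extract on large columns, but it is easier to debug when a cell has an unexpected format.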

Alternative using arithmetic:

# groups: 0 = integer or decimal part, 1 = numerator, 2 = denominator
df2 = df['Qty'].str.extract(r'(\d+(?:\.\d+)?)?\s*(?:(\d+)/(\d+))?').astype(float)
df['int'] = (df2[0] + df2[1].fillna(0) / df2[2].fillna(1)).round().astype(int)
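
As for the NaNs after the split in the question: a likely cause (an assumption, since the Excel file itself isn't shown) is that the column holds real numbers alongside text cells. The .str methods only operate on strings and return NaN for anything else, so the numeric rows are lost. A minimal sketch with hand-built sample data, casting to str before extracting:

```python
import pandas as pd

# Mixed column as it might come out of Excel: real numbers plus text cells
qty = pd.Series([250.25, 32.75, 64, "50 17/32", "16 3/8"], name="Qty")

# .str methods skip non-string cells, so the numeric rows come back as NaN
split_raw = qty.str.split(" ", expand=True)
print(split_raw[0].isna().tolist())  # [True, True, True, False, False]

# Casting every cell to str first keeps all rows in play
parts = (qty.astype(str)
            .str.extract(r'(\d+(?:\.\d+)?)?\s*(?:(\d+)/(\d+))?')
            .astype(float))
result = (parts[0] + parts[1].fillna(0) / parts[2].fillna(1)).round().astype(int)
print(result.tolist())  # [250, 33, 64, 51, 16]
```

Note that astype(str) also turns genuinely missing cells into the text 'nan', which matches nothing in the regex; those rows would still need handling before the final astype(int).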