So I'm iterating thru Excel columns containing numbers and I'm trying to round all the numbers using .apply(pd.to_numeric).round()
This has always worked for me but recently, some of the Excel files contain columns with numbers mixed with fractions (e.g. 27 3/8, 50 17/32). When my script runs, I get "Unable to parse string "50 17/32" at position 0"
Suppose this is my series:
0 250.25
1 32.75
2 64
3 50 17/32
4 16 3/8
Name: Qty, dtype: object
Desired result:
0 250
1 33
2 64
3 51
4 16
Name: Qty, dtype: object
I'm trying to split the columns based on the white space and somehow trying to add the 2 columns together, but I'm running into all sorts of issues. The code below sort of works, but my original 'Qty' column is returning a bunch of NaNs instead of the original numbers for rows where there is no delimiter character
df['Qty'] = df['Qty'].fillna(value=np.nan)
df[['Qty','Fraction']] = df['Qty'].str.split(' ', expand=True)
Here's my original ['Qty'] column:
Here's the same rows after running that split code on it:
Intertingly, it does properly split the rows with integer-fraction mix, but turning certain rows to NaN for reasons I don't understand is throwing me off. Another thing I've tried is using lambda functions, but from what I can gather, those work best when it's just a traditional fraction like 3/8, without an integer in front of it. Been researching for hours and I'm close to giving up so if anyone has a clue how to go about this, I'd love to know
Thanks
CodePudding user response:
Here is one approach using fractions.Fraction
:
from fractions import Fraction
df2 = df['Qty'].str.extract(r'(\d (?:\.\d )?)?\s*(\d /\d )?')
out = (pd.to_numeric(df2[0], errors='coerce')
df2[1].fillna(0).apply(lambda x: float(Fraction(x)))
)
df['float'] = out
df['int'] = out.round().astype(int)
output:
Qty float int
0 250.25 250.25000 250
1 32.75 32.75000 33
2 64 64.00000 64
3 50 17/32 50.53125 51
4 16 3/8 16.37500 16
Alternative using arithmetic:
df2 = df['Qty'].str.extract(r'(\d (?:\.\d )?)?\s*(?:(\d )/(\d ))?').astype(float)
df['int'] = (df2[0] df2[1].fillna(0)/df2[2].fillna(1)).round().astype(int)