convert into number of years based on experience in string using python

Time:11-08

I have a column with data like 9 years 3 months 22 days. How can I convert it into 9 + (3/12) + (22/365) = 9.31?

I want the output to be 9.31.

How can I do this in Python? Can anyone help me out?

CodePudding user response:

If leap years are not important, extract the values with Series.str.extract, divide months by 12 and days by 365, and sum the parts into a new column:

import pandas as pd

df = pd.DataFrame({'col': ['9 years 3 months 22 days',
                           '3 YEARS 6 MONTHS',
                           '4 YEARS',
                           '3 YEARS',
                           '3.6']})

# Rows that are already plain numbers (e.g. '3.6'); everything else becomes NaN
numeric = pd.to_numeric(df['col'], errors='coerce')
print(numeric)
# Extract each unit case-insensitively and convert it to years
y = df['col'].str.extract(r'(?i)(\d+)\s*year', expand=False).astype(float)
m = df['col'].str.extract(r'(?i)(\d+)\s*month', expand=False).astype(float).div(12)
d = df['col'].str.extract(r'(?i)(\d+)\s*day', expand=False).astype(float).div(365)

# Sum the parts, treating a missing unit as 0, then fall back to the plain numbers
df['float'] = y.add(m, fill_value=0).add(d, fill_value=0).fillna(numeric)
print(df)
                        col     float
0  9 years 3 months 22 days  9.310274
1          3 YEARS 6 MONTHS  3.500000
2                   4 YEARS  4.000000
3                   3 YEARS  3.000000
4                       3.6  3.600000
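
For a single string, the same idea can be sketched with only the standard library. This is a minimal sketch, not part of the answer above: the names to_years and UNIT_DIVISORS are made up for illustration, and a 365-day year is assumed (leap years ignored).

```python
import re

# Divisors that turn each unit into years (assumes a 365-day year).
UNIT_DIVISORS = {"year": 1, "month": 12, "day": 365}


def to_years(text):
    """Convert e.g. '9 years 3 months 22 days' to a float number of years."""
    total = 0.0
    matched = False
    for unit, divisor in UNIT_DIVISORS.items():
        # Case-insensitive search for digits followed by the unit name
        m = re.search(rf"(\d+)\s*{unit}", text, flags=re.IGNORECASE)
        if m:
            total += int(m.group(1)) / divisor
            matched = True
    if matched:
        return total
    # Fall back to plain numbers such as '3.6'; return None if that fails too
    try:
        return float(text)
    except ValueError:
        return None


print(to_years("9 years 3 months 22 days"))
```

If it suits your data, a function like this could be applied to a whole column with df['col'].map(to_years).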

CodePudding user response:

If you want to update the same column, you can use:

import pandas as pd

df = pd.DataFrame({'time': ['9 years 3 months 22 days',
                            '9 years 3 months 21 days',
                            '9 years 3 months 20 days']})
for idx in range(len(df)):
    # Skip missing values
    if pd.isna(df.loc[idx, 'time']):
        continue
    # Extract values from string
    val = df.loc[idx, 'time'].lower()
    if ' ' in val:
        val = val.split()
        # Map each unit to the number before it, e.g. {'years': 9, 'months': 3, 'days': 22}
        t = {val[i + 1]: int(val[i]) for i in range(0, len(val), 2)}
        # Calculate the value in years, formatted to two decimal places (as a string)
        float_val = "{:.2f}".format(t.get("years", 0) + (t.get("months", 0)/12) + (t.get("days", 0)/365))
        # Update the table
        df.loc[idx, 'time'] = float_val
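
The token-pairing step can also be pulled into a small helper that accepts singular units such as "1 year" and returns a float rather than a formatted string. This is only a sketch: pairs_to_years and its ndigits parameter are hypothetical names, and it assumes the string strictly alternates between an integer and a unit.

```python
def pairs_to_years(text, ndigits=2):
    """Convert 'N years N months N days' to years, rounded to ndigits places."""
    tokens = text.lower().split()
    # Pair each number with the unit after it; strip a trailing 's' so that
    # 'year' and 'years' share one key. int() raises ValueError on non-integers.
    units = {unit.rstrip("s"): int(num)
             for num, unit in zip(tokens[::2], tokens[1::2])}
    # Assumes a 365-day year, as in the answer above
    years = units.get("year", 0) + units.get("month", 0) / 12 + units.get("day", 0) / 365
    return round(years, ndigits)


print(pairs_to_years("9 years 3 months 22 days"))  # 9.31
```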
   