How do I convert the values of Years_in_service to their corresponding decimal/float values?
For example, '5 year(s), 7 month(s), 3 day(s)' has a decimal value of 5.59.
import pandas as pd
import numpy as np
data = {'ID':['A1001','A5001','B1001','D5115','K4910'],
'Years_in_service': ['5 year(s), 7 month(s), 3 day(s)', '16 year(s), 0 month(s), 25 day(s)',
'7 year(s), 0 month(s), 2 day(s)', '0 year(s), 11 month(s), 23 day(s)','1 year(s), 0 month(s), 6 day(s)'],
'Age': [45, 59,21,18,35]}
df = pd.DataFrame(data)
df
Currently I'm able to extract only the year (see my attempt below):
df['Years_in_service'].str[:2].astype(float)
Please show your full code. Thanks for your attempt.
CodePudding user response:
Here's a way to do it:
def convert_dates(y, m, d):
    # years + months as a fraction of a year + days as a fraction of a year
    return round(int(y) + int(m)/12 + int(d)/365.25, 2)
df['date_float'] = df['Years_in_service'].apply(lambda x: convert_dates(*[int(i) for i in x.split(' ') if i.isnumeric()]))
print(df)
ID Years_in_service Age date_float
0 A1001 5 year(s), 7 month(s), 3 day(s) 45 5.59
1 A5001 16 year(s), 0 month(s), 25 day(s) 59 16.07
2 B1001 7 year(s), 0 month(s), 2 day(s) 21 7.01
3 D5115 0 year(s), 11 month(s), 23 day(s) 18 0.98
4 K4910 1 year(s), 0 month(s), 6 day(s) 35 1.02
Note:
*[int(i) for i in x.split(' ') if i.isnumeric()]
This expression builds a list of the three numbers in the string, and the leading * unpacks that list so the numbers are passed as separate arguments to the convert_dates function.
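To make the unpacking in the note above concrete, here is a minimal sketch on a single string, with no pandas involved. The variable names text, tokens and numbers are only for illustration; convert_dates follows the formula from this answer.

```python
def convert_dates(y, m, d):
    # months/12 and days/365.25 are average conversion factors
    return round(int(y) + int(m) / 12 + int(d) / 365.25, 2)

text = '5 year(s), 7 month(s), 3 day(s)'

# Splitting on spaces leaves tokens such as 'year(s),' that are not numeric,
# so only the three numbers pass the isnumeric() filter.
tokens = text.split(' ')
numbers = [int(t) for t in tokens if t.isnumeric()]
print(numbers)                  # [5, 7, 3]

# The star spreads the list over the three positional parameters,
# so these two calls are equivalent.
print(convert_dates(*numbers))  # 5.59
print(convert_dates(5, 7, 3))   # 5.59
```

Note that this filter relies on every number being separated from the surrounding text by spaces, as in the sample data.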
CodePudding user response:
How about this?
After:
import pandas as pd
import numpy as np
data = {'ID':['A1001','A5001','B1001','D5115','K4910'],
'Years_in_service': ['5 year(s), 7 month(s), 3 day(s)', '16 year(s), 0 month(s), 25 day(s)',
'7 year(s), 0 month(s), 2 day(s)', '0 year(s), 11 month(s), 23 day(s)','1 year(s), 0 month(s), 6 day(s)'],
'Age': [45, 59,21,18,35]}
df = pd.DataFrame(data)
Do this:
returnlist = []
for each in df['Years_in_service']:
    # take the leading number from each comma-separated part
    years, months, days = [float(i.strip().split(' ')[0]) for i in each.split(',')]
    returnlist.append(years + months/12 + days/365.25)
for each in returnlist:
    print(f'Years in service: {each:.2f}')
# Result:
# Years in service: 5.59
# Years in service: 16.07
# Years in service: 7.01
# Years in service: 0.98
# Years in service: 1.02
You could make it more compact (but less readable) like this. I don't think there's a computational upside, but here's the idea anyway:
for each in df['Years_in_service']:
    returnlist.append(np.sum(np.array([1, 1/12, 1/365.25])*np.array([float(i.strip().split(' ')[0]) for i in each.split(',')])))
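The multiply-then-sum in the compact version is a dot product between the parsed numbers and a fixed weight vector. A small sketch of that reading, assuming the same string layout as the sample data; the names WEIGHTS and to_years are hypothetical and not part of the answer:

```python
import numpy as np

# Weights that turn (years, months, days) into years, using average factors.
WEIGHTS = np.array([1, 1 / 12, 1 / 365.25])

def to_years(text):
    # Leading number of each comma-separated part: '7 month(s)' -> 7.0
    parts = [float(p.strip().split(' ')[0]) for p in text.split(',')]
    return float(np.dot(WEIGHTS, parts))

print(f"{to_years('5 year(s), 7 month(s), 3 day(s)'):.2f}")  # 5.59
```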
CodePudding user response:
If you don't need exact year/month lengths (leap years, months of different lengths), and the year, month and day values are always present and in this order, you can use extractall to pull out the three numbers, divide each by its average conversion factor, and sum:
df['Total'] = (pd.to_numeric(df['Years_in_service'].str.extractall(r'(\d+)')[0])
                 .unstack().div([1, 12, 365.25]).sum(axis=1)
                 .round(2)  # optional
              )
output:
ID Years_in_service Age Total
0 A1001 5 year(s), 7 month(s), 3 day(s) 45 5.59
1 A5001 16 year(s), 0 month(s), 25 day(s) 59 16.07
2 B1001 7 year(s), 0 month(s), 2 day(s) 21 7.01
3 D5115 0 year(s), 11 month(s), 23 day(s) 18 0.98
4 K4910 1 year(s), 0 month(s), 6 day(s) 35 1.02
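For readers unfamiliar with extractall, here is a sketch of the same chain split into its intermediate steps on two of the sample rows. The names matches, wide and total are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Years_in_service': ['5 year(s), 7 month(s), 3 day(s)',
                                        '0 year(s), 11 month(s), 23 day(s)']})

# One row per regex match, indexed by (original row, match number).
matches = df['Years_in_service'].str.extractall(r'(\d+)')[0]

# Convert the strings to numbers, then pivot the match number into
# columns: 0 = years, 1 = months, 2 = days.
wide = pd.to_numeric(matches).unstack()

# Divide each column by its factor, then add the three parts per row.
total = wide.div([1, 12, 365.25]).sum(axis=1).round(2)
print(total.tolist())  # [5.59, 0.98]
```

If a row were missing one of the three numbers, unstack would leave a NaN in that position and sum would skip it by default, so the result would be silently too small. This is why the answer assumes all three values are always present.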
CodePudding user response:
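# Pull the month and day numbers out of the 2nd and 3rd comma-separated parts
i = 1
for name in ['month', 'day']:
    df[name] = [date.split(',')[i].split(' ')[1] for date in df['Years_in_service']]
    df[name] = df[name].astype('float64')
    i += 1
# The year is the first token of the first part
df['years'] = [date.split(',')[0].split(' ')[0] for date in df['Years_in_service']]
df['years'] = df['years'].astype('float64')
# Convert months and days to fractions of a year
i = 12
for name in ['month', 'day']:
    df[name] = [x/i for x in df[name]]
    i = 365.25
# Columns 3, 4 and 5 are month, day and years
l = []
for i in range(len(df.index)):
    l.append(round((df.iloc[i,3] + df.iloc[i,4] + df.iloc[i,5]), 2))
df['decimal_date'] = l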