I have a dataframe like this:
time_posted
0 5 days ago
1 an hour ago
2 a day ago
3 6 hours ago
4 4 hours ago
I tried df.sort_values(by='time_posted', ascending=True)
and got this result, which is just alphabetical string order:
time_posted
4 4 hours ago
0 5 days ago
3 6 hours ago
2 a day ago
1 an hour ago
But I want to sort the values by how long ago each entry was posted,
so that my dataframe looks like this:
time_posted
1 an hour ago
4 4 hours ago
3 6 hours ago
2 a day ago
0 5 days ago
Answer:
If you remove the "ago" and replace "a"/"an" with 1, you can feed the values to pandas.to_timedelta:
import pandas as pd

# turn "a"/"an" into 1 and drop " ago", then parse the strings as timedeltas
(pd.to_timedelta(df['time_posted']
                 .str.replace(r'\ban?\b', '1', regex=True)
                 .str.replace(' ago', '', regex=False))
)
output:
0 5 days 00:00:00
1 0 days 01:00:00
2 1 days 00:00:00
3 0 days 06:00:00
4 0 days 04:00:00
Name: time_posted, dtype: timedelta64[ns]
This enables you to get a sorted order:
idx = (pd.to_timedelta(df['time_posted']
.str.replace(r'\ban?\b', '1', regex=True)
.str.replace(' ago', '', regex=False))
.sort_values()
.index
)
df.loc[idx]
output:
time_posted
1 an hour ago
4 4 hours ago
3 6 hours ago
2 a day ago
0 5 days ago
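As a variation on the same idea, DataFrame.sort_values accepts a key argument (pandas 1.1 and later): a function that receives the column as a Series and returns the values to sort by. This is a minimal sketch, assuming every entry follows the "<number or a/an> <unit> ago" pattern shown above; the helper name ago_to_timedelta is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'time_posted': ['5 days ago', 'an hour ago', 'a day ago',
                                   '6 hours ago', '4 hours ago']})

def ago_to_timedelta(s):
    # Hypothetical helper: "a"/"an" -> "1", drop the trailing " ago",
    # then parse the remaining text (e.g. "5 days") as a Timedelta.
    cleaned = (s.str.replace(r'\ban?\b', '1', regex=True)
                .str.replace(' ago', '', regex=False))
    return pd.to_timedelta(cleaned)

# The key function is applied to the column before sorting;
# the original strings are kept in the result.
sorted_df = df.sort_values('time_posted', key=ago_to_timedelta)
print(sorted_df)
```

This avoids building the sorted index by hand and leaves the time_posted column untouched.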
Answer:
One approach is the following.
Set up the example data:
import pandas as pd
#your dataframe
df = pd.DataFrame(dict(time_posted=['5 days ago', 'an hour ago', 'a day ago', '6 hours ago', '4 hours ago']))
Conversion function
You have to split the string and decide based on its parts (here x[0] is the value and x[1] is the unit):
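def to_hours(s):
    # "5 days ago" -> ['5', 'days', 'ago']
    x = s.split(' ')
    # x[0] is the amount; "a"/"an" means 1
    if x[0].lower() in ['a', 'an']:
        a = 1
    else:
        a = float(x[0])
    # x[1] is the unit; anything that is not a day is counted as hours
    x1 = x[1].lower()
    b = 1  # 1 hour
    if x1.startswith('day'):
        b = b * 24  # 1 day = 24 hours
    return a * b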
Application
df['hours'] = df.time_posted.apply(to_hours)  # apply hours conversion
df = df.sort_values('hours', ascending=True)[['time_posted']]  # sort, then keep only the original column
print(df)
output:
time_posted
1 an hour ago
4 4 hours ago
3 6 hours ago
2 a day ago
0 5 days ago
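The to_hours function above treats every unit other than days as hours. If the data might also contain other units, one option is a lookup table of hours per unit. This is a sketch under assumptions: the HOURS_PER_UNIT table, its set of units (minute, hour, day, week) and the name to_hours_general are all hypothetical, and an unknown unit raises a KeyError rather than being silently counted as hours.

```python
import pandas as pd

# Hypothetical table of hours per unit; the set of units is an assumption.
HOURS_PER_UNIT = {'minute': 1 / 60, 'hour': 1, 'day': 24, 'week': 24 * 7}

def to_hours_general(s):
    # "5 days ago" -> value "5", unit "days"
    value, unit = s.lower().split()[:2]
    amount = 1.0 if value in ('a', 'an') else float(value)
    unit = unit.rstrip('s')  # plural -> singular, e.g. "days" -> "day"
    return amount * HOURS_PER_UNIT[unit]  # KeyError for unknown units

df = pd.DataFrame(dict(time_posted=['5 days ago', 'an hour ago', 'a day ago',
                                    '6 hours ago', '4 hours ago']))
df['hours'] = df.time_posted.apply(to_hours_general)
result = df.sort_values('hours')[['time_posted']]
print(result)
```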