python how to use string value for custom sort?

Time:05-06

I have a dataframe like this:

   time_posted
0   5 days ago
1  an hour ago
2    a day ago
3  6 hours ago
4  4 hours ago

I tried df.sort_values(by='time_posted', ascending=True) and got this result:

   time_posted
4  4 hours ago
0   5 days ago
3  6 hours ago
2    a day ago
1  an hour ago

But I want to sort the values by how long ago they were posted, so that my dataframe looks like this:

   time_posted
1  an hour ago
4  4 hours ago
3  6 hours ago
2    a day ago
0   5 days ago

CodePudding user response:

If you remove the "ago" and replace "a/an" by 1, you can feed the values to pandas.to_timedelta:

import pandas as pd

# replace "a"/"an" with 1 and drop " ago", then parse as timedeltas
pd.to_timedelta(
    df['time_posted']
    .str.replace(r'\ban?\b', '1', regex=True)
    .str.replace(' ago', '', regex=False)
)

output:

0   5 days 00:00:00
1   0 days 01:00:00
2   1 days 00:00:00
3   0 days 06:00:00
4   0 days 04:00:00
Name: time_posted, dtype: timedelta64[ns]

This enables you to get a sorted order:

idx = (pd.to_timedelta(df['time_posted']
 .str.replace(r'\ban?\b', '1', regex=True)
 .str.replace(' ago', '', regex=False))
 .sort_values()
 .index
)

df.loc[idx]

output:

   time_posted
1  an hour ago
4  4 hours ago
3  6 hours ago
2    a day ago
0   5 days ago
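
The same conversion can also be passed straight to sort_values through its key argument, which avoids building the index by hand. This is a minimal sketch, assuming pandas 1.1 or newer (where key is available); the helper name as_timedelta is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'time_posted': ['5 days ago', 'an hour ago', 'a day ago',
                                   '6 hours ago', '4 hours ago']})

def as_timedelta(col):
    # Hypothetical helper: "an hour ago" -> "1 hour" -> Timedelta.
    cleaned = (col.str.replace(r'\ban?\b', '1', regex=True)
                  .str.replace(' ago', '', regex=False))
    return pd.to_timedelta(cleaned)

# key receives the whole column and must return a same-length Series
result = df.sort_values(by='time_posted', key=as_timedelta)
print(result)
```

The original strings stay in the column; only the sort order is derived from the parsed timedeltas.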

CodePudding user response:

One possible answer is the following.

Set up example data

import pandas as pd

# your dataframe
df = pd.DataFrame(dict(time_posted=['5 days ago', 'an hour ago', 'a day ago', '6 hours ago', '4 hours ago']))

Conversion function

Split the string and branch on its parts (here x[0] is the count and x[1] is the unit):

def to_hours(s):
    x = s.split(' ')

    if x[0].lower() in ['a','an']:
        a = 1
    else:
        a = float(x[0])

    x1 = x[1].lower()
    b = 1  # default: the unit is hours
    if x1.startswith('day'):  # use the lower-cased unit
        b = b * 24  # 1 day = 24 hours

    return a * b

Application

df['hours'] = df.time_posted.apply(to_hours)  # apply the hours conversion
df = df.sort_values('hours', ascending=True)[['time_posted']]  # sort, then keep only the original column

print(df)

output:

   time_posted
1  an hour ago
4  4 hours ago
3  6 hours ago
2    a day ago
0   5 days ago
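
The function above only distinguishes days from hours. If the column can also contain other units, one option is a lookup table instead of if/else branches. This is only a sketch: the HOURS_PER_UNIT table and the extra "minutes"/"weeks" rows are assumptions for illustration and do not appear in the question's data.

```python
import pandas as pd

# Hypothetical lookup table: hours per unit, keyed by the singular form.
HOURS_PER_UNIT = {'minute': 1 / 60, 'hour': 1, 'day': 24, 'week': 24 * 7}

def to_hours(s):
    # "5 days ago" -> count "5", unit "days"; the trailing "ago" is ignored
    count, unit = s.lower().split()[:2]
    n = 1.0 if count in ('a', 'an') else float(count)
    # strip the plural "s"; an unknown unit raises KeyError instead of guessing
    return n * HOURS_PER_UNIT[unit.rstrip('s')]

df = pd.DataFrame({'time_posted': ['5 days ago', 'an hour ago', '2 weeks ago',
                                   '30 minutes ago', 'a day ago']})

# sort the converted values, then reorder the original rows by that index
order = df['time_posted'].map(to_hours).sort_values().index
result = df.loc[order]
print(result)
```

Raising on an unknown unit is a deliberate choice here, so that an unexpected string such as "a month ago" fails loudly rather than being silently sorted as hours.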