Home > Enterprise >  Pandas sorting by similar column value
Pandas sorting by similar column value

Time:03-29

I'm new to pandas and I'm trying to sort a dataframe that I used melt on.

Here's what melt looks like:

              id  date_time      sold
0     sample_id1   Monday 0         0
1     sample_id2   Monday 0         0
2     sample_id3   Monday 0         0
3     sample_id4   Monday 0         0
4     sample_id5   Monday 0         0
...          ...        ...       ...
7387  sample_id1  Sunday 23         0
7388  sample_id2  Sunday 23         0
7389  sample_id3  Sunday 23         1
7390  sample_id4  Sunday 23         0
7391  sample_id5  Sunday 23         0

What my desired output would look like:

              id  date_time      sold
0     sample_id1   Monday 0         0
1     sample_id1   Monday 1         0
2     sample_id1   Monday 2         0
3     sample_id1   Monday 3         0
4     sample_id1   Monday 4         0
5     sample_id1   Monday 5         0
6     sample_id1   Monday 6         0
7     sample_id1   Monday 7         0
8     sample_id1   Monday 8         0
9     sample_id1   Monday 9         0
...          ...        ...       ...
7387  sample_id5  Sunday 19         0
7388  sample_id5  Sunday 20         0
7389  sample_id5  Sunday 21         1
7390  sample_id5  Sunday 22         0
7391  sample_id5  Sunday 23         0

I tried using sort_values(['id']), but the output mixes the date_time column - looking like this:

              id   date_time     sold
0     sample_id1   Sunday 23        0
1     sample_id1   Saturday 18      0
2     sample_id1   Thursday 1       0
3     sample_id1   Monday 16        36
4     sample_id1   Saturday 19      6

What function from pandas do I need to use to achieve what I want?

CodePudding user response:

Use parameter key in DataFrame.sort_values for convert values by dictionary:

d = {'Monday': '0', 'Tuesday': '1', 'Wednesday': '2', 
     'Thursday': '3', 'Friday': '4', 'Saturday': '5', 'Sunday': '6'}

def f(x):
    try:
        return x.replace(d, regex=True).str.split().apply(lambda x: tuple(map(int, x)))
    except:
        return x

df = df.sort_values(['id', 'date_time'], key=f)

CodePudding user response:

Not sure about your date_time datatype, but you can always do this to sort based on two columns

sort_values(['id', 'date_time'])

CodePudding user response:

You could try creating two new columns based on your datetime column. The first would represent the day of the week with Monday being 1 and Sunday being 7, and the second would just be the number after the space. Then do a sort based on id and these two new columns.

  • Related