I have a sample dataframe like this. Basically I want to rank rows grouped by item_number and location_id. In SQL I could have done this with a window function (DENSE_RANK() OVER (PARTITION BY ...)).
df = pd.DataFrame({'item_number': [1029980, 1029980, 1029980, 1029980, 1029980,
                                   1029980, 1029980, 1029980, 1029980, 1029980],
                   'location_id': ['L3-25-AA-05-B', 'L3-25-AA-05-B', 'L3-25-AA-05-B', 'L3-25-AA-05-B', 'L3-25-AA-05-B',
                                   'L4-25-AA-05-B', 'L4-25-AA-05-B', 'L4-25-AA-05-B', 'L4-25-AA-05-B', 'L4-25-AA-05-B'],
                   'Date': ['2021-10-01', '2021-10-02', '2021-10-03', '2021-10-04', '2021-10-05',
                            '2021-10-01', '2021-10-02', '2021-10-03', '2021-10-04', '2021-10-05']})
item_number | location_id | Date |
---|---|---|
1029980 | L3-25-AA-05-B | 2021-10-01 |
1029980 | L3-25-AA-05-B | 2021-10-02 |
1029980 | L3-25-AA-05-B | 2021-10-03 |
1029980 | L3-25-AA-05-B | 2021-10-04 |
1029980 | L3-25-AA-05-B | 2021-10-05 |
1029980 | L4-25-AA-05-B | 2021-10-01 |
1029980 | L4-25-AA-05-B | 2021-10-02 |
1029980 | L4-25-AA-05-B | 2021-10-03 |
1029980 | L4-25-AA-05-B | 2021-10-04 |
1029980 | L4-25-AA-05-B | 2021-10-05 |
I want the data to look like this. The ranking is grouped by item_number and location_id: rows with the same item_number and location_id belong to the same group, and within each group the rank is based on Date, with the newest date ranked 1.
item_number | location_id | Date | Rank |
---|---|---|---|
1029980 | L3-25-AA-05-B | 2021-10-01 | 5 |
1029980 | L3-25-AA-05-B | 2021-10-02 | 4 |
1029980 | L3-25-AA-05-B | 2021-10-03 | 3 |
1029980 | L3-25-AA-05-B | 2021-10-04 | 2 |
1029980 | L3-25-AA-05-B | 2021-10-05 | 1 |
1029980 | L4-25-AA-05-B | 2021-10-01 | 5 |
1029980 | L4-25-AA-05-B | 2021-10-02 | 4 |
1029980 | L4-25-AA-05-B | 2021-10-03 | 3 |
1029980 | L4-25-AA-05-B | 2021-10-04 | 2 |
1029980 | L4-25-AA-05-B | 2021-10-05 | 1 |
I have tried this code, but it raises an error because the columns are all strings.
test['rank'] = test.groupby(['item_number','location_id']).rank()
The above code gave me this error.
DataError: No numeric types to aggregate
Can anyone help me in this regard please?
CodePudding user response:
In your case, do:
df['new'] = df.groupby(['item_number','location_id'])['Date'].rank(ascending=False)
0 5.0
1 4.0
2 3.0
3 2.0
4 1.0
5 5.0
6 4.0
7 3.0
8 2.0
9 1.0
Name: Date, dtype: float64
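Since you mentioned SQL's dense_rank: `rank` also accepts `method='dense'`, which makes tied Dates share a rank with no gaps. A minimal sketch (converting Date to datetime first, which also avoids dtype issues on older pandas versions; the duplicate-date data here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'item_number': [1029980] * 4,
                   'location_id': ['L3-25-AA-05-B'] * 4,
                   'Date': ['2021-10-01', '2021-10-01',  # duplicate date to show the tie
                            '2021-10-02', '2021-10-03']})

df['Date'] = pd.to_datetime(df['Date'])  # datetimes rank reliably across pandas versions

# dense ranking: tied Dates share a rank and no ranks are skipped,
# mirroring DENSE_RANK() OVER (PARTITION BY item_number, location_id ORDER BY Date DESC)
df['rank'] = (df.groupby(['item_number', 'location_id'])['Date']
                .rank(method='dense', ascending=False)
                .astype(int))
print(df)
```

Both 2021-10-01 rows get rank 3 here; with the default `method='average'` they would get 3.5 each.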
CodePudding user response:
IIUC, you can groupby on item_number and location_id and count the Dates of each group in reverse with cumcount:
df['rank'] = df.groupby(['item_number','location_id'], as_index=False)['Date'].cumcount(ascending=False) + 1
Output:
item_number location_id Date rank
0 1029980 L3-25-AA-05-B 2021-10-01 5
1 1029980 L3-25-AA-05-B 2021-10-02 4
2 1029980 L3-25-AA-05-B 2021-10-03 3
3 1029980 L3-25-AA-05-B 2021-10-04 2
4 1029980 L3-25-AA-05-B 2021-10-05 1
5 1029980 L4-25-AA-05-B 2021-10-01 5
6 1029980 L4-25-AA-05-B 2021-10-02 4
7 1029980 L4-25-AA-05-B 2021-10-03 3
8 1029980 L4-25-AA-05-B 2021-10-04 2
9 1029980 L4-25-AA-05-B 2021-10-05 1
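One caveat worth hedging: `cumcount` numbers rows by their current order within each group, not by Date, so this only matches `rank` when each group is already sorted by Date. If the input order isn't guaranteed, a sort-first sketch (with deliberately shuffled sample data):

```python
import pandas as pd

df = pd.DataFrame({'item_number': [1029980] * 3,
                   'location_id': ['L3-25-AA-05-B'] * 3,
                   'Date': ['2021-10-03', '2021-10-01', '2021-10-02']})  # unsorted on purpose

# cumcount follows row order, so sort by Date within each group first;
# counting in reverse and adding 1 makes the newest Date rank 1
df = df.sort_values(['item_number', 'location_id', 'Date'])
df['rank'] = df.groupby(['item_number', 'location_id']).cumcount(ascending=False) + 1
```

Unlike `rank`, this always assigns distinct integers, so tied Dates would get arbitrary but unique ranks.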
CodePudding user response:
You could use:
test.dtypes
to see what type the columns you are using are (whether they are numeric or not), and then potentially use astype: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
test.astype({"item_number": "int"}).groupby(['item_number','location_id']).rank()
Though I'm not sure if that would work for location_id.
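An alternative reading of the error (an assumption, but it fits the traceback): it's the ranked column, not the group keys, that needs a numeric-like dtype. Converting Date with `pd.to_datetime` sidesteps the DataError while leaving item_number and location_id as strings:

```python
import pandas as pd

# toy frame with every column as a string, as in the question
test = pd.DataFrame({'item_number': ['1029980', '1029980'],
                     'location_id': ['L3-25-AA-05-B'] * 2,
                     'Date': ['2021-10-01', '2021-10-02']})

print(test.dtypes)  # all three columns show as object

# only the ranked column needs converting; group keys can stay strings
test['Date'] = pd.to_datetime(test['Date'])
test['rank'] = test.groupby(['item_number', 'location_id'])['Date'].rank(ascending=False)
```

This keeps the group keys untouched, so there's no question of whether location_id can be cast.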