Pandas Rank for Text Column not by alphabetical order

Time:08-30

I've been trying to rank the rows below by the order in which they currently appear in the DataFrame. I even created an 'Indexing' column to preserve that order. However, when I rank the 'Level 2' column, the ranks follow alphabetical order rather than top-to-bottom order. I also tried groupby.rank, with no luck.

The Table

 --------------------------- ----------- --------------------- ------------- --------------------- ---------------- --------------------------- ---------- 
|       Account Name        | Hierarchy |   Parent Account    |   Level 1   |       Level 2       |    Level 3     |          Level 4          | Indexing |
 --------------------------- ----------- --------------------- ------------- --------------------- ---------------- --------------------------- ---------- 
| Assets                    |         1 | NaN                 | Assets      | NaN                 | NaN            | NaN                       |        1 |
| Current Assets            |         2 | Assets              | Assets      | Current Assets      | NaN            | NaN                       |        2 |
| Bank Accounts             |         3 | Current Assets      | Assets      | Current Assets      | Bank Accounts  | NaN                       |        3 |
| ANZ Bank                  |         4 | Bank Accounts       | Assets      | Current Assets      | Bank Accounts  | ANZ Bank                  |        4 |
| Credit Card               |         4 | Bank Accounts       | Assets      | Current Assets      | Bank Accounts  | Credit Card               |        5 |
| Prepaid Income            |         3 | Current Assets      | Assets      | Current Assets      | Prepaid Income | NaN                       |        6 |
| Non-Current Assets        |         2 | Assets              | Assets      | Non-Current Assets  | NaN            | NaN                       |        7 |
| PPE                       |         3 | Current Assets      | Assets      | Non-Current Assets  | PPE            | NaN                       |        8 |
| Liabilities               |         1 | Current Assets      | Liabilities | NaN                 | NaN            | NaN                       |        9 |
| Current Liabilities       |         2 | Liabilities         | Liabilities | Current Liabilities | NaN            | NaN                       |       10 |
| Bank Overdraft            |         3 | Current Liabilities | Liabilities | Current Liabilities | Bank Overdraft | NaN                       |       11 |
| Provisions                |         3 | Current Liabilities | Liabilities | Current Liabilities | Provisions     | NaN                       |       12 |
| Provisions - LSL Leave    |         4 | Provisions          | Liabilities | Current Liabilities | Provisions     | Provisions - LSL Leave    |       13 |
| Provisions - Annual Leave |         4 | Provisions          | Liabilities | Current Liabilities | Provisions     | Provisions - Annual Leave |       14 |
 --------------------------- ----------- --------------------- ------------- --------------------- ---------------- --------------------------- ---------- 


The Code

df['Level 2 Sort'] = df['Level 2'].rank(method='dense')

The Result

 --------------------- -------------- 
|       Level 2       | Level 2 Sort |
 --------------------- -------------- 
| NaN                 | NaN          |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Non-Current Assets  | 3            |
| Non-Current Assets  | 3            |
| NaN                 | NaN          |
| Current Liabilities | 2            |
| Current Liabilities | 2            |
| Current Liabilities | 2            |
| Current Liabilities | 2            |
| Current Liabilities | 2            |
 --------------------- -------------- 
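For anyone who wants to reproduce the behaviour, here is a minimal sketch. It assumes a DataFrame holding only the 'Level 2' column, with values copied from the table above; the variable name level_2 is made up for illustration.

```python
import numpy as np
import pandas as pd

# 'Level 2' values copied from the question's table, in their original order.
level_2 = (
    [np.nan]
    + ["Current Assets"] * 5
    + ["Non-Current Assets"] * 2
    + [np.nan]
    + ["Current Liabilities"] * 5
)
df = pd.DataFrame({"Level 2": level_2})

# rank() compares the strings themselves, so the dense ranks follow
# alphabetical order. Missing values stay missing (na_option='keep' is the default).
df["Level 2 Sort"] = df["Level 2"].rank(method="dense")
print(df)
```

"Current Liabilities" sorts before "Non-Current Assets" alphabetically, which is why it receives rank 2 even though it appears later in the frame.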

The Desired Result

 --------------------- -------------- 
|       Level 2       | Level 2 Sort |
 --------------------- -------------- 
| NaN                 | NaN          |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Current Assets      | 1            |
| Non-Current Assets  | 2            |
| Non-Current Assets  | 2            |
| NaN                 | NaN          |
| Current Liabilities | 3            |
| Current Liabilities | 3            |
| Current Liabilities | 3            |
| Current Liabilities | 3            |
| Current Liabilities | 3            |
 --------------------- -------------- 

CodePudding user response:

There are several ways to do this. This one numbers the groups in order of first appearance, then replaces 0 with NaN by chaining.

import numpy as np
df['Level 2 Sort'] = (df.groupby('Level 2', sort=False).ngroup() + 1).replace(0, np.nan)
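The group-numbering approach can be sketched end to end as follows. The sample frame is a shortened, assumed version of the 'Level 2' column from the question, and group_number is an illustrative name. As far as I know, rows with a NaN key are numbered -1 by older pandas releases and NaN by newer ones; the replace(0, np.nan) step is meant to cover both cases.

```python
import numpy as np
import pandas as pd

# Shortened, assumed sample of the question's 'Level 2' column.
df = pd.DataFrame({
    "Level 2": [
        np.nan,
        "Current Assets",
        "Current Assets",
        "Non-Current Assets",
        np.nan,
        "Current Liabilities",
    ]
})

# sort=False numbers the groups in order of first appearance
# instead of sorting the keys alphabetically.
group_number = df.groupby("Level 2", sort=False).ngroup()

# Shift to 1-based numbering. Rows with a missing key end up as 0 or NaN
# depending on the pandas version, so replace any 0 with NaN.
df["Level 2 Sort"] = (group_number + 1).replace(0, np.nan)
print(df)
```

The resulting column is float, because NaN cannot be stored in a plain integer column.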

CodePudding user response:

For top-to-bottom, you can use factorize:

df['Level 2 Sort'] = df['Level 2'].factorize()[0] + 1

Output (0 indicates a NaN value in the 'Level 2' column):

                Level 2  Level 2 Sort
0                   NaN             0
1        Current Assets             1
2        Current Assets             1
3        Current Assets             1
4        Current Assets             1
5        Current Assets             1
6    Non-Current Assets             2
7    Non-Current Assets             2
8                   NaN             0
9   Current Liabilities             3
10  Current Liabilities             3
11  Current Liabilities             3
12  Current Liabilities             3
13  Current Liabilities             3
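If the NaN rows should stay NaN, as in the desired result from the question, the 0 values can be masked out afterwards. This is a minimal sketch on a shortened, assumed sample of the 'Level 2' column; it relies on factorize marking missing values with code -1 by default.

```python
import numpy as np
import pandas as pd

# Shortened, assumed sample of the question's 'Level 2' column.
df = pd.DataFrame({
    "Level 2": [
        np.nan,
        "Current Assets",
        "Current Assets",
        "Non-Current Assets",
        np.nan,
        "Current Liabilities",
    ]
})

# factorize() returns (codes, uniques). Codes number the values in order
# of first appearance, and missing values get the code -1.
codes, uniques = pd.factorize(df["Level 2"])

# Shift to 1-based ranks, then blank out the rows whose value was missing.
df["Level 2 Sort"] = pd.Series(codes + 1, index=df.index).where(codes >= 0)
print(df)
```

The uniques value lists the labels in the same first-appearance order, which can be handy for checking which label maps to which number.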