I've been trying to number the values in the 'Level 2' column according to the order in which they currently appear in the DataFrame. I even created an 'Indexing' column to try to preserve that order. However, when I rank the 'Level 2' column, the ranks follow alphabetical order rather than top-to-bottom order. I also tried groupby.rank, with no luck.
The Table
--------------------------- ----------- --------------------- ------------- --------------------- ---------------- --------------------------- ----------
| Account Name | Hierarchy | Parent Account | Level 1 | Level 2 | Level 3 | Level 4 | Indexing |
--------------------------- ----------- --------------------- ------------- --------------------- ---------------- --------------------------- ----------
| Assets | 1 | NaN | Assets | NaN | NaN | NaN | 1 |
| Current Assets | 2 | Assets | Assets | Current Assets | NaN | NaN | 2 |
| Bank Accounts | 3 | Current Assets | Assets | Current Assets | Bank Accounts | NaN | 3 |
| ANZ Bank | 4 | Bank Accounts | Assets | Current Assets | Bank Accounts | ANZ Bank | 4 |
| Credit Card | 4 | Bank Accounts | Assets | Current Assets | Bank Accounts | Credit Card | 5 |
| Prepaid Income | 3 | Current Assets | Assets | Current Assets | Prepaid Income | NaN | 6 |
| Non-Current Assets | 2 | Assets | Assets | Non-Current Assets | NaN | NaN | 7 |
| PPE | 3 | Current Assets | Assets | Non-Current Assets | PPE | NaN | 8 |
| Liabilities | 1 | Current Assets | Liabilities | NaN | NaN | NaN | 9 |
| Current Liabilities | 2 | Liabilities | Liabilities | Current Liabilities | NaN | NaN | 10 |
| Bank Overdraft | 3 | Current Liabilities | Liabilities | Current Liabilities | Bank Overdraft | NaN | 11 |
| Provisions | 3 | Current Liabilities | Liabilities | Current Liabilities | Provisions | NaN | 12 |
| Provisions - LSL Leave | 4 | Provisions | Liabilities | Current Liabilities | Provisions | Provisions - LSL Leave | 13 |
| Provisions - Annual Leave | 4 | Provisions | Liabilities | Current Liabilities | Provisions | Provisions - Annual Leave | 14 |
--------------------------- ----------- --------------------- ------------- --------------------- ---------------- --------------------------- ----------
The Code
df['Level 2 Sort'] = df['Level 2'].rank(method='dense')
The Result
--------------------- --------------
| Level 2 | Level 2 Sort |
--------------------- --------------
| NaN | NaN |
| Current Assets | 1 |
| Current Assets | 1 |
| Current Assets | 1 |
| Current Assets | 1 |
| Current Assets | 1 |
| Non-Current Assets | 3 |
| Non-Current Assets | 3 |
| NaN | NaN |
| Current Liabilities | 2 |
| Current Liabilities | 2 |
| Current Liabilities | 2 |
| Current Liabilities | 2 |
| Current Liabilities | 2 |
--------------------- --------------
The Desired Result
--------------------- --------------
| Level 2 | Level 2 Sort |
--------------------- --------------
| NaN | NaN |
| Current Assets | 1 |
| Current Assets | 1 |
| Current Assets | 1 |
| Current Assets | 1 |
| Current Assets | 1 |
| Non-Current Assets | 2 |
| Non-Current Assets | 2 |
| NaN | NaN |
| Current Liabilities | 3 |
| Current Liabilities | 3 |
| Current Liabilities | 3 |
| Current Liabilities | 3 |
| Current Liabilities | 3 |
--------------------- --------------
CodePudding user response:
There are a bunch of ways you can do this. This one numbers the groups in order of first appearance (sort=False) and replaces 0 with NaN by chaining.
import numpy as np
df['Level 2 Sort'] = (df.groupby('Level 2', sort=False).ngroup() + 1).replace(0, np.nan)
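To see the idea end to end, here is a minimal sketch on a hypothetical six-row stand-in for the question's 'Level 2' column (the small frame is an assumption made for illustration, not the asker's full data). Depending on the pandas version, ngroup() may mark rows with a NaN key as -1 or as NaN, so treat the comment about that as a hedge rather than a guarantee; the chain handles both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's 'Level 2' column.
df = pd.DataFrame({
    "Level 2": [np.nan, "Current Assets", "Current Assets",
                "Non-Current Assets", np.nan, "Current Liabilities"]
})

# sort=False numbers the groups in order of first appearance,
# not alphabetically.
codes = df.groupby("Level 2", sort=False).ngroup()

# Rows whose key is NaN come back as -1 or as NaN, depending on the
# pandas version. Adding 1 and mapping 0 to NaN covers both cases.
df["Level 2 Sort"] = (codes + 1).replace(0, np.nan)

print(df)
```

The resulting column is float because it has to hold NaN alongside the group numbers.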
CodePudding user response:
For top-to-bottom order, you can use factorize:
df['Level 2 Sort'] = df['Level 2'].factorize()[0] + 1
Output (0 indicates a NaN value in the Level 2 column):
Level 2 Level 2 Sort
0 NaN 0
1 Current Assets 1
2 Current Assets 1
3 Current Assets 1
4 Current Assets 1
5 Current Assets 1
6 Non-Current Assets 2
7 Non-Current Assets 2
8 NaN 0
9 Current Liabilities 3
10 Current Liabilities 3
11 Current Liabilities 3
12 Current Liabilities 3
13 Current Liabilities 3
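If the NaN rows should stay NaN, as in the desired result from the question, one possible variation is to mask the codes afterwards. This is only a sketch on a hypothetical six-row series (the data and the variable names are assumptions for illustration); it relies on factorize marking missing values with -1 by default.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's 'Level 2' column.
level2 = pd.Series(
    [np.nan, "Current Assets", "Current Assets",
     "Non-Current Assets", np.nan, "Current Liabilities"],
    name="Level 2",
)

# factorize assigns codes in order of first appearance;
# missing values get the sentinel code -1 by default.
codes, uniques = level2.factorize()

# Shift to 1-based numbering, then blank out the rows that were NaN.
level2_sort = pd.Series(codes + 1, index=level2.index).where(level2.notna())

print(pd.DataFrame({"Level 2": level2, "Level 2 Sort": level2_sort}))
```

Masking with where converts the column to float, since an integer column cannot hold NaN.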