How to conditionally rearrange row order

Time:12-11

I have a dataset with an 'Unknown' entry in each group, and I would like to move 'Unknown' to the end of each group. For example, I have code like this:

import pandas as pd
clients1 = {'Name': ['A','B','Unknown','Y','Z','A','B','Unknown','Y','Z'],
        'group': ['1','1','1','1','1','2','2','2','2','2']
       }
df = pd.DataFrame(clients1, columns= ['Name', 'group'])
df

the output:

Name     group
A        1
B        1
Unknown  1
Y        1
Z        1
A        2
B        2
Unknown  2
Y        2
Z        2

the desired output should be:

Name     group
A        1
B        1
Y        1
Z        1
Unknown  1
A        2
B        2
Y        2
Z        2
Unknown  2

Does anyone know how to do that?

CodePudding user response:

You can sort the data by group and by whether Name equals 'Unknown'. The Unknown rows are flagged True, and since False sorts before True they end up at the bottom of each group:

df = (df.assign(unknown_name=df['Name'].eq('Unknown'))
        .sort_values(['group','unknown_name'])
     )

Output:

      Name group  unknown_name
0        A     1         False
1        B     1         False
3        Y     1         False
4        Z     1         False
2  Unknown     1          True
5        A     2         False
6        B     2         False
8        Y     2         False
9        Z     2         False
7  Unknown     2          True
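
The helper column is only needed for the sort, so it can be dropped afterwards. A minimal sketch, assuming the question's data; the variable name `result` is just for illustration, and the tie order within each group relies on the multi-column sort keeping rows in their original order, as the output above shows:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'Unknown', 'Y', 'Z', 'A', 'B', 'Unknown', 'Y', 'Z'],
    'group': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2'],
})

result = (
    df.assign(unknown_name=df['Name'].eq('Unknown'))  # True only for the Unknown rows
      .sort_values(['group', 'unknown_name'])         # False sorts before True
      .drop(columns='unknown_name')                   # remove the helper column again
)
print(result)
```

The original index labels are kept, so `result.reset_index(drop=True)` can be added if a fresh 0..n-1 index is wanted.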

CodePudding user response:

One option to consider is replacing 'Unknown' with NaN, since sort_values puts NaN last by default:

import numpy as np
df["Name"] = df["Name"].replace('Unknown', np.nan)
df = df.sort_values(by=['group', 'Name'])

Output:

  Name group
0    A     1
1    B     1
3    Y     1
4    Z     1
2  NaN     1
5    A     2
6    B     2
8    Y     2
9    Z     2
7  NaN     2

And of course you can replace the NaNs with 'Unknown' again afterwards.
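
The full round trip might look like the following. This is a sketch under the assumption that the column has no genuine missing values, because `fillna` would relabel those as 'Unknown' too. Note that, unlike a sort on a boolean flag, this also sorts the remaining names alphabetically within each group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'Unknown', 'Y', 'Z', 'A', 'B', 'Unknown', 'Y', 'Z'],
    'group': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2'],
})

# Temporarily turn the placeholder into a missing value
df['Name'] = df['Name'].replace('Unknown', np.nan)

# na_position='last' is the default; it is spelled out here for clarity
df = df.sort_values(by=['group', 'Name'], na_position='last')

# Restore the placeholder once the rows are in order
df['Name'] = df['Name'].fillna('Unknown')
print(df)
```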

CodePudding user response:

Mind the right order of your items:

import pandas as pd
clients1 = {'Name': ['A','B','Y','Z', 'Unknown','A','B','Y','Z', 'Unknown'],
        'group': ['1','1','1','1','1','2','2','2','2','2']
       }
df = pd.DataFrame(clients1, columns= ['Name', 'group'])
df

CodePudding user response:

You can use category dtype like this:

l = df['Name'].unique().tolist()
u = l.pop(l.index('Unknown'))
l.append(u)

n_dtype = pd.CategoricalDtype(l, ordered=True)
df['Name'] = df['Name'].astype(n_dtype)

df.sort_values(['group', 'Name'])

Output:

      Name group
0        A     1
1        B     1
3        Y     1
4        Z     1
2  Unknown     1
5        A     2
6        B     2
8        Y     2
9        Z     2
7  Unknown     2
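
If the column should not stay categorical after sorting, it can be cast back to strings. A self-contained sketch of the same idea, assuming the question's data; `names`, `order` and `result` are illustrative names, and the category list is built with a list comprehension so it does not raise when 'Unknown' is absent:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'Unknown', 'Y', 'Z', 'A', 'B', 'Unknown', 'Y', 'Z'],
    'group': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2'],
})

# Categories in order of first appearance, with 'Unknown' moved to the end
names = [n for n in df['Name'].unique() if n != 'Unknown'] + ['Unknown']
order = pd.CategoricalDtype(names, ordered=True)

result = (
    df.assign(Name=df['Name'].astype(order))           # sort key follows the category order
      .sort_values(['group', 'Name'])
      .assign(Name=lambda d: d['Name'].astype(str))    # back to plain strings
)
print(result)
```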

CodePudding user response:

This approach needs no sorting and no extra column. It preserves the original row order within each group, so it should be more generic. The logic is similar to slicing a string: for each group, build something like group[:Unknown] + group[Unknown+1:] + Unknown.

import numpy as np
import pandas as pd

def func(x):
  idx = np.flatnonzero(x['Name'] == 'Unknown')  # integer positions of the Unknown rows, for iloc
  parts = []
  prev = None
  for i in idx:
    parts.append(x.iloc[prev:i, :])  # block of rows before this Unknown
    prev = i + 1
  parts.append(x.iloc[prev:, :])  # last block after Unknown
  parts.append(x.iloc[idx, :])  # Unknown records go at the end
  return pd.concat(parts)  # DataFrame.append was removed in pandas 2.0

df.groupby('group').apply(func).droplevel(0)

Output:

      Name group
0        A     1
1        B     1
3        Y     1
4        Z     1
2  Unknown     1
5        A     2
6        B     2
8        Y     2
9        Z     2
7  Unknown     2