Move pandas rows to the end based on a condition

Time:06-29

I have a pandas dataframe with several columns, sorted by the values in one of them. However, some rows need to go to the bottom based on a different condition. The column where this second condition applies looks something like this:

Superheros
Spiderman (Nr 1)
Batman (Nr 4)
Joker
Iron Man (Nr 2)
Hulk
Captain America
Wonderwoman (Nr 3)

I need all rows containing (Nr #) to move to the bottom in a sorted way, leaving the order of the other rows untouched, so that it results in this:

Superheros
Joker
Hulk
Captain America
Spiderman (Nr 1)
Iron Man (Nr 2)
Wonderwoman (Nr 3)
Batman (Nr 4)

I found this question about moving pandas rows; however, I don't think I can use it, or at least I don't know how to adapt it so that it does what I want.

I was thinking of creating a new column that contains the number in the brackets (and 0 for every hero that doesn't have one), sorting on it, splitting the dataframe, and concatenating the pieces again, but this seems like a detour.

I was also thinking of finding the Nr 1,2,3,... one by one and moving them to the end, like this:

index = df["Superheros"].where(df["Superheros"].str.contains("Nr 1"))
df["Superheros"][index].move_to[-1]

but I can't seem to find this "move_to" command, so I suspect it does not exist. I am sure that I am missing something very obvious...

CodePudding user response:

You can extract the number, sort it with sort_values using the na_position='first' parameter (so rows without a number stay on top), and use the resulting index to reorder the original DataFrame:

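import pandas as pd

# Extract the number inside "(Nr N)"; rows without a match become NaN
s = pd.to_numeric(df['Superheros'].str.extract(r'\(Nr (\d+)\)', expand=False))

# Sort by that number with the NaN (unnumbered) rows first, and keep the index order
idx = s.sort_values(na_position='first').index

# Reorder the original DataFrame by that index
out = df.loc[idx]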
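For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The sample frame is rebuilt from the question; the function name move_numbered_to_end and the kind="stable" argument are my own additions for illustration, and the df.loc[...] step assumes the index labels are unique.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"Superheros": [
    "Spiderman (Nr 1)", "Batman (Nr 4)", "Joker", "Iron Man (Nr 2)",
    "Hulk", "Captain America", "Wonderwoman (Nr 3)",
]})


def move_numbered_to_end(frame, column="Superheros"):
    """Hypothetical helper: rows with "(Nr N)" go last, ordered by N."""
    # Rows without a "(Nr N)" suffix yield NaN here
    nums = pd.to_numeric(frame[column].str.extract(r"\(Nr (\d+)\)", expand=False))
    # NaN rows are placed first; kind="stable" keeps ties in their original order
    order = nums.sort_values(na_position="first", kind="stable").index
    # Assumes unique index labels, otherwise .loc would duplicate rows
    return frame.loc[order]


out = move_numbered_to_end(df)
print(out)
```

If the frame has a non-unique index, calling reset_index(drop=True) first is one way to make the .loc step safe.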

Or, same logic using the key parameter of sort_values:

# Map each name to its "(Nr N)" number; names without one become NaN
sorter = lambda s: pd.to_numeric(s.str.extract(r'\(Nr (\d+)\)', expand=False))

out = df.sort_values(by='Superheros', key=sorter, na_position='first')

output:

           Superheros
2               Joker
4                Hulk
5     Captain America
0    Spiderman (Nr 1)
3     Iron Man (Nr 2)
6  Wonderwoman (Nr 3)
1       Batman (Nr 4)

Intermediate:

sorter(df['Superheros'])

0    1.0
1    4.0
2    NaN
3    2.0
4    NaN
5    NaN
6    3.0
Name: Superheros, dtype: float64
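The NaN entries in this intermediate series mark the rows that should stay where they are, so the split-and-recombine idea from the question can also be sketched with a boolean mask and pd.concat. This is only an alternative illustration, not a better method; the helper column name _nr and the variable names are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"Superheros": [
    "Spiderman (Nr 1)", "Batman (Nr 4)", "Joker", "Iron Man (Nr 2)",
    "Hulk", "Captain America", "Wonderwoman (Nr 3)",
]})

# Same extraction as above: NaN where there is no "(Nr N)" suffix
nums = pd.to_numeric(df["Superheros"].str.extract(r"\(Nr (\d+)\)", expand=False))
has_nr = nums.notna()

# Rows without a number keep their current relative order
unnumbered = df[~has_nr]

# Rows with a number are sorted via a temporary helper column (hypothetical name)
numbered = (
    df[has_nr]
    .assign(_nr=nums[has_nr])
    .sort_values("_nr", kind="stable")
    .drop(columns="_nr")
)

# Stack the two pieces: unnumbered rows first, numbered rows at the bottom
out_concat = pd.concat([unnumbered, numbered])
print(out_concat)
```

This produces the same row order as the sort-based versions, at the cost of a few extra steps, which is probably why the sort with na_position='first' is the shorter route.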