I have a DataFrame with 1k records similar to the below one. Column B is a group column in sorted order.
import pandas as pd
df = pd.DataFrame([['red', 0], ['green', 0], ['blue', 16],['white', 58],['yellow', 59], ['purple', 71], ['violet', 82],['grey', 82]], columns=['A','B'])
df
A B
0 red 0
1 green 0
2 blue 16
3 white 58
4 yellow 59
5 purple 71
6 violet 82
7 grey 82
How could I update column B to get an output like the below one (to use the column as a category later)?
output_df = pd.DataFrame([['red', 'group1'], ['green', 'group1'], ['blue', 'group2'],['white', 'group3'],['yellow', 'group4'], ['purple', 'group5'], ['violet', 'group6'],['grey', 'group6']], columns=['A','B'])
output_df
A B
0 red group1
1 green group1
2 blue group2
3 white group3
4 yellow group4
5 purple group5
6 violet group6
7 grey group6
CodePudding user response:
You can use .ne()
, .shift()
, and .cumsum()
to get the group numbers, and then use string manipulation to get the desired result:
df["B"] = "group" df["B"].ne(df["B"].shift(1)).cumsum().astype(str)
This outputs:
A B
0 red group1
1 green group1
2 blue group2
3 white group3
4 yellow group4
5 purple group5
6 violet group6
7 grey group6