I have a pandas df that looks like the following:
--------- --------- ------------ --------
| Cluster | Country | Publishers | Assets |
--------- --------- ------------ --------
| South | IT | SS | Asset1 |
| South | IT | SS | Asset2 |
| South | IT | SS | Asset3 |
| South | IT | ML | Asset1 |
| South | IT | ML | Asset2 |
| South | IT | ML | Asset3 |
| South | IT | TT | Asset1 |
| South | IT | TT | Asset2 |
| South | IT | TT | Asset3 |
| South | ES | SS | Asset1 |
| South | ES | SS | Asset2 |
--------- --------- ------------ --------
I would like to create a new column "Package" that uses a cumulative count based on the following columns:
- Publishers
- Assets
The result would be this:
--------- --------- ------------ -------- ---------
| Cluster | Country | Publishers | Assets | Package |
--------- --------- ------------ -------- ---------
| South | IT | SS | Asset1 | 1 |
| South | IT | SS | Asset2 | 1a |
| South | IT | SS | Asset3 | 1b |
| South | IT | ML | Asset1 | 2 |
| South | IT | ML | Asset2 | 2a |
| South | IT | ML | Asset3 | 2b |
| South | IT | TT | Asset1 | 3 |
| South | IT | TT | Asset2 | 3a |
| South | IT | TT | Asset3 | 3b |
| South | ES | SS | Asset1 | 4 |
| South | ES | SS | Asset2 | 4a |
--------- --------- ------------ -------- ---------
So far I have tried
df['Package'] = df.groupby(['Cluster','Publishers']).cumcount() but it seems not to work as the value resets to 0 after every publisher instance is gone through.
CodePudding user response:
You can use groupby.cumcount
, but with a different grouper. You will also need the related groupby.ngroup
:
from string import ascii_lowercase
# group by consecutive identical values
group = df['Publishers'].ne(df['Publishers'].shift()).cumsum()
# alternatively, you can also group by Cluster/Country/Publishers
# group = ['Cluster', 'Country', 'Publisher']
df['Package'] =(
df.groupby(group).ngroup().add(1).astype(str)
df.groupby(group).cumcount().map(dict(enumerate([''] list(ascii_lowercase))))
)
output:
Cluster Country Publishers Assets Package
0 South IT SS Asset1 1
1 South IT SS Asset2 1a
2 South IT SS Asset3 1b
3 South IT ML Asset1 2
4 South IT ML Asset2 2a
5 South IT ML Asset3 2b
6 South IT TT Asset1 3
7 South IT TT Asset2 3a
8 South IT TT Asset3 3b
9 South ES SS Asset1 4
10 South ES SS Asset2 4a