I have a dataset, df, where I would like to combine the column name with each column value and display the label count.
For example, for id 'aa' in 2022 Q1, there is 1 'hi'
for id 'aa' in 2022 Q2, there are 2 'hi' 's
Data
id type date Q1 Q2
aa hi 2022 1 2
aa hi 2023 1 1
aa ok 2022 1 0
bb hi 2024 3 0
Desired
id type date count
aa hi Q1 2022 hi01
aa ok Q1 2022 ok01
aa hi Q2 2022 hi01
aa hi Q2 2022 hi02
aa hi Q1 2023 hi01
aa hi Q2 2023 hi01
bb hi Q1 2024 hi01
bb hi Q1 2024 hi02
bb hi Q1 2024 hi03
Doing
My approach is to break this apart in steps. I believe I have to perform a pivot, join and a cumcount:
#create a pivot
df.set_index(['id', 'type']).stack().reset_index()
#set the count
df['count'] = df['type'] df.groupby([*df]).cumcount().add(1).astype(str).str.zfill(2)
Any suggestion is appreciated
CodePudding user response:
You could use the following:
(df.melt(id_vars=['id', 'type', 'date'], value_name='count') # reshape data
.sort_values(by=['date', 'variable'])
# duplicate the rows according to counts
.loc[lambda d: d.index.repeat(d['count'])]
.reset_index(drop=True)
# merge the quarters and years
.assign(date=lambda d: d['variable'] ' ' d['date'].astype(str),
# increment the type per group
count=lambda d: d['type'] d.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)
)
# drop now unused column
.drop(columns='variable')
)
output:
id type date count
0 aa hi Q1 2022 hi01
1 aa ok Q1 2022 ok01
2 aa hi Q2 2022 hi01
3 aa hi Q2 2022 hi02
4 aa hi Q1 2023 hi01
5 aa hi Q2 2023 hi01
6 bb hi Q1 2024 hi01
7 bb hi Q1 2024 hi02
8 bb hi Q1 2024 hi03
CodePudding user response:
Try with melt
s = df.melt(['id','type','date'])
s['count'] = s['type'] s['value'].astype(str)
s['date'] = s.date.astype(str) s.variable
s
id type date variable value count
0 aa hi 2022Q1 Q1 1 hi1
1 aa hi 2023Q1 Q1 1 hi1
2 aa ok 2022Q1 Q1 1 ok1
3 bb hi 2024Q1 Q1 3 hi3
4 aa hi 2022Q2 Q2 2 hi2
5 aa hi 2023Q2 Q2 1 hi1
6 aa ok 2022Q2 Q2 0 ok0
7 bb hi 2024Q2 Q2 0 hi0