Home > Blockchain >  Tricky concatenation of header names with column values and pivot in Pandas
Tricky concatenation of header names with column values and pivot in Pandas

Time:11-18

I have a dataset, df, where I would like to combine the column name with each column value and display the label count.

For example, for id 'aa' in 2022 Q1, there is 1 'hi'  
             for id 'aa' in 2022 Q2, there are 2 'hi' 's

Data

id  type    date    Q1  Q2
aa  hi      2022    1   2
aa  hi      2023    1   1
aa  ok      2022    1   0
bb  hi      2024    3   0

Desired

id  type    date        count
aa  hi      Q1 2022     hi01
aa  ok      Q1 2022     ok01
aa  hi      Q2 2022     hi01
aa  hi      Q2 2022     hi02
aa  hi      Q1 2023     hi01
aa  hi      Q2 2023     hi01
bb  hi      Q1 2024     hi01
bb  hi      Q1 2024     hi02
bb  hi      Q1 2024     hi03

            

Doing

My approach is to break this apart in steps. I believe I have to perform a pivot, join and a cumcount:

#create a pivot

df.set_index(['id', 'type']).stack().reset_index()

#set the count

 df['count'] = df['type']   df.groupby([*df]).cumcount().add(1).astype(str).str.zfill(2)

Any suggestion is appreciated

CodePudding user response:

You could use the following:

(df.melt(id_vars=['id', 'type', 'date'], value_name='count') # reshape data
   .sort_values(by=['date', 'variable'])
   # duplicate the rows according to counts
   .loc[lambda d: d.index.repeat(d['count'])]
   .reset_index(drop=True)
           # merge the quarters and years
   .assign(date=lambda d: d['variable'] ' ' d['date'].astype(str),
           # increment the type per group
           count=lambda d: d['type'] d.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)
          )
   # drop now unused column
   .drop(columns='variable')
)

output:

   id type     date count
0  aa   hi  Q1 2022  hi01
1  aa   ok  Q1 2022  ok01
2  aa   hi  Q2 2022  hi01
3  aa   hi  Q2 2022  hi02
4  aa   hi  Q1 2023  hi01
5  aa   hi  Q2 2023  hi01
6  bb   hi  Q1 2024  hi01
7  bb   hi  Q1 2024  hi02
8  bb   hi  Q1 2024  hi03

CodePudding user response:

Try with melt

s = df.melt(['id','type','date'])
s['count'] = s['type']   s['value'].astype(str)
s['date'] = s.date.astype(str) s.variable
s
   id type    date variable  value count
0  aa   hi  2022Q1       Q1      1   hi1
1  aa   hi  2023Q1       Q1      1   hi1
2  aa   ok  2022Q1       Q1      1   ok1
3  bb   hi  2024Q1       Q1      3   hi3
4  aa   hi  2022Q2       Q2      2   hi2
5  aa   hi  2023Q2       Q2      1   hi1
6  aa   ok  2022Q2       Q2      0   ok0
7  bb   hi  2024Q2       Q2      0   hi0
  • Related