Home > database >  Pandas groupby on column and repeat the values of another column
Pandas groupby on column and repeat the values of another column

Time:05-07

I was trying to divide the month into two weeks. Basically for each month i am trying to create week numbers like 1,2,3,4 and repeat them.

How to create the required column like below:

import numpy as np
import pandas as pd


df = pd.DataFrame({'Year_Month':['2020-jan']*6 ['2020-feb'] ['2021-jan']*2 ['2021-jan'] ['2021-mar']*5})
df['B'] = [1]*6 [5] [1]*2 [1] [10]*5
df['groubyA_repeatB'] = [1,2,3,4,1,2,
                         5,
                         1,2,3,
                         10,11,12,13,10] # groupby column A and repeat [i,i 1,i 2,i 3] 

# e.g. for group A: we have 8 'a' and value of a is 1, so we need to repeat [1,2,3,4] until we get 8 elements.
df

   Year_Month   B  groubyA_repeatB  # how to create this column
0    2020-jan   1                1
1    2020-jan   1                2
2    2020-jan   1                3
3    2020-jan   1                4
4    2020-jan   1                1
5    2020-jan   1                2
6    2020-feb   5                5
7    2021-jan   1                1
8    2021-jan   1                2
9    2021-jan   1                3
10   2021-mar  10               10
11   2021-mar  10               11
12   2021-mar  10               12
13   2021-mar  10               13
14   2021-mar  10               10

CodePudding user response:

Try:

#get the total count per group and generate the repeated 1-4 series
counts = df.groupby("Year_Month",sort=False)["B"].count().apply(lambda x: [1,2,3,4]*(x//4) [1,2,3,4][:x%4])

#explode to split lists into individual rows
counts = counts.explode().reset_index()

#add to "B" value
df["groupbyA_repeatB"] = counts["B"].add(df["B"]).sub(1)

>>> df
   Year_Month   B groupbyA_repeatB
0    2020-jan   1                1
1    2020-jan   1                2
2    2020-jan   1                3
3    2020-jan   1                4
4    2020-jan   1                1
5    2020-jan   1                2
6    2020-feb   5                5
7    2021-jan   1                1
8    2021-jan   1                2
9    2021-jan   1                3
10   2021-mar  10               10
11   2021-mar  10               11
12   2021-mar  10               12
13   2021-mar  10               13
14   2021-mar  10               10

CodePudding user response:

You can utilize cycle to create cycle for list and slice to get specific count

from itertools import cycle, islice

out = (df
 .groupby(['Year_Month', 'B'])
 .apply(lambda g: g.assign(groubyA_repeatB_=list(islice(cycle(range(g.iloc[0]['B'], g.iloc[0]['B'] 4)), len(g))))))
print(out)

   Year_Month   B  groubyA_repeatB  groubyA_repeatB_
0    2020-jan   1                1                 1
1    2020-jan   1                2                 2
2    2020-jan   1                3                 3
3    2020-jan   1                4                 4
4    2020-jan   1                1                 1
5    2020-jan   1                2                 2
6    2020-feb   5                5                 5
7    2021-jan   1                1                 1
8    2021-jan   1                2                 2
9    2021-jan   1                3                 3
10   2021-mar  10               10                10
11   2021-mar  10               11                11
12   2021-mar  10               12                12
13   2021-mar  10               13                13
14   2021-mar  10               10                10
  • Related