Dataframe groupby certain column and repeat the row n times

Time:04-06

I would like to get df_output from df_input in the code below. Essentially, the rows of each date group should be repeated 2 times, and a "repeat" column tagging the repetition number should be added.

import pandas as pd

df_input = pd.DataFrame( [
        ['01/01', '1', '10'],
        ['01/01', '2', '5'],
        ['01/02', '1', '9'],
        ['01/02', '2', '7'],
], columns=['date','type','value'])

df_output = pd.DataFrame( [
        ['01/01', '1', '10', '1'],
        ['01/01', '2', '5', '1'],
        ['01/01', '1', '10', '2'],
        ['01/01', '2', '5', '2'],

        ['01/02', '1', '9', '1'],
        ['01/02', '2', '7', '1'],
        ['01/02', '1', '9', '2'],
        ['01/02', '2', '7', '2'],
], columns=['date','type','value', 'repeat'])
print(df_output)

I thought about grouping by the date column and repeating each group's rows n times, but could not find code that does this.

CodePudding user response:

You can use GroupBy.apply per date, and pandas.concat:

N = 2
out = (df_input
      .groupby(['date'], group_keys=False)
      .apply(lambda d: pd.concat([d]*N))
      )

output:

    date type value
0  01/01    1    10
1  01/01    2     5
0  01/01    1    10
1  01/01    2     5
2  01/02    1     9
3  01/02    2     7
2  01/02    1     9
3  01/02    2     7

With "repeat" column:

N = 2
out = (df_input
      .groupby(['date'], group_keys=False)
      .apply(lambda d: pd.concat([d.assign(repeat=n+1) for n in range(N)]))
      )

output:

    date type value  repeat
0  01/01    1    10       1
1  01/01    2     5       1
0  01/01    1    10       2
1  01/01    2     5       2
2  01/02    1     9       1
3  01/02    2     7       1
2  01/02    1     9       2
3  01/02    2     7       2
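
Note that the output above keeps duplicated index labels and stores "repeat" as integers, whereas the df_output in the question uses strings and a fresh index. Also, pandas 2.2 deprecates GroupBy.apply operating on the grouping column, so the snippet above may warn or behave differently depending on your version. As a possible alternative, here is a minimal sketch that avoids GroupBy.apply: it concatenates N tagged copies of the whole frame and then sorts by date and repeat. The index name 'row' is only a hypothetical helper used as a tie-breaker to keep the original row order; it assumes df_input has no column with that name.

```python
import pandas as pd

df_input = pd.DataFrame([
    ['01/01', '1', '10'],
    ['01/01', '2', '5'],
    ['01/02', '1', '9'],
    ['01/02', '2', '7'],
], columns=['date', 'type', 'value'])

N = 2
out = (
    # one tagged copy of the full frame per repetition (repeat = 1..N)
    pd.concat([df_input.assign(repeat=n + 1) for n in range(N)])
    # name the original index so it can serve as a sort key ('row' is a made-up name)
    .rename_axis('row')
    # group the copies by date, then by repetition, keeping the original row order
    .sort_values(['date', 'repeat', 'row'])
    # replace the duplicated labels with a fresh 0..len-1 index
    .reset_index(drop=True)
    # the question's df_output stores "repeat" as strings
    .astype({'repeat': str})
)
print(out)
```

Sorting on the integer "repeat" before converting it to strings keeps the order correct even for N >= 10. With the sample data, the rows of out should match the df_output from the question.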