I would like to get df_output from df_input in the code below. Basically, the rows of each date group should be repeated 2 times, and a "repeat" column should record which repetition (1 or 2) each row belongs to.
import pandas as pd
df_input = pd.DataFrame( [
['01/01', '1', '10'],
['01/01', '2', '5'],
['01/02', '1', '9'],
['01/02', '2', '7'],
], columns=['date','type','value'])
df_output = pd.DataFrame( [
['01/01', '1', '10', '1'],
['01/01', '2', '5', '1'],
['01/01', '1', '10', '2'],
['01/01', '2', '5', '2'],
['01/02', '1', '9', '1'],
['01/02', '2', '7', '1'],
['01/02', '1', '9', '2'],
['01/02', '2', '7', '2'],
], columns=['date','type','value', 'repeat'])
print(df_output)
I thought about grouping by the date column and repeating the rows n times, but could not figure out the code.
You can use GroupBy.apply per date, and pandas.concat:
N = 2
out = (df_input
.groupby(['date'], group_keys=False)
.apply(lambda d: pd.concat([d]*N))
)
output:
date type value
0 01/01 1 10
1 01/01 2 5
0 01/01 1 10
1 01/01 2 5
2 01/02 1 9
3 01/02 2 7
2 01/02 1 9
3 01/02 2 7
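Recent pandas releases (2.2 and later) emit a DeprecationWarning when `GroupBy.apply` operates on the grouping column, which the lambda above does. A minimal sketch that sidesteps `apply` by iterating over the groups directly, assuming the same `df_input` as in the question; it should produce the same rows and index labels as shown above:

```python
import pandas as pd

df_input = pd.DataFrame([
    ['01/01', '1', '10'],
    ['01/01', '2', '5'],
    ['01/02', '1', '9'],
    ['01/02', '2', '7'],
], columns=['date', 'type', 'value'])

N = 2  # number of copies per date group

# Iterating a GroupBy yields (key, sub-frame) pairs in sorted key order.
# Each date's sub-frame is stacked N times, then all dates are joined.
out = pd.concat([pd.concat([g] * N) for _, g in df_input.groupby('date')])
print(out)
```

The same pattern extends to the "repeat" column by replacing `[g] * N` with `[g.assign(repeat=n + 1) for n in range(N)]`.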
With the "repeat" column:
N = 2
out = (df_input
.groupby(['date'], group_keys=False)
.apply(lambda d: pd.concat([d.assign(repeat=n+1) for n in range(N)]))
)
output:
date type value repeat
0 01/01 1 10 1
1 01/01 2 5 1
0 01/01 1 10 2
1 01/01 2 5 2
2 01/02 1 9 1
3 01/02 2 7 1
2 01/02 1 9 2
3 01/02 2 7 2
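If groupby is not needed at all, a different technique also reproduces df_output: concatenate N tagged copies of the whole frame, then stable-sort by `date`. This is a sketch of an alternative, not the answer's method; it assumes that sorting `date` as strings gives the desired group order (true for this data, and the same order groupby uses by default), and it resets the index so the duplicated labels disappear:

```python
import pandas as pd

df_input = pd.DataFrame([
    ['01/01', '1', '10'],
    ['01/01', '2', '5'],
    ['01/02', '1', '9'],
    ['01/02', '2', '7'],
], columns=['date', 'type', 'value'])

N = 2  # number of repetitions

# One tagged copy of the whole frame per repetition: repeat = 1, 2, ..., N
copies = pd.concat([df_input.assign(repeat=n + 1) for n in range(N)])

# A stable sort (mergesort) by date keeps rows in their concatenated order
# within each date, so copy 1 comes before copy 2 for every date.
out = (copies
       .sort_values('date', kind='mergesort')
       .reset_index(drop=True))
print(out)
```

Note that `repeat` comes out as an integer column here, whereas df_output in the question stores it as strings.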