I have a dataframe like this.
Input:
index colA colB colC
0 A 1 QQQ
1 A 1 WWW
2 A 1 EEE
3 A 1 RRR
4 B 2 TTT
5 B 2 YYY
6 B 2 UUU
7 C 1 III
I would like to repeat the rows that share the same colA value as many times as the value in colB. For example, rows 4-6 have colA of B and colB of 2, so I want rows 4-6 to appear two times (the order of 4-6 does not change), as shown in the output. If the value in colB is 3, repeat three times; if the value is 1, don't repeat. In addition, I would like to add a column num that numbers each repeated block: colA=A has no repeat, so num is 0; colA=B has two repeats, so num is 1 and 2; and so on. Is there an elegant way to do this in a pandas dataframe?
Output:
index colA colB colC num
0 A 1 QQQ 0
1 A 1 WWW 0
2 A 1 EEE 0
3 A 1 RRR 0
4 B 2 TTT 1
5 B 2 YYY 1
6 B 2 UUU 1
7 B 2 TTT 2
8 B 2 YYY 2
9 B 2 UUU 2
10 C 1 III 3
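To make the example reproducible, the input above can be built as follows. This is only a sketch: it assumes that "index" is the name of the dataframe index rather than a regular column.

```python
import pandas as pd

# Sample input from the question; colB holds the repeat count for each colA group.
df = pd.DataFrame({
    "colA": ["A", "A", "A", "A", "B", "B", "B", "C"],
    "colB": [1, 1, 1, 1, 2, 2, 2, 1],
    "colC": ["QQQ", "WWW", "EEE", "RRR", "TTT", "YYY", "UUU", "III"],
})
df.index.name = "index"  # assumption: "index" is the index name, not a column
```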
CodePudding user response:
Use a custom function to repeat each group in order (slow when there are many groups), passing the keys parameter to concat so that the new column num can then be added with GroupBy.ngroup:
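import pandas as pd

def f(x):
    # read the repeat count from the group's first row
    pos = int(x.iat[0, x.columns.get_loc('colB')])
    # stack pos copies of the group; keys adds an outer index level numbering the copies
    return pd.concat([x] * pos, keys=range(pos))

# move the copy number out of the index into a column named level_0
df = df.groupby('colA', group_keys=False, sort=False).apply(f).reset_index(level=0)
# number each (copy, colA) block in order of appearance
df['num'] = df.groupby(['level_0', 'colA'], sort=False).ngroup()
df = df.drop('level_0', axis=1)
print(df)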
colA colB colC num
index
0 A 1 QQQ 0
1 A 1 WWW 0
2 A 1 EEE 0
3 A 1 RRR 0
4 B 2 TTT 1
5 B 2 YYY 1
6 B 2 UUU 1
4 B 2 TTT 2
5 B 2 YYY 2
6 B 2 UUU 2
7 C 1 III 3
CodePudding user response:
IIUC, you could use index.repeat to duplicate the index and GroupBy.cumcount to compute your "num" column:
(df
 .loc[df.index.repeat(df['colB'])]
 .assign(num=lambda d: d.groupby(['colA', 'colC']).cumcount().cumsum())
)
output:
index colA colB colC num
0 0 A 1 QQQ 0
1 1 A 1 WWW 0
2 2 A 1 EEE 0
3 3 A 1 RRR 0
4 4 B 2 TTT 0
4 4 B 2 TTT 1
5 5 B 2 YYY 1
5 5 B 2 YYY 2
6 6 B 2 UUU 2
6 6 B 2 UUU 3
7 7 C 1 III 3
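Note that this output repeats each row in place (4, 4, 5, 5, ...), whereas the expected output in the question repeats the whole colA block (4, 5, 6, 4, 5, 6) and numbers each block. A minimal sketch of a vectorized variant that keeps the block order is below. The helper name repeat_blocks and the temporary columns _pos, _rep and _grp are made up for illustration, and the sketch assumes colB holds non-negative integers that are constant within each colA group.

```python
import numpy as np
import pandas as pd


def repeat_blocks(df):
    # original row position of every copy, e.g. [0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 7]
    pos = np.repeat(np.arange(len(df)), df["colB"].to_numpy())
    out = df.iloc[pos].copy()
    out["_pos"] = pos
    # which copy of its source row each row is (0 = first copy, 1 = second, ...)
    out["_rep"] = out.groupby("_pos").cumcount().to_numpy()
    # colA groups numbered in order of first appearance
    out["_grp"] = out.groupby("colA", sort=False).ngroup().to_numpy()
    # order by group, then copy, then original position, so each copy forms a block
    out = out.sort_values(["_grp", "_rep", "_pos"])
    # number the (group, copy) blocks in order of appearance
    out["num"] = out.groupby(["_grp", "_rep"], sort=False).ngroup().to_numpy()
    return out.drop(columns=["_pos", "_rep", "_grp"]).reset_index(drop=True)


# Sample data from the question
df = pd.DataFrame({
    "colA": list("AAAABBBC"),
    "colB": [1, 1, 1, 1, 2, 2, 2, 1],
    "colC": ["QQQ", "WWW", "EEE", "RRR", "TTT", "YYY", "UUU", "III"],
})
result = repeat_blocks(df)
print(result)
```

Sorting on the explicit helper columns, rather than relying on a stable sort, keeps the row order inside each block fully determined.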