Given the following dataframe, where A, B, C are the original input column and X, Y, Z are the target columns, I would like to repeat rows in my dataframe by adding new column for X and Y as Xi and Yi as follows.
import pandas as pd
df = pd.DataFrame(data={
'id': ['1'],
'A' : ['a1'],
'B' : ['b1'],
'C' : ['c1'],
'X' : ['x1'],
'Y' : ['y1'],
'Z' : ['z1']
})
---
id A B C X Y Z
0 1 a1 b1 c1 x1 y1 z1
For this above df, I would like to create new df with following data.
df_new = pd.DataFrame(data={
'id': ['1', '1', '1'],
'A' : ['a1', 'a1', 'a1'],
'B' : ['b1', 'b1', 'b1'],
'C' : ['c1', 'c1', 'c1'],
'Xi' : ['0', 'x1', 'x1'],
'Yi' : ['0', '0', 'y1'],
'X' : ['x1', '0', '0'],
'Y' : ['0', 'y1', '0'],
'Z' : ['0', '0', 'z1']
})
---
id A B C Xi Yi X Y Z
0 1 a1 b1 c1 0 0 x1 0 0
1 1 a1 b1 c1 x1 0 0 y1 0
2 1 a1 b1 c1 x1 y1 0 0 z1
CodePudding user response:
Use Index.repeat
with DataFrame.loc
for duplicated rows to groups with length 3 and then by Series.where
and
Series.mask
with Series.duplicated
set 0
values:
df = df.loc[df.index.repeat(3)].copy()
m1 = df.index.duplicated()
m2 = df.index.duplicated(keep='last')
df['Xi'] = df['X'].where(m1, 0)
df['Yi'] = df['Y'].mask(m2, 0)
df['X'] = df['X'].mask(m1, 0)
df['Y'] = df['Y'].where(m1 & m2, 0)
df['Z'] = df['Z'].mask(m2, 0)
cols = df.columns[:4].tolist() ['Xi','Yi'] df.columns[4:7].tolist()
df = df.reset_index(drop=True).reindex(cols, axis=1)
print (df)
id A B C Xi Yi X Y Z
0 1 a1 b1 c1 0 0 x1 0 0
1 1 a1 b1 c1 x1 0 0 y1 0
2 1 a1 b1 c1 x1 y1 0 0 z1
CodePudding user response:
You can use numpy.triu
and numpy.diag
to generate appropriate boolean masks. The rest is only simple pandas masking:
df2 = df.loc[df.index.repeat(3)].copy()
mask1 = np.tile(np.triu(np.ones((3,2), dtype=bool)), (len(df), 1))
mask2 = np.tile(~np.diag(np.ones(3, dtype=bool)), (len(df), 1))
df2[['Xi', 'Yi']] = df2[['X', 'Y']].mask(mask1, 0)
df2[['X', 'Y', 'Z']] = df2[['X', 'Y', 'Z']].mask(mask2, 0)
df2 = df2[['id', 'A', 'B', 'C', 'Xi', 'Yi', 'X', 'Y', 'Z']]
NB. the solution works for an arbitrary number of rows. If you always have a single row, you can remove the numpy.tile
parts.
output:
id A B C Xi Yi X Y Z
0 1 a1 b1 c1 0 0 x1 0 0
0 1 a1 b1 c1 x1 0 0 y1 0
0 1 a1 b1 c1 x1 y1 0 0 z1