Home > OS >  Repeat rows in dataframe for n target columns by adding n-1 target columns
Repeat rows in dataframe for n target columns by adding n-1 target columns

Time:09-27

Given the following dataframe, where A, B, C are the original input column and X, Y, Z are the target columns, I would like to repeat rows in my dataframe by adding new column for X and Y as Xi and Yi as follows.

import pandas as pd

df = pd.DataFrame(data={
    'id': ['1'],
    'A' : ['a1'],
    'B' : ['b1'],
    'C' : ['c1'],
    'X' : ['x1'],
    'Y' : ['y1'],
    'Z' : ['z1']
})
---
   id   A   B   C   X   Y   Z
0   1   a1  b1  c1  x1  y1  z1

For this above df, I would like to create new df with following data.

df_new = pd.DataFrame(data={
    'id': ['1', '1', '1'],
    'A' : ['a1', 'a1', 'a1'],
    'B' : ['b1', 'b1', 'b1'],
    'C' : ['c1', 'c1', 'c1'],
    'Xi' : ['0', 'x1', 'x1'],
    'Yi' : ['0', '0', 'y1'],
    'X' : ['x1', '0', '0'],
    'Y' : ['0', 'y1', '0'],
    'Z' : ['0', '0', 'z1']
})
---
   id   A   B   C   Xi  Yi  X   Y   Z
0   1   a1  b1  c1  0   0   x1  0   0
1   1   a1  b1  c1  x1  0   0   y1  0
2   1   a1  b1  c1  x1  y1  0   0   z1

CodePudding user response:

Use Index.repeat with DataFrame.loc for duplicated rows to groups with length 3 and then by Series.where and Series.mask with Series.duplicated set 0 values:

df = df.loc[df.index.repeat(3)].copy()

m1 = df.index.duplicated()
m2 = df.index.duplicated(keep='last')

df['Xi'] = df['X'].where(m1, 0)
df['Yi'] = df['Y'].mask(m2, 0)

df['X'] = df['X'].mask(m1, 0)
df['Y'] = df['Y'].where(m1 & m2, 0)
df['Z'] = df['Z'].mask(m2, 0)


cols = df.columns[:4].tolist()   ['Xi','Yi']   df.columns[4:7].tolist() 
df = df.reset_index(drop=True).reindex(cols, axis=1)
print (df)
  id   A   B   C  Xi  Yi   X   Y   Z
0  1  a1  b1  c1   0   0  x1   0   0
1  1  a1  b1  c1  x1   0   0  y1   0
2  1  a1  b1  c1  x1  y1   0   0  z1

CodePudding user response:

You can use numpy.triu and numpy.diag to generate appropriate boolean masks. The rest is only simple pandas masking:

df2 = df.loc[df.index.repeat(3)].copy()

mask1 = np.tile(np.triu(np.ones((3,2), dtype=bool)), (len(df), 1))
mask2 = np.tile(~np.diag(np.ones(3, dtype=bool)), (len(df), 1))

df2[['Xi', 'Yi']] = df2[['X', 'Y']].mask(mask1, 0)
df2[['X', 'Y', 'Z']] = df2[['X', 'Y', 'Z']].mask(mask2, 0)

df2 = df2[['id', 'A', 'B', 'C', 'Xi', 'Yi', 'X', 'Y', 'Z']]

NB. the solution works for an arbitrary number of rows. If you always have a single row, you can remove the numpy.tile parts.

output:

  id   A   B   C  Xi  Yi   X   Y   Z
0  1  a1  b1  c1   0   0  x1   0   0
0  1  a1  b1  c1  x1   0   0  y1   0
0  1  a1  b1  c1  x1  y1   0   0  z1
  • Related