Python dataframe fill 1's in cells with inclusive relationship of row and column

Time:06-27

I have a dataframe,

     L-1    L-1-1   L-1-2   L-1-3   L-2     L-2-1   L-2-2   L-2-3
L-1    0        0       0       0     0         0       0       0
L-1-1  0        0       0       0     0         0       0       0
L-1-2  0        0       0       0     0         0       0       0
L-1-3  0        0       0       0     0         0       0       0
L-2    0        0       0       0     0         0       0       0
L-2-1  0        0       0       0     0         0       0       0
L-2-2  0        0       0       0     0         0       0       0
L-2-3  0        0       0       0     0         0       0       0

I want to put a 1 in every cell where the column label includes the row label.

The expected output is:

     L-1    L-1-1   L-1-2   L-1-3   L-2     L-2-1   L-2-2   L-2-3
L-1    1        1       1       1     0         0       0       0
L-1-1  0        1       0       0     0         0       0       0
L-1-2  0        0       1       0     0         0       0       0
L-1-3  0        0       0       1     0         0       0       0
L-2    0        0       0       0     1         1       1       1
L-2-1  0        0       0       0     0         1       0       0
L-2-2  0        0       0       0     0         0       1       0
L-2-3  0        0       0       0     0         0       0       1

For example,

The string "L-1" includes "L-1" so the cell (1,1) will be 1.

The string "L-1-1" includes "L-1" so the cell (1,2) will be 1.

The string "L-2-1" doesn't include "L-1" so the cell (1,6) will be 0.

and so on.
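
To make the rule concrete, here is a minimal sketch that builds the zero-filled frame and fills it with a plain double loop. It assumes "includes" means the column label equals the row label or starts with the row label followed by "-", which is what the expected output suggests. The names labels, is_included and expected are illustrative only and not part of the original question.

```python
import pandas as pd

labels = ["L-1", "L-1-1", "L-1-2", "L-1-3", "L-2", "L-2-1", "L-2-2", "L-2-3"]

# Zero-filled square frame with the same labels on both axes.
df = pd.DataFrame(0, index=labels, columns=labels)


def is_included(row_label, col_label):
    # Assumed rule: the column is the row itself or one of its descendants.
    return col_label == row_label or col_label.startswith(row_label + "-")


# Slow but explicit reference result, useful for checking faster versions.
expected = df.copy()
for r in expected.index:
    for c in expected.columns:
        if is_included(r, c):
            expected.loc[r, c] = 1

print(expected)
```

This loop is quadratic in the number of labels, so it is only meant as a reference to compare vectorized solutions against.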

CodePudding user response:

Use numpy broadcasting to compare the index with the original columns, and also with the columns after removing everything from the last "-" onward, then pass the result to DataFrame.mask:

i1 = df.index.to_numpy()
c1 = df.columns.to_numpy()
c = df.columns.str.rsplit('-', n=1).str[0].to_numpy()

df = df.mask((c == i1[:, None]) | (c1 ==  i1[:, None]), 1)
print (df)
       L-1  L-1-1  L-1-2  L-1-3  L-2  L-2-1  L-2-2  L-2-3
L-1      1      1      1      1    0      0      0      0
L-1-1    0      1      0      0    0      0      0      0
L-1-2    0      0      1      0    0      0      0      0
L-1-3    0      0      0      1    0      0      0      0
L-2      0      0      0      0    1      1      1      1
L-2-1    0      0      0      0    0      1      0      0
L-2-2    0      0      0      0    0      0      1      0
L-2-3    0      0      0      0    0      0      0      1
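
For anyone who wants to run the broadcasting approach above end to end, here is a self-contained sketch. The frame construction and the names rows, cols, parents, cond and result are assumptions added for illustration; the comparison logic is the same as in the snippet above.

```python
import numpy as np
import pandas as pd

labels = ["L-1", "L-1-1", "L-1-2", "L-1-3", "L-2", "L-2-1", "L-2-2", "L-2-3"]
df = pd.DataFrame(0, index=labels, columns=labels)

# Row labels as a column vector so they broadcast against the column labels.
rows = df.index.to_numpy()[:, None]   # shape (n, 1)
cols = df.columns.to_numpy()          # shape (n,)

# Column labels with the part after the last "-" removed ("L-1-2" -> "L-1").
parents = df.columns.str.rsplit("-", n=1).str[0].to_numpy()

# True where the column is the row itself or a direct child of the row.
cond = (parents == rows) | (cols == rows)

result = df.mask(cond, 1)
print(result)
```

Note that stripping only the last segment matches direct children. A deeper label such as "L-1-1-1" would have the parent "L-1-1", so it would not be flagged in the "L-1" row.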

Or build the mask by calling str.startswith on the columns for each index value:

import numpy as np

# One boolean row per index label: True where the column starts with that label
m = np.array([df.columns.str.startswith(x) for x in df.index])
df = df.mask(m, 1)

print (df)
       L-1  L-1-1  L-1-2  L-1-3  L-2  L-2-1  L-2-2  L-2-3
L-1      1      1      1      1    0      0      0      0
L-1-1    0      1      0      0    0      0      0      0
L-1-2    0      0      1      0    0      0      0      0
L-1-3    0      0      0      1    0      0      0      0
L-2      0      0      0      0    1      1      1      1
L-2-1    0      0      0      0    0      1      0      0
L-2-2    0      0      0      0    0      0      1      0
L-2-3    0      0      0      0    0      0      0      1
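
One caveat with a plain startswith check: a label like "L-1" is also a prefix of "L-10". If such labels can occur, a delimiter-aware variant may be safer. The sketch below assumes "-" always separates levels; the sample labels and the names cols, cond and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labels where a naive prefix test would over-match.
labels = ["L-1", "L-1-1", "L-10", "L-10-1"]
df = pd.DataFrame(0, index=labels, columns=labels)

cols = df.columns

# For each row label: exact match, or prefix match including the "-" separator.
cond = np.array([
    np.asarray(cols == r, dtype=bool)
    | np.asarray(cols.str.startswith(r + "-"), dtype=bool)
    for r in df.index
])

result = df.mask(cond, 1)
print(result)
```

With this check, the "L-1" row flags "L-1" and "L-1-1" but leaves "L-10" and "L-10-1" at 0.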