I have a dataframe:
       L-1  L-1-1  L-1-2  L-1-3  L-2  L-2-1  L-2-2  L-2-3
L-1      0      0      0      0    0      0      0      0
L-1-1    0      0      0      0    0      0      0      0
L-1-2    0      0      0      0    0      0      0      0
L-1-3    0      0      0      0    0      0      0      0
L-2      0      0      0      0    0      0      0      0
L-2-1    0      0      0      0    0      0      0      0
L-2-2    0      0      0      0    0      0      0      0
L-2-3    0      0      0      0    0      0      0      0
I want to put a 1 in every cell whose column label includes its row label (an inclusion relationship between rows and columns).
The expected output is:
       L-1  L-1-1  L-1-2  L-1-3  L-2  L-2-1  L-2-2  L-2-3
L-1      1      1      1      1    0      0      0      0
L-1-1    0      1      0      0    0      0      0      0
L-1-2    0      0      1      0    0      0      0      0
L-1-3    0      0      0      1    0      0      0      0
L-2      0      0      0      0    1      1      1      1
L-2-1    0      0      0      0    0      1      0      0
L-2-2    0      0      0      0    0      0      1      0
L-2-3    0      0      0      0    0      0      0      1
For example,
The string "L-1" includes "L-1" so the cell (1,1) will be 1.
The string "L-1-1" includes "L-1" so the cell (1,2) will be 1.
The string "L-2-1" doesn't include "L-1" so the cell (1,6) will be 0.
and so on.
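For anyone who wants to reproduce this, here is a minimal sketch that builds the all-zero frame shown above. The variable name `labels` is only an illustration; the actual frame may have been created differently.

```python
import pandas as pd

# Labels copied from the table in the question; rows and columns share them.
labels = ["L-1", "L-1-1", "L-1-2", "L-1-3", "L-2", "L-2-1", "L-2-2", "L-2-3"]

# Square frame filled with zeros.
df = pd.DataFrame(0, index=labels, columns=labels)
print(df)
```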
CodePudding user response:
Use numpy broadcasting to compare the index values with the original columns, and also with the columns after removing everything from the last - onward, and then pass the resulting mask to DataFrame.mask:
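# index and column labels as NumPy arrays
i1 = df.index.to_numpy()
c1 = df.columns.to_numpy()
# parent of each column label: the part before the last '-'
c = df.columns.str.rsplit('-', n=1).str[0].to_numpy()
# set 1 where the column's parent, or the column itself, equals the row label
df = df.mask((c == i1[:, None]) | (c1 == i1[:, None]), 1)
print(df)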
       L-1  L-1-1  L-1-2  L-1-3  L-2  L-2-1  L-2-2  L-2-3
L-1      1      1      1      1    0      0      0      0
L-1-1    0      1      0      0    0      0      0      0
L-1-2    0      0      1      0    0      0      0      0
L-1-3    0      0      0      1    0      0      0      0
L-2      0      0      0      0    1      1      1      1
L-2-1    0      0      0      0    0      1      0      0
L-2-2    0      0      0      0    0      0      1      0
L-2-3    0      0      0      0    0      0      0      1
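To make the broadcasting step easier to follow, here is a self-contained sketch that splits the mask into its two parts. The names `rows`, `cols`, `parents`, `same` and `child` are illustrative and not part of the answer's code; the logic is intended to be the same.

```python
import numpy as np
import pandas as pd

labels = ["L-1", "L-1-1", "L-1-2", "L-1-3", "L-2", "L-2-1", "L-2-2", "L-2-3"]
df = pd.DataFrame(0, index=labels, columns=labels)

rows = df.index.to_numpy()[:, None]   # shape (8, 1): one row label per row
cols = df.columns.to_numpy()          # shape (8,): column labels
# Part before the last '-': "L-1-2" -> "L-1", "L-1" -> "L"
parents = df.columns.str.rsplit("-", n=1).str[0].to_numpy()

# Comparing (8,) against (8, 1) broadcasts to an (8, 8) boolean array.
same = cols == rows       # column label equals row label
child = parents == rows   # column's parent equals row label
mask = same | child

result = df.mask(mask, 1)
print(result)
```

Note that this marks only direct children: a label such as "L-1-1-1" would have the parent "L-1-1", so it would get a 1 in the "L-1-1" row but not in the "L-1" row.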
Or call str.startswith on the columns for each index value to build the mask:
import numpy as np

# one boolean row per index value: which column labels start with it
df = df.mask(np.array(list(df.index.map(lambda x: df.columns.str.startswith(x)))), 1)
print(df)
       L-1  L-1-1  L-1-2  L-1-3  L-2  L-2-1  L-2-2  L-2-3
L-1      1      1      1      1    0      0      0      0
L-1-1    0      1      0      0    0      0      0      0
L-1-2    0      0      1      0    0      0      0      0
L-1-3    0      0      0      1    0      0      0      0
L-2      0      0      0      0    1      1      1      1
L-2-1    0      0      0      0    0      1      0      0
L-2-2    0      0      0      0    0      0      1      0
L-2-3    0      0      0      0    0      0      0      1
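One thing to watch with a plain prefix test: "L-10" also starts with "L-1". If your labels can look like that, a delimiter-aware check may be safer. The sketch below is only an illustration under that assumption; the labels "L-10" and "L-10-1" and the helper `is_same_or_descendant` are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical labels chosen to show the "L-1" vs "L-10" prefix clash.
labels = ["L-1", "L-1-1", "L-10", "L-10-1"]
df = pd.DataFrame(0, index=labels, columns=labels)

def is_same_or_descendant(col: str, row: str) -> bool:
    # True if col equals row, or col continues row after a '-' delimiter.
    return col == row or col.startswith(row + "-")

# One boolean row per index label, one entry per column label.
mask = np.array(
    [[is_same_or_descendant(c, r) for c in df.columns] for r in df.index]
)

result = df.mask(mask, 1)
print(result)
```

Here the "L-1" row gets 1 only for "L-1" and "L-1-1", while a plain startswith would also mark "L-10" and "L-10-1".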