How to apply code written for one specific row to all rows of a pandas DataFrame?

I have code that works for a single row only (row0), where I apply a systematic procedure as follows:

import pandas as pd

z = [[1,1,0,0,1],
     [0,1,0,0,1],
     [0,1,1,1,1],
     [0,0,0,1,0],
     [0,0,1,0,1]]
z = pd.DataFrame(z)

df = z.copy()
df.index = ['F1','F2','F3','F4','F5']
df.columns = ['F1','F2','F3','F4','F5']
df
   F1  F2  F3  F4  F5
F1  1   1   0   0   1
F2  0   1   0   0   1
F3  0   1   1   1   1
F4  0   0   0   1   0
F5  0   0   1   0   1

a) Consider row 'F1' and find the positions of the 0s in this row. Here the 0s are in columns 'F3' and 'F4'.

row0 = z.loc[[0],:]
col_zeros = row0.loc[:, (row0 == 0).all(axis=0)]

b) Then I convert these column positions into a list

col_list = col_zeros.columns.to_list()

c) Use col_list to drop the rows with the same position labels ('F3', 'F4') from a copy of the original dataframe z

df2 = z.copy()
df2.drop(df2.index[col_list], axis = 0, inplace = True)

d) Now check the columns in col_list one by one, within the remaining rows, to see whether any of them contains a 1

df3 = df2[col_list]

e) After checking, I can see that a 1 is present in column 2 but not in column 3. Hence I replace the 0 with 1 in row0 at column position 2, i.e. 'F3'

df.iloc[0,2] = 1

f) The code above, from a) to e), handles row 'F1' only. I want the same procedure repeated for the remaining rows 'F2', 'F3', 'F4', 'F5'.

Output I need after executing the above method for all rows of df:

     F1  F2  F3  F4  F5
  F1  1   1   1   0   1
  F2  0   1   1   0   1
  F3  0   1   1   1   1
  F4  0   0   0   1   0
  F5  0   1   1   1   1

CodePudding user response:

Your question already offers a solution, so I assume you want something faster than looping through your matrix row by row. In short, my approach would be

import numpy as np

m = df.to_numpy()
print(m | (m * m[:, np.newaxis, :].transpose((0,2,1))).any(axis=1))

which will give you your desired result.
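As a sanity check, here is a minimal sketch that repeats steps (a) to (e) row by row and compares the outcome with the one-liner above. The helper name `fill_rows_loop` is hypothetical, and the sketch assumes the matrix holds only 0s and 1s and that the row and column labels are identical, as in your example.

```python
import numpy as np
import pandas as pd

labels = ['F1', 'F2', 'F3', 'F4', 'F5']
df = pd.DataFrame([[1, 1, 0, 0, 1],
                   [0, 1, 0, 0, 1],
                   [0, 1, 1, 1, 1],
                   [0, 0, 0, 1, 0],
                   [0, 0, 1, 0, 1]], index=labels, columns=labels)

def fill_rows_loop(frame):
    """Hypothetical helper: repeat steps (a) to (e) for every row."""
    out = frame.copy()
    for row in frame.index:
        # (a), (b): labels of the columns that hold a 0 in this row
        zero_cols = frame.columns[(frame.loc[row] == 0).to_numpy()]
        # (c): drop the rows with those labels from the ORIGINAL frame
        kept = frame.drop(index=zero_cols)
        # (d), (e): check each of those columns in the kept rows;
        # if a 1 is present, set the position in the current row to 1
        for col in zero_cols:
            if kept[col].any():
                out.loc[row, col] = 1
    return out

looped = fill_rows_loop(df)

m = df.to_numpy()
vectorized = m | (m * m[:, np.newaxis, :].transpose((0, 2, 1))).any(axis=1)

print(looped)
print((looped.to_numpy() == vectorized).all())
```

Every row is processed against the original matrix, not against the partially updated one, which mirrors the `df2 = z.copy()` step in your code.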

Your problem can be looked at differently: your step (c) is equivalent to masking your matrix with the transpose of your first row. Masking here means a multiplication, so the masked matrix becomes

   F1  F2  F3  F4  F5
F1  1   1   0   0   1
F2  0   1   0   0   1
F3  0   0   0   0   0
F4  0   0   0   0   0
F5  0   0   1   0   1

Note that instead of being removed, rows F3 and F4 become all zeros when you multiply each column by the transpose of your first row. Then you only need to apply any along the zeroth axis to get this

   F1  F2  F3  F4  F5
    1   1   1   0   1

and take the element-wise OR of this outcome with the first row of your matrix to get the desired version of the first row

   F1  F2  F3  F4  F5
F1  1   1   1   0   1
F2  0   1   0   0   1
F3  0   0   0   0   0
F4  0   0   0   0   0
F5  0   0   1   0   1

Note that only the first row has been processed so far.
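The first-row walkthrough can be sketched in numpy as follows. The variable names (`first`, `masked`, `col_any`, `new_first`) are only illustrative, and the array is typed in by hand to keep the snippet self-contained.

```python
import numpy as np

m = np.array([[1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 1, 1, 1, 1],
              [0, 0, 0, 1, 0],
              [0, 0, 1, 0, 1]])

first = m[0]

# Lay the first row out as a column and multiply: every row whose
# label has a 0 in the first row (F3, F4) is zeroed out.
masked = m * first[:, np.newaxis]

# any along the zeroth axis: True where a surviving row holds a 1
col_any = masked.any(axis=0)

# Element-wise OR with the original first row
new_first = first | col_any

print(masked)
print(col_any.astype(int))
print(new_first)
```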

This is a matrix algebra problem, and my preferred tool for that is numpy, so I first convert the dataframe to an array

m = df.to_numpy()

Then this is the masking step

m * m[:, np.newaxis, :].transpose((0,2,1))

and take the any as described above

(m * m[:, np.newaxis, :].transpose((0,2,1))).any(axis=1)

and finally the element-wise or

m | (m * m[:, np.newaxis, :].transpose((0,2,1))).any(axis=1)
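To make the broadcasting less opaque, here is a sketch that prints the intermediate shapes and then puts the result back into a labelled dataframe. The names `col_stack`, `masked_all`, `hit`, `filled`, `result` and `alt` are illustrative. The final comparison with a matrix product is an alternative formulation I am assuming to be equivalent for 0/1 matrices, not something the expressions above rely on.

```python
import numpy as np
import pandas as pd

labels = ['F1', 'F2', 'F3', 'F4', 'F5']
df = pd.DataFrame([[1, 1, 0, 0, 1],
                   [0, 1, 0, 0, 1],
                   [0, 1, 1, 1, 1],
                   [0, 0, 0, 1, 0],
                   [0, 0, 1, 0, 1]], index=labels, columns=labels)
m = df.to_numpy()

# Each row of m becomes a column vector: shape (5, 5, 1)
col_stack = m[:, np.newaxis, :].transpose((0, 2, 1))
print(col_stack.shape)

# Broadcasting against m (5, 5) yields one masked matrix per row: (5, 5, 5)
masked_all = m * col_stack
print(masked_all.shape)

# Collapse the middle axis (the rows of each masked matrix): shape (5, 5)
hit = masked_all.any(axis=1)

# Element-wise OR with the original matrix
filled = m | hit

# Restore the row and column labels
result = pd.DataFrame(filled, index=df.index, columns=df.columns)
print(result)

# Assumed alternative for 0/1 input: a positive entry of m @ m marks the same hits
alt = m | (m @ m > 0)
print(np.array_equal(filled, alt))
```

Here `masked_all[0]` is the masked matrix shown earlier for row F1, and the remaining slices are the same construction for F2 to F5. Note that the intermediate array has n × n × n entries, so the memory cost grows quickly with the size of the matrix.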