Replacing specific values in a NumPy array based on a condition

Time:07-30

I have a dataframe of bond information. Each row is a bond and shows which two atoms are bonded together. I want to create a binary array that maps out which atoms are bonded to which (1 is a bond, 0 is no bond). So if the first row is [0 0 0 1 0 1], atom 1 is bonded to atom 4 and atom 6. I am trying to iterate over the dataframe of bonds and construct this array.

import numpy as np
import pandas as pd

d = {'atom1':[1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2':[2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)

array = np.zeros((7, 7), dtype=int)

for _, row in df.iterrows():
    # Bond data: the two atoms that form the bond in this row
    a1 = row['atom1']
    a2 = row['atom2']

    # Set 0 to 1 for both atoms in the zero-initialised array
    array[a1, a2] = 1
    array[a2, a1] = 1

I am able to get the values for a1 and a2 from the dataframe, but I am having trouble using those values to index into the array.

Example: For the first df row, there exists a bond from atom 1 to atom 2. I want to map this bond for both atom 1 and atom 2 in the array.

The result should look like this:

[[0 1 0 0 0 0 0]
 [1 0 0 0 0 0 0]
...]
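
For reference, here is a minimal sketch of the loop with the atom labels shifted to zero-based indices. It assumes the atoms are labelled 1 through 7 with no gaps, as in the data above. The names n_atoms and adjacency are illustrative and not part of the original code. A 7x7 NumPy array only accepts indices 0 to 6, so indexing directly with the label 7 goes out of bounds.

```python
import numpy as np
import pandas as pd

d = {'atom1': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2': [2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)

n_atoms = 7  # assumption: atoms are labelled 1..7 with no gaps
adjacency = np.zeros((n_atoms, n_atoms), dtype=int)

for a1, a2 in zip(df['atom1'], df['atom2']):
    # Atom labels start at 1, NumPy indices start at 0.
    i, j = a1 - 1, a2 - 1
    adjacency[i, j] = 1
    adjacency[j, i] = 1  # record the bond for both atoms

print(adjacency)
```

Because each bond is written in both directions, the result is symmetric. The data lists the pair (2, 6) but not (6, 2), so the row for atom 6 here also marks atom 2.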

CodePudding user response:

Try pd.crosstab:

cx = pd.crosstab(df['atom1'], df['atom2'])

Output:

>>> cx
atom2  1  2  3  4  5  6  7
atom1                     
1      0  1  0  0  0  0  1
2      1  0  1  0  0  1  0
3      0  1  0  1  0  0  0
4      0  0  1  0  1  0  0
5      0  0  0  1  0  1  0
6      0  0  0  0  1  0  1
7      1  0  0  0  0  1  0

>>> cx.to_numpy()
array([[0, 1, 0, 0, 0, 0, 1],
       [1, 0, 1, 0, 0, 1, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 0, 1, 0]])
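
Two caveats may apply to other data: pd.crosstab counts occurrences, so a repeated pair would give a value above 1, and it only includes labels that actually appear in each column. The sketch below guards against both, assuming the full label set is 1 through 7. The names labels and binary are illustrative.

```python
import pandas as pd

d = {'atom1': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2': [2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)

labels = list(range(1, 8))  # assumption: atoms are labelled 1..7

cx = pd.crosstab(df['atom1'], df['atom2'])
# Force a full square table even if some atom is missing from a column.
cx = cx.reindex(index=labels, columns=labels, fill_value=0)
# Turn counts into 0/1 flags in case a pair is listed more than once.
binary = (cx.to_numpy() > 0).astype(int)

print(binary)
```

For the data in the question, this gives the same matrix as cx.to_numpy() above.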

CodePudding user response:

One alternative, using NumPy only:

array = np.zeros((7, 7), dtype=bool)
indices = df.to_numpy().T - 1
np.logical_or.at(array, tuple(indices), 1)
array = array.astype(int)
print(array)

Output:

[[0 1 0 0 0 0 1]
 [1 0 1 0 0 1 0]
 [0 1 0 1 0 0 0]
 [0 0 1 0 1 0 0]
 [0 0 0 1 0 1 0]
 [0 0 0 0 1 0 1]
 [1 0 0 0 0 1 0]]
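
Since every marked cell gets the same value, plain integer-array indexing may also be enough here, because repeated index pairs simply write 1 again. This is a sketch under the same assumption that atoms are labelled 1 through 7. The names rows, cols and symmetric are illustrative.

```python
import numpy as np
import pandas as pd

d = {'atom1': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2': [2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)

# Shift the 1-based atom labels to 0-based array indices.
rows = df['atom1'].to_numpy() - 1
cols = df['atom2'].to_numpy() - 1

array = np.zeros((7, 7), dtype=int)
array[rows, cols] = 1  # set every (atom1, atom2) cell in one step

# Optional: mirror each bond so that it is marked for both atoms.
symmetric = array | array.T

print(array)
```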