I have a dataframe of bond information. Each row is a bond and lists the two atoms that are bonded together. I want to build a binary array that maps out which atoms are bonded to which (1 is a bond, 0 is no bond). So if the first row is [0 0 0 1 0 1], that means atom 1 is bonded to atom 4 and atom 6. I am trying to iterate over the dataframe of bonds and construct this array:
import numpy as np
import pandas as pd

d = {'atom1': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2': [2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)
array = np.zeros((7, 7), dtype=int)
for i, row in df.iterrows():
    # Bond data: the two atoms of the bond in this row
    a1 = df.loc[i, 'atom1']
    a2 = df.loc[i, 'atom2']
    # Set 0 to 1 for both atoms in the empty array
    array[a1, a2] = 1
    array[a2, a1] = 1
I am able to get the values for a1 and a2 from the dataframe, but the trouble is using those values to index in the array.
Example: For the first df row, there exists a bond from atom 1 to atom 2. I want to map this bond for both atom 1 and atom 2 in the array.
After that first row, the array should look like this:
[[0 1 0 0 0 0 0]
 [1 0 0 0 0 0 0]
 ...]
CodePudding user response:
Try pd.crosstab:
cx = pd.crosstab(df['atom1'], df['atom2'])
Output:
>>> cx
atom2  1  2  3  4  5  6  7
atom1
1      0  1  0  0  0  0  1
2      1  0  1  0  0  1  0
3      0  1  0  1  0  0  0
4      0  0  1  0  1  0  0
5      0  0  0  1  0  1  0
6      0  0  0  0  1  0  1
7      1  0  0  0  0  1  0
>>> cx.to_numpy()
array([[0, 1, 0, 0, 0, 0, 1],
       [1, 0, 1, 0, 0, 1, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 1, 0],
       [0, 0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 0, 1, 0]])
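Note that pd.crosstab counts rows, so the result only reflects the direction in which each bond is listed. In the sample data the pair (2, 6) appears without a matching (6, 2), which is why row 2 has a 1 in column 6 while row 6 has a 0 in column 2. If a symmetric 0/1 matrix is wanted, a minimal sketch could look like the one below. It assumes atoms are labelled 1 to 7; the names n_atoms, labels, counts and adjacency are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

d = {'atom1': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2': [2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)

n_atoms = 7  # assumption: atoms are labelled 1..n_atoms
labels = list(range(1, n_atoms + 1))

# Count how often each (atom1, atom2) pair occurs
cx = pd.crosstab(df['atom1'], df['atom2'])
# Make sure every atom has a row and a column, even if it never
# appears in one of the two dataframe columns
cx = cx.reindex(index=labels, columns=labels, fill_value=0)

counts = cx.to_numpy()
# A bond exists if it is listed in either direction; collapse counts to 0/1
adjacency = ((counts + counts.T) > 0).astype(int)
print(adjacency)
```

The reindex step is only a safeguard: with the sample data every atom already appears in both columns, so it changes nothing there.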
CodePudding user response:
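One alternative, using NumPy only:
# Boolean 7x7 matrix, initially all False
array = np.zeros((7, 7), dtype=bool)
# Shape (2, n_bonds); subtract 1 to turn 1-based atom labels into 0-based indices
indices = df.to_numpy().T - 1
# Set array[atom1, atom2] to True for every listed bond
np.logical_or.at(array, tuple(indices), 1)
array = array.astype(int)
print(array)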
Output:
[[0 1 0 0 0 0 1]
 [1 0 1 0 0 1 0]
 [0 1 0 1 0 0 0]
 [0 0 1 0 1 0 0]
 [0 0 0 1 0 1 0]
 [0 0 0 0 1 0 1]
 [1 0 0 0 0 1 0]]
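The loop in the question fails because it indexes a 7x7 array, whose valid indices are 0 to 6, with atom labels 1 to 7, so the bond involving atom 7 raises an IndexError and every other bond lands one row and column too far. Subtracting 1 from the labels, as done above, is the fix. A hedged sketch of the same idea with plain integer-array indexing, which also sets both directions of every bond as the question asks, might look like this. It assumes labels run from 1 to 7; n_atoms and adjacency are illustrative names.

```python
import numpy as np
import pandas as pd

d = {'atom1': [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
     'atom2': [2, 7, 3, 1, 6, 4, 2, 3, 5, 4, 6, 5, 7, 1, 6]}
df = pd.DataFrame(d)

n_atoms = 7  # assumption: atoms are labelled 1..n_atoms

# Convert 1-based atom labels to 0-based array indices
a1 = df['atom1'].to_numpy() - 1
a2 = df['atom2'].to_numpy() - 1

adjacency = np.zeros((n_atoms, n_atoms), dtype=int)
# Mark each bond in both directions so the matrix is symmetric
adjacency[a1, a2] = 1
adjacency[a2, a1] = 1
print(adjacency)
```

Unlike the outputs above, this result has a 1 at row 6, column 2, because the bond listed as (2, 6) is mirrored.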