I want to speed up the following Python code. I have a dictionary in which each value is a list. I use each key together with each value in its list to filter a DataFrame and count the rows that remain. Is there a way to speed this up?
Code:
counts = {}
for key in check:
    for pos in check[key]:
        count = data[(data[key] != '0/0') & (data[pos] != '0/0')].shape[0]
        counts[(key, pos)] = count
check is a dictionary e.g.:
check = {1: [2, 3],
         2: [3],
         3: []}
data is the following:
| 1 | 2 | 3 |
| - | - | - |
| 0/0 | 0/1 | 0/1 |
| 0/1 | 0/1 | 0/1 |
| 0/1 | 0/0 | 0/1 |
| 0/1 | 0/0 | 0/0 |
In this instance the results would be:
counts = {(1, 2): 1,
          (1, 3): 2,
          (2, 3): 2}
Note that in this instance check is just all combinations of the 3 columns, but in the real data that is not always the case. However, if there is a very fast way to compute all combinations of columns, I can do that and filter out the pairs of interest afterwards.
Thanks!
CodePudding user response:
Here is a quick and efficient way: turn the data into a 0/1 mask of the non-'0/0' entries and take the inner product of that mask with itself, which gives a matrix of pairwise counts.
m = data.ne('0/0').astype('int')
out = m.T @ m
print(out)
   1  2  3
1  3  1  2
2  1  2  2
3  2  2  3
print(out.loc[1, 2])
1
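
To get back the dictionary from the question, the pairs listed in check can be read out of the counts matrix. The following is a minimal sketch, assuming the column labels are the integers 1, 2, 3 as in the sample data and that data and check are built as shown in the question:

```python
import pandas as pd

# Sample data from the question (assumption: column labels are integers).
data = pd.DataFrame({
    1: ['0/0', '0/1', '0/1', '0/1'],
    2: ['0/1', '0/1', '0/0', '0/0'],
    3: ['0/1', '0/1', '0/1', '0/0'],
})
check = {1: [2, 3], 2: [3], 3: []}

# 1 where the value is not '0/0', otherwise 0.
m = data.ne('0/0').astype('int')

# out.loc[a, b] is the number of rows where columns a and b are both non-'0/0'.
out = m.T @ m

# Keep only the pairs of interest listed in check.
counts = {(key, pos): int(out.loc[key, pos])
          for key, positions in check.items()
          for pos in positions}
print(counts)
```

The diagonal of out holds the per-column counts of non-'0/0' entries, and the matrix is symmetric, so out.loc[1, 2] and out.loc[2, 1] give the same value.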