How to speed up a double for loop dictionary and pandas dataframe

I want to speed up the following Python code. I have a dictionary where each value is a list. For each key and each position in its list, I filter a dataframe on both columns and count the remaining rows. Is there a way to speed this up?

Code:

counts = {}
for key in check:
    for pos in check[key]:
        count = data[(data[key] != '0/0') & (data[pos] != '0/0')].shape[0]
        counts[(key, pos)] = count

check is a dictionary e.g.:

check = {1:[2,3],
         2:[3],
         3:[]}

data is the following:

| 1 | 2 | 3 |
| - | - | - | 
| 0/0 | 0/1 | 0/1 | 
| 0/1 | 0/1 | 0/1 |
| 0/1 | 0/0 | 0/1 | 
| 0/1 | 0/0 | 0/0 |

In this instance the results would be:

counts = {(1, 2): 1,
          (1, 3): 2,
          (2, 3): 2}

Note that in this instance check happens to contain all combinations of the 3 columns, but in the real example that is not always the case. However, if there is a very fast way to do all combinations of columns, I can do that and just filter the pairs of interest afterwards.
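For reference, the "all combinations" baseline mentioned above can be written compactly with `itertools.combinations`; this is a sketch using the sample frame from the question (it is still a per-pair loop, just shorter):

```python
from itertools import combinations

import pandas as pd

# Sample frame from the question (columns labelled 1, 2, 3).
data = pd.DataFrame({1: ['0/0', '0/1', '0/1', '0/1'],
                     2: ['0/1', '0/1', '0/0', '0/0'],
                     3: ['0/1', '0/1', '0/1', '0/0']})

# For every column pair, count rows where neither column is '0/0'.
counts = {(a, b): ((data[a] != '0/0') & (data[b] != '0/0')).sum()
          for a, b in combinations(data.columns, 2)}
# counts -> {(1, 2): 1, (1, 3): 2, (2, 3): 2}
```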

Thanks!

CodePudding user response:

Here is a quick and efficient way: compute the inner product of a boolean mask to build the whole counts matrix at once.

# Mask of entries that differ from '0/0'; the inner product counts,
# for every column pair, the rows where both entries are non-'0/0'.
m = data.ne('0/0').astype('int')
out = m.T @ m

print(out)

   1  2  3
1  3  1  2
2  1  2  2
3  2  2  3

print(out.loc[1, 2])

1
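To recover the counts dictionary the question asks for, one way (a self-contained sketch that rebuilds the sample data and the check dict from the question) is to index the matrix with the pairs of interest:

```python
import pandas as pd

# Pieces from the question, reproduced so the sketch runs on its own.
data = pd.DataFrame({1: ['0/0', '0/1', '0/1', '0/1'],
                     2: ['0/1', '0/1', '0/0', '0/0'],
                     3: ['0/1', '0/1', '0/1', '0/0']})
check = {1: [2, 3], 2: [3], 3: []}

# Inner product of the boolean mask: out.loc[a, b] is the number of
# rows where both column a and column b differ from '0/0'.
m = data.ne('0/0').astype('int')
out = m.T @ m

# Pull only the (key, pos) pairs listed in check back into a dict.
counts = {(key, pos): out.loc[key, pos]
          for key in check for pos in check[key]}
# counts -> {(1, 2): 1, (1, 3): 2, (2, 3): 2}
```

This keeps the vectorised matrix computation and only falls back to Python loops for the cheap final lookup.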