Faster than double for loop on Counter object

Time:10-05

I want to do a double loop over a Counter object, which is the result of subtracting one counter from another. My counter looks like this:

{'sun': 5,
 'abstract': 0.0,
 'action': 10,
 'ad': 0.0,
  ....}

And I have a dataframe like:

    0           1   
0   sun         sunlight        
2   river       water   
3   stair       staircase
4   morning     sunrise 
n   ......

My goal is to keep in the dataframe only the pairs of words where the first word of the row has a frequency of 0 and the second has a frequency greater than 0, or vice versa (the first greater than 0 and the second 0). In other words, pairs where both words have frequency 0, or both have frequency greater than 0, should be excluded.
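The rule above is an exclusive or on the two "count > 0" tests. A minimal sketch follows; the helper name `keep_pair` and the two input counters are hypothetical. It also assumes the difference counter was built with `Counter.subtract`, which keeps zero (and negative) entries, whereas the `-` operator drops every entry whose result is not positive. Either way, looking up a missing word in a Counter returns 0, so the rule still applies.

```python
from collections import Counter

# Hypothetical input counters; the question does not show them.
counts_a = Counter({'sun': 7, 'action': 12, 'ad': 3})
counts_b = Counter({'sun': 2, 'action': 2, 'ad': 3})

# The '-' operator keeps only positive results, so 'ad' (3 - 3 = 0) disappears.
minus_result = counts_a - counts_b

# Counter.subtract() works in place and keeps zero (and negative) counts.
counter_diff = Counter(counts_a)
counter_diff.subtract(counts_b)


def keep_pair(first_count, second_count):
    """Return True when exactly one of the two counts is greater than 0."""
    return (first_count > 0) != (second_count > 0)


print(dict(minus_result))   # {'sun': 5, 'action': 10}
print(dict(counter_diff))   # {'sun': 5, 'action': 10, 'ad': 0}

# Missing keys read as 0 (without being inserted), e.g. 'sunlight' and 'river'.
print(keep_pair(counter_diff['sun'], counter_diff['sunlight']))  # True
print(keep_pair(counter_diff['ad'], counter_diff['river']))      # False
```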

I've tried the following, but it is too slow (it would take more than 5 hours to complete):

for i,j in counter_diff.items():       # i: a word, j: its count
  for t,k in counter_diff.items():     # t: another word, k: its count
    for s in range(len(df)):
      if ((df[0][s] == i and j==0) and (df[1][s] == t and k==0)):
        df = df.drop([s])
      elif ((df[0][s] == i and j>0) and (df[1][s] == t and k>0)):
        df = df.drop([s])
    df = df.reset_index(drop=True)

Do you have any suggestions for a better way to do this? Thank you for your time!
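For comparison: the nested loops visit every (word, word) pair of the counter and then scan every row, so the work grows with (number of words)² × (number of rows), and each `df.drop` call builds a new DataFrame on top of that. A minimal sketch of a single pass over the rows, assuming `counter_diff` is a Counter and the two word columns are labelled 0 and 1 as in the question (the sample data is hypothetical):

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data shaped like the question's counter and dataframe.
counter_diff = Counter({'sun': 5, 'abstract': 0, 'action': 10, 'ad': 0})
df = pd.DataFrame({0: ["sun", "river", "stair", "morning"],
                   1: ["sunlight", "water", "staircase", "sunrise"]})

# One dictionary lookup per word; a word missing from the Counter counts as 0.
keep = [
    (counter_diff[first] > 0) != (counter_diff[second] > 0)
    for first, second in zip(df[0], df[1])
]

# Indexing with a boolean list keeps only the rows marked True.
result = df[keep].reset_index(drop=True)
print(result)
```

Each row is examined once, so the cost is linear in the number of rows instead of depending on the square of the vocabulary size.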

Answer:

One approach is to use applymap together with numpy.logical_xor:

from collections import Counter
import pandas as pd
import numpy as np

# toy Counter object
counts = Counter({'sun': 5, 'abstract': 0, 'action': 10, 'ad': 0})

# toy DataFrame object
df = pd.DataFrame(data=[["sun", "sunlight"],
        ["river", "water"],
        ["stair", "staircase"],
        ["morning", "sunrise"]])

# map the counts element-wise over all the elements of the DataFrame
# and create a boolean mask (applymap is deprecated since pandas 2.1;
# df.map(...) is the drop-in replacement there)
indicators = df.applymap(lambda x: counts.get(x, 0)) > 0

# use a logical xor to find the combinations where the count is 0 and >0 (and the other way around)
mask = np.logical_xor(indicators[0], indicators[1])

# finally filter using a mask
res = df[mask]
print(res)

Output

     0         1
0  sun  sunlight

The time complexity of this approach is O(n), where n is the size (number of cells) of the DataFrame. More info on xor (exclusive or) can be found here.
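To see why `np.logical_xor` selects exactly the requested rows, it helps to check it against the four possible combinations of the two "count > 0" flags. The sketch below also shows that the `^` operator on two boolean pandas Series gives the same result; that is an equivalent spelling, not something the answer itself uses.

```python
import numpy as np
import pandas as pd

# The four possible (first word > 0, second word > 0) combinations.
first_positive = pd.Series([False, False, True, True])
second_positive = pd.Series([False, True, False, True])

# Exclusive or: True only when exactly one of the two flags is True.
xor_numpy = np.logical_xor(first_positive, second_positive)
xor_operator = first_positive ^ second_positive

print(pd.DataFrame({"first>0": first_positive,
                    "second>0": second_positive,
                    "keep": xor_numpy}))
```

Only the two mixed rows are kept; the "both 0" and "both > 0" rows are dropped, which matches the filtering rule in the question.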

Answer:

IIUC, you can try:

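import pandas as pd

d = {'sun': 5, 'abstract': 0.0, 'action': 10, 'ad': 0.0}
df = pd.DataFrame({0: ["sun", "river", "stair", "morning"],
                   1: ["sunlight", "water", "staircase", "sunrise"]})

# words missing from d map to NaN (NaN > 0 is False); the masks are cast
# to int so that their sum counts the positive words in each row
>>> df.loc[(df[0].map(d)>0).astype(int) + (df[1].map(d)>0).astype(int) == 1]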
     0         1
0  sun  sunlight
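The idea here is to count, per row, how many of the two words have a positive count and keep rows where that count is 1. One caveat when writing it: adding two boolean Series with `+` follows NumPy and behaves like a logical OR (True + True is True), so comparing that sum with 1 would also keep rows where both words are positive. Converting the masks to integers first avoids this. A minimal check with hypothetical flags:

```python
import pandas as pd

# Hypothetical flags: does the first / second word have a count > 0?
first_positive = pd.Series([True, True, False, False])
second_positive = pd.Series([False, True, True, False])

# With bool dtype, '+' acts like a logical OR, so row 1 (both True) gives True.
bool_sum = first_positive + second_positive
print(bool_sum.tolist())          # [True, True, True, False]
print((bool_sum == 1).tolist())   # [True, True, True, False]: row 1 kept by mistake

# Casting to int first gives a real count of positive words per row.
int_sum = first_positive.astype(int) + second_positive.astype(int)
print((int_sum == 1).tolist())    # [True, False, True, False]
```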

If you have other columns in df and want to check if exactly one column has a count greater than 0:

>>> df.loc[df.apply(lambda x: x.map(d)>0).sum(axis=1)==1]
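A minimal sketch of that generalization, assuming a hypothetical third word column (labelled 2): a row is kept only when exactly one of its words has a positive count. Words missing from `d` map to NaN, and `NaN > 0` is False, so they count as zero. Summing a boolean DataFrame along `axis=1` counts the True values per row, so no integer cast is needed here.

```python
import pandas as pd

d = {'sun': 5, 'abstract': 0.0, 'action': 10, 'ad': 0.0}

# Hypothetical three-column frame; column 2 is made up for illustration.
df = pd.DataFrame({0: ["sun", "river", "action", "ad"],
                   1: ["sunlight", "water", "sun", "abstract"],
                   2: ["dawn", "sea", "ad", "action"]})

# Element-wise "count > 0" flags, column by column (missing words -> NaN -> False).
positive = df.apply(lambda col: col.map(d) > 0)

# Number of positive words per row; keep rows where it is exactly 1.
n_positive = positive.sum(axis=1)
print(df.loc[n_positive == 1])
```

Row 2 ("action", "sun", "ad") has two positive words and is dropped, while rows 0 and 3 each have exactly one and are kept.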