I want to do a double loop over a Counter object, which is the result of subtracting one Counter from another. My counter looks like this:
{'sun': 5,
'abstract': 0.0,
'action': 10,
'ad': 0.0,
....}
And I have a dataframe like:
0 1
0 sun sunlight
2 river water
3 stair staircase
4 morning sunrise
n ......
My goal is to keep in the dataframe only the word pairs in which the first word has a frequency of 0 and the second a frequency greater than 0, or the other way around (first greater than 0, second 0). Pairs where both words have frequency 0, or both have a frequency greater than 0, should be dropped.
I've tried to do this, but it is too slow (it will take more than 5 hours to complete):
for i, j in counter_diff.items():  # i is the word, j is its count
    for t, k in counter_diff.items():  # t is the word, k is its count
        for s in range(len(df)):
            if ((df[0][s] == i and j == 0) and (df[1][s] == t and k == 0)):
                df = df.drop([s])
            elif ((df[0][s] == i and j > 0) and (df[1][s] == t and k > 0)):
                df = df.drop([s])
df = df.reset_index(drop=True)
Have you any suggestion of a better way to do it? Thank you for your time!
CodePudding user response:
One approach is to use applymap together with numpy.logical_xor:
from collections import Counter
import pandas as pd
import numpy as np
# toy Counter object
counts = Counter({'sun': 5, 'abstract': 0, 'action': 10, 'ad': 0})
# toy DataFrame object
df = pd.DataFrame(data=[["sun", "sunlight"],
                        ["river", "water"],
                        ["stair", "staircase"],
                        ["morning", "sunrise"]])
# map the counts element-wise over all the elements of the DataFrame
# and create boolean mask
indicators = df.applymap(lambda x: counts.get(x, 0)) > 0
# use a logical xor to find the combinations where the count is 0 and >0 (and the other way around)
mask = np.logical_xor(indicators[0], indicators[1])
# finally filter using a mask
res = df[mask]
print(res)
Output
0 1
0 sun sunlight
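The time complexity of this approach is O(n), where n is the size (number of cells) of the DataFrame, because each cell is looked up in the Counter exactly once. The xor (exclusive or) of two booleans is true when exactly one of them is true, which is the condition asked for here.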
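To check the mask against every possible case, here is a minimal sketch with one row per combination. The extra rows and the names `pairs`, `first_positive` and `second_positive` are illustrative and not from the question. Recent pandas versions (2.1 and later) deprecate DataFrame.applymap in favor of DataFrame.map, so this sketch uses Series.map on each column, which should work in either case.

```python
from collections import Counter

import pandas as pd

# Toy data (illustrative): one row for each combination of zero / positive counts.
counts = Counter({"sun": 5, "abstract": 0, "action": 10, "ad": 0})
pairs = pd.DataFrame(
    [
        ["sun", "ad"],           # >0 and 0  -> keep
        ["abstract", "action"],  # 0 and >0  -> keep
        ["sun", "action"],       # both >0   -> drop
        ["abstract", "ad"],      # both 0    -> drop
        ["river", "water"],      # neither word is in the Counter -> treated as 0 -> drop
    ]
)

# Look up each word once; words missing from the Counter count as 0.
first_positive = pairs[0].map(lambda w: counts.get(w, 0)) > 0
second_positive = pairs[1].map(lambda w: counts.get(w, 0)) > 0

# xor of the two boolean Series: True when exactly one word has a count > 0.
mask = first_positive ^ second_positive
kept = pairs[mask]
print(kept)  # only rows 0 and 1 remain
```

The `^` operator on boolean Series is equivalent to np.logical_xor here; which one to use is mostly a matter of taste.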
CodePudding user response:
IIUC, you can try:
import pandas as pd
d = {'sun': 5, 'abstract': 0.0, 'action': 10, 'ad': 0.0}
df = pd.DataFrame({0: ["sun", "river", "stair", "morning"],
                   1: ["sunlight", "water", "staircase", "sunrise"]})
# keep rows where exactly one of the two words has a count > 0 (xor)
>>> df.loc[(df[0].map(d) > 0) ^ (df[1].map(d) > 0)]
0 1
0 sun sunlight
If df has other columns and you want to keep the rows where exactly one column has a count greater than 0:
>>> df.loc[df.apply(lambda x: x.map(d)>0).sum(axis=1)==1]
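As a rough sketch of how the multi-column variant behaves, assume a hypothetical third column (2) and the illustrative names `wide`, `positive` and `n_positive`, none of which appear in the question. Words that are missing from the dict map to NaN, and NaN > 0 is False, so they are treated like a count of 0.

```python
import pandas as pd

d = {"sun": 5, "abstract": 0.0, "action": 10, "ad": 0.0}

# Hypothetical three-column frame, for illustration only.
wide = pd.DataFrame({
    0: ["sun", "sun", "abstract"],
    1: ["sunlight", "action", "ad"],
    2: ["water", "river", "stair"],
})

# Boolean frame: True where the word in that cell has a count > 0.
positive = wide.apply(lambda col: col.map(d) > 0)

# Number of positive-count words in each row.
n_positive = positive.sum(axis=1)

# Keep the rows where exactly one word has a positive count.
result = wide.loc[n_positive == 1]
print(result)  # only row 0 remains
```

With only two columns this gives the same rows as the xor version; with more columns it expresses "exactly one", which a chain of xors would not.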