I'm getting an empty list after intersect in Python

Time:09-28

I have two columns, and each row holds five words in each column.

For example:
x=[dog|cat|mouse|new|world]
y=[fish|cat|new|thing|nice]

I need to find the intersection between them: [cat|new].

But it shows me an empty list. Do you know why?

import pandas as pd

data = pd.read_csv('data.csv')

intersect1 = []

for j in range(len(data)):
    x = str(data.iloc[:, 2]).split("|")
    y = str(data.iloc[:, 3]).split("|")

    # get_jaccard_sim(x, y)

    intersect = list(set(x) & set(y))
    intersect1.append(intersect)

print(intersect1)

CodePudding user response:

The issue is in your loop: data.iloc[:, 2] selects the whole column on every iteration, whereas you want to select one value per row. Replace the : with your loop counter, j.

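import pandas as pd

df = pd.DataFrame({'x': ['[dog|cat|mouse|new|world]'],
                   'y': ['[fish|cat|new|thing|nice]']})

for j in range(len(df)):
    # This sample frame has only two columns, so they sit at positions 0 and 1
    # (use 2 and 3 for the data in the question).
    x = str(df.iloc[j, 0]).split("|")
    y = str(df.iloc[j, 1]).split("|")
    intersect = list(set(x) & set(y))

print(intersect)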

Output (the order of the elements may vary, because sets are unordered):

['new', 'cat']
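
To see the difference between the two selections, here is a minimal sketch. The two-row frame is made up for illustration, and positions 0 and 1 stand in for columns 2 and 3 of the real data.

```python
import pandas as pd

# Made-up sample frame; the real data comes from data.csv
df = pd.DataFrame({"x": ["[dog|cat|mouse|new|world]"] * 2,
                   "y": ["[fish|cat|new|thing|nice]"] * 2})

whole_column = df.iloc[:, 0]   # a Series holding every row of the column
single_cell = df.iloc[0, 0]    # the string stored in row 0

print(type(whole_column).__name__)  # Series
print(type(single_cell).__name__)   # str

# str() on a Series returns its printed representation, index labels included,
# so splitting it on "|" does not give the words of a single row.
print(str(whole_column))
```

Printing the pieces you get from splitting str(whole_column) is a quick way to check what actually ends up in x and y for your own file.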

CodePudding user response:

I just did a test using the code below:

data1 = "dog|cat|mouse|new|world"
data2 = "fish|cat|new|thing|nice"

x = data1.split("|")
y = data2.split("|")

intersect= list(set(x) & set(y))

print(intersect)

This outputs ['cat', 'new'] (possibly in a different order, since sets are unordered), exactly what you'd expect. Note that x and y are lists containing the words as separate strings, i.e.:

['dog', 'cat', 'mouse', 'new', 'world'] # this is x
['fish', 'cat', 'new', 'thing', 'nice'] # this is y

Make sure that this is also the case in your code!
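
One way to check this is to wrap the parsing in a small function and look at what it returns. The sketch below is only an illustration: split_words and common_words are made-up helper names, and it assumes each cell is a string that may be wrapped in square brackets, as in the question's example.

```python
def split_words(cell):
    """Turn '[dog|cat]' into ['dog', 'cat'] (assumes '|' separators and optional brackets)."""
    return str(cell).strip().strip("[]").split("|")


def common_words(left, right):
    """Return the words present in both cells, sorted for a stable order."""
    return sorted(set(split_words(left)) & set(split_words(right)))


x = "[dog|cat|mouse|new|world]"
y = "[fish|cat|new|thing|nice]"

print(split_words(x))      # ['dog', 'cat', 'mouse', 'new', 'world']
print(common_words(x, y))  # ['cat', 'new']
```

Sorting the result is a design choice here: a set has no defined order, so sorting keeps the output the same from run to run.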

CodePudding user response:

Even though you added the code in a loop, you are not actually traversing your dataframe. Assuming your data is of this shape:

    one two
0   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
1   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
2   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
3   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
4   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
5   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
6   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
7   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
8   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
...

Then, assuming the columns you're interested in are at positions 2 and 3, modifying your loop like this would work:

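intersect1 = []
for j in range(len(data)):
    # Each cell is a string, so strip the brackets before splitting it into words
    x = str(data.iloc[j, 2]).strip('[]').split('|')
    y = str(data.iloc[j, 3]).strip('[]').split('|')
    intersect1.append(list(set(x) & set(y)))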
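
If the columns are addressed by name instead of position, the same row-by-row idea can be written without an index counter. This is only a sketch: the column names one and two are taken from the sample table above, the rows are made up, and it assumes every cell is a bracketed string.

```python
import pandas as pd

# Made-up sample data shaped like the table above
data = pd.DataFrame({
    "one": ["[dog|cat|mouse|new|world]", "[dog|cat|mouse|new|world]"],
    "two": ["[fish|cat|new|thing|nice]", "[fish|bird|old|thing|nice]"],
})

intersections = []
for left, right in zip(data["one"], data["two"]):
    # Strip the surrounding brackets, then split each cell into a set of words
    x = set(left.strip("[]").split("|"))
    y = set(right.strip("[]").split("|"))
    intersections.append(sorted(x & y))

print(intersections)  # [['cat', 'new'], []]
```

Iterating over the two columns with zip pairs the values of each row, so there is one result list per row, and an empty list only appears where a row truly has no words in common.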