I have 2 columns, and each row of each column holds 5 words separated by "|".
For example:
x=[dog|cat|mouse|new|world]
y=[fish|cat|new|thing|nice]
I need to find the intersection between them: [cat|new].
But my code gives me an empty list. Do you know why?
data = pd.read_csv('data.csv')
intersect1 = []
for j in range(len(data)):
    #print('==========================================================================')
    x = str(data.iloc[:, 2]).split("|")
    y = str(data.iloc[:, 3]).split("|")
    #get_jaccard_sim(x, y)
    #intersect.append(result)
    intersect = list(set(x) & set(y))
    intersect1.append(intersect)
    #print(inter)
print(intersect1)
CodePudding user response:
The issue is in your loop: data.iloc[:, 2] selects the whole column on every iteration, whereas you want to select one value per row. Replace the : with your loop counter, j, so that each iteration reads a single cell.
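import pandas as pd

df = pd.DataFrame({'x': ['[dog|cat|mouse|new|world]'],
                   'y': ['[fish|cat|new|thing|nice]']})
for j in range(len(df)):
    # This sample frame has only two columns, so they sit at positions 0 and 1
    # (in the original data they are at positions 2 and 3).
    # strip("[]") removes the surrounding brackets before splitting.
    x = str(df.iloc[j, 0]).strip("[]").split("|")
    y = str(df.iloc[j, 1]).strip("[]").split("|")
    intersect = list(set(x) & set(y))
    print(intersect)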
Output:
['new', 'cat']
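To see why selecting the whole column goes wrong, it may help to compare what str() returns for a column versus a single cell. The sketch below uses a made-up two-row frame (the column names and values are assumptions, not the asker's real CSV) and shows that the string form of a whole Series also carries index labels, line breaks, and a dtype footer, all of which end up inside the split pieces.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the asker's CSV
df = pd.DataFrame({
    "x": ["dog|cat|mouse|new|world", "red|green|blue|new|old"],
    "y": ["fish|cat|new|thing|nice", "green|pink|old|hat|sun"],
})

whole_column = str(df.iloc[:, 0])   # printed form of the entire Series
single_cell = str(df.iloc[0, 0])    # just the string stored in row 0

print(repr(whole_column))           # includes index labels and a dtype footer
print(whole_column.split("|"))      # pieces are polluted by that extra text
print(single_cell.split("|"))       # clean list of words
```

With a long column, pandas may also truncate the printed form, so the pieces obtained this way are not a reliable view of the data.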
CodePudding user response:
I just did a test using the code below:
data1 = "dog|cat|mouse|new|world"
data2 = "fish|cat|new|thing|nice"
x = data1.split("|")
y = data2.split("|")
intersect= list(set(x) & set(y))
print(intersect)
This outputs ['cat', 'new'] (the order may vary, since sets are unordered), exactly what you'd expect. Note that x and y are lists containing the words as separate strings, i.e.:
['dog', 'cat', 'mouse', 'new', 'world'] # this is x
['fish', 'cat', 'new', 'thing', 'nice'] # this is y
Make sure that this is also the case in your code!
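One way to make sure of that is to normalise each cell before comparing. The helper below is only a sketch: split_words is a hypothetical name, and it assumes a cell is a string that may or may not be wrapped in square brackets.

```python
def split_words(cell):
    """Turn 'a|b|c' or '[a|b|c]' into ['a', 'b', 'c'] (hypothetical helper)."""
    text = str(cell).strip()
    text = text.strip("[]")  # drop surrounding brackets, if any
    # Split on the separator and discard empty or whitespace-only pieces
    return [word.strip() for word in text.split("|") if word.strip()]


x = split_words("[dog|cat|mouse|new|world]")
y = split_words("fish|cat|new|thing|nice")

# sorted() gives a stable order, since sets themselves are unordered
common = sorted(set(x) & set(y))
print(common)  # ['cat', 'new']
```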
CodePudding user response:
Even though you added the code in a loop, you are not actually traversing your dataframe. Assuming your data is of this shape:
one two
0 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
1 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
2 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
3 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
4 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
5 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
6 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
7 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
8 [dog|cat|mouse|new|world] [fish|cat|new|thing|nice]
...
Then, assuming the columns you're interested in are at positions 2 and 3, modifying your loop like this would work:
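for j in range(len(data)):
    # [0] assumes each cell is a one-element list holding the string;
    # if the cells are plain strings, use .strip('[]') instead of [0]
    x = data.iloc[j, 2][0].split('|')
    y = data.iloc[j, 3][0].split('|')
    intersect = list(set(x) & set(y))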
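The same row-by-row idea can also be written without an index counter by iterating over the two columns in parallel with zip. This is a sketch under assumptions: the frame below is invented sample data with the two word columns at positions 2 and 3, and the cells are taken to be plain strings, optionally wrapped in brackets.

```python
import pandas as pd

# Hypothetical data: word columns at positions 2 and 3, as in the question
data = pd.DataFrame({
    "id": [1, 2],
    "label": ["a", "b"],
    "one": ["dog|cat|mouse|new|world", "sun|moon|star"],
    "two": ["fish|cat|new|thing|nice", "rain|moon|cloud"],
})

intersections = []
# zip pairs up the values of both columns one row at a time
for left, right in zip(data.iloc[:, 2], data.iloc[:, 3]):
    x = set(str(left).strip("[]").split("|"))
    y = set(str(right).strip("[]").split("|"))
    intersections.append(sorted(x & y))  # sorted for a stable order

print(intersections)  # [['cat', 'new'], ['moon']]
```

Collecting the results in a list inside the loop keeps one intersection per row, which is what the original intersect1 list was meant to hold.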