I have these two lists:
data1 = [["ab","bc","ca"], ["bc","cd","da"], ["be","cd","db"]]
topics1 = [["ab","db"],["be","cd"]]
I need to count how many elements each topic shares with each document (the size of their intersection). Here is my attempt:
mat11 = []
for i in range(len(data1)):
    for j in range(len(topics1)):
        mat1 = len(list(set(data1[i]) & set(topics1[j])))
        mat11.append(mat1)
mat11 is a flat list of len(data1) * len(topics1) elements:
mat11
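Instead of appending to a flat list and reshaping afterwards, a nested list comprehension can build one row per document directly, so `np.array` already has the shape (len(data1), len(topics1)). This is a minimal sketch using the same data; `topic_sets` and `counts` are illustrative names, not from the original post.

```python
import numpy as np

data1 = [["ab", "bc", "ca"], ["bc", "cd", "da"], ["be", "cd", "db"]]
topics1 = [["ab", "db"], ["be", "cd"]]

# Convert each topic to a set once, instead of once per document
topic_sets = [set(t) for t in topics1]

# Outer comprehension: one row per document; inner: one count per topic
counts = [[len(set(doc) & ts) for ts in topic_sets] for doc in data1]

img_mat = np.array(counts)
print(img_mat.shape)  # (3, 2)
print(img_mat)
```

Because each inner list is already one row, no `reshape` call is needed.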
I want it as a matrix of shape (len(data1), len(topics1)), so I did the following:
import numpy as np
img_mat = np.array(mat11)
shape = (len(data1), len(topics1))
img_mat.reshape(shape)  # returns a new array of this shape; img_mat itself is not modified
which gives me this output:
But it's not the shape I wanted.
How can I make this a 3x2 matrix? Moreover, my main aim is to get a DataFrame with one row per document and one column per topic, which looks like this:
Answer:
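import numpy as np
import pandas as pd

# Reshape the flat list of counts into a (documents x topics) array
img_mat = np.array(mat11)
shape = (len(data1), len(topics1))
l = img_mat.reshape(shape)

# One row per document, one column per topic
l_df = pd.DataFrame(l)
# Turn the row index into a 'Docs' column and label the documents D0, D1, ...
l_df = l_df.rename_axis('Docs').reset_index()
l_df.Docs = pd.Series(["D" + str(ind) for ind in l_df.Docs])
# Prefix every column name with 'Topic' (0 -> 'Topic0'), then restore the 'Docs' name
prefix = 'Topic'
l_df = l_df.add_prefix(prefix)
l_df.rename(columns={'TopicDocs': 'Docs'}, inplace=True)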
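The same table can also be built with its labels set up front, which avoids the `add_prefix` / `rename` round trip. This is a sketch under the assumption that the `Docs`, `D0...` and `Topic0...` labels from the answer above are the desired ones; it uses the `columns=` argument of the `DataFrame` constructor and `DataFrame.insert` instead of renaming afterwards.

```python
import numpy as np
import pandas as pd

data1 = [["ab", "bc", "ca"], ["bc", "cd", "da"], ["be", "cd", "db"]]
topics1 = [["ab", "db"], ["be", "cd"]]

# Intersection counts, one row per document and one column per topic
topic_sets = [set(t) for t in topics1]
counts = np.array([[len(set(doc) & ts) for ts in topic_sets] for doc in data1])

# Name the topic columns Topic0, Topic1, ... when the DataFrame is created
l_df = pd.DataFrame(counts, columns=[f"Topic{j}" for j in range(len(topics1))])
# Add the document labels D0, D1, ... as the first column
l_df.insert(0, "Docs", [f"D{i}" for i in range(len(data1))])
print(l_df)
```

Generating the labels from `range(len(...))` keeps them in sync with the number of documents and topics.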