Passing list in pandas dataframe to sklearn for TF IDF

Time:11-05

My dataframe looks like this:
import pandas as pd

a = pd.DataFrame({'x': {0: 'John', 1: 'Ron', 2: 'Don'},
                  'y': {0: [['Apple','Apple','Apple'],['Ball','Ball'],['Cat']], 1: [['Zebra','Zebra'],['Fox','Fox']], 2: [['Elf'],['Ball','Ball']]}})

Here 'x' holds the document names and 'y' holds the terms, each repeated as many times as it occurs in that document.

I want to pass it to:

from sklearn.feature_extraction.text import TfidfVectorizer

v = TfidfVectorizer()
z = v.fit_transform(a)

On my real data, this just gives me

z.toarray()
>array([[1.]])

That makes no sense to me. Why?

CodePudding user response:
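
A likely reason for the 1x1 result, sketched under an assumption: iterating a DataFrame yields its column labels rather than its rows, so the vectorizer treats each column name as a "document". The frame below is hypothetical (a single column named 'terms', which is not from the question) and is only meant to reproduce the shape of the output you saw.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical single-column frame; the column name 'terms' is an assumption.
frame = pd.DataFrame({'terms': [['Apple', 'Ball'], ['Zebra', 'Fox']]})

# Iterating a DataFrame yields its column labels, not its rows.
labels = list(frame)
print(labels)

# The vectorizer iterates over its input, so it sees one "document":
# the string 'terms'. The cell contents are never looked at.
matrix = TfidfVectorizer().fit_transform(frame).toarray()
print(matrix)
```

With one column there is one document containing one token, which would explain a single cell of 1.0. Passing a column such as a['y'] (after flattening) avoids this.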

IIUC, use a list comprehension to flatten the nested lists:

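v = TfidfVectorizer()
# Flatten each row's nested lists, then vectorize that row's terms on their own
z = [v.fit_transform([term for group in groups for term in group]).toarray() for groups in a['y']]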

print (z)
[array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.]]), array([[0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.]]), array([[0., 1.],
       [1., 0.],
       [1., 0.]])]
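
Note that the output above treats every single term as its own document, one matrix per row. If what you want is instead one TF-IDF row per person in 'x', a minimal sketch could look like the following. It assumes the terms are already tokenized and should be used as-is; the names `docs`, `terms`, and `result` are mine, not from the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

a = pd.DataFrame({
    'x': ['John', 'Ron', 'Don'],
    'y': [
        [['Apple', 'Apple', 'Apple'], ['Ball', 'Ball'], ['Cat']],
        [['Zebra', 'Zebra'], ['Fox', 'Fox']],
        [['Elf'], ['Ball', 'Ball']],
    ],
})

# Flatten each row's nested lists into one flat list of terms per document.
docs = [[term for group in groups for term in group] for groups in a['y']]

# A callable analyzer receives each document unchanged and returns its tokens,
# so the pre-tokenized lists are used without lowercasing or re-splitting.
v = TfidfVectorizer(analyzer=lambda doc: doc)
z = v.fit_transform(docs)

# Recover the column order from the fitted vocabulary (term -> column index).
terms = sorted(v.vocabulary_, key=v.vocabulary_.get)
result = pd.DataFrame(z.toarray(), index=a['x'], columns=terms)
print(result.round(3))
```

This gives a single 3 x 6 matrix (three documents, six distinct terms), with the repeated terms counted as term frequencies. The lambda analyzer is just the shortest way to skip tokenization; a named function works the same and is easier to pickle.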