Pandas - Create multiple new columns if str.contains returns multiple values

Time:11-24

I have some data like this:

0       Very user friendly interface and has 2FA support
1       The trading page is great though with allot o...
2                                         Widget support
3       But it’s really only for serious traders with...
4       The KYC and AML process is painful - it took ...
                             ...                        
937                                      Legit platform!
938     Horrible customer service won’t get back to m...
939                             App is fast and reliable
940               I wish it had a portfolio chart though
941    The app isn’t as user friendly as it need to b...
Name: reviews, Length: 942, dtype: object

and a list of features:

 ['support',
 'time',
 'follow',
 'submit',
 'ticket',
 'team',
 'swap',
 'account',
 'experi',
 'contact',
 'user',
 'platform',
 'screen',
 'servic',
 'custom',
 'restrict',
 'fast',
 'portfolio',
 'specialist']

I want to check whether one or more of the features appear in each review and, if so, add those words to a new column.

My code is this:

data["words"] = data[data["reviews"].str.contains('|'.join(features))]

This code is meant to create a new column named "words", but because the expression sometimes returns multiple values, I get this error:

ValueError: Columns must be same length as key

How can I fix it?

CodePudding user response:

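The issue is that you are not actually extracting any of the words. str.contains only returns a True/False mask, so indexing data with it selects whole rows rather than the matched words. You need to pull the words you want out of each review and then join them into a single string for the new column.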
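To see what the original expression produces, here is a minimal sketch on a made-up sample. The three reviews and the shortened feature list are assumptions for illustration, not the asker's data. It shows that the boolean mask selects rows with all of their columns, which is probably why pandas refuses to store the result under the single key "words".

```python
import pandas as pd

# Hypothetical three-row sample standing in for the real reviews.
data = pd.DataFrame({
    "Index": [0, 1, 2],
    "reviews": [
        "Very user friendly interface and has 2FA support",
        "Legit platform!",
        "Great charts",
    ],
})
features = ["support", "user", "platform"]  # shortened list, for illustration

# str.contains only answers "does this row match?" with True/False.
mask = data["reviews"].str.contains("|".join(features))

# Boolean indexing keeps the matching rows with ALL their columns,
# so the result is a DataFrame, not a Series of matched words.
matched_rows = data[mask]

print(mask.tolist())       # [True, True, False]
print(matched_rows.shape)  # (2, 2)
```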

import pandas as pd
from io import StringIO
import re

TESTDATA = StringIO("""Index,reviews,
0,       Very user friendly interface and has 2FA support,
1,       The trading page is great though with allot o...,
2,                                         Widget support,
3,       But it’s really only for serious traders with...,
4,       The KYC and AML process is painful - it took ...,
937,                                      Legit platform!,
938,     Horrible customer service won’t get back to m...,
939,                             App is fast and reliable,
940,               I wish it had a portfolio chart though,
941,    The app isn’t as user friendly as it need to b...
    """)

# The trailing commas in the test data create an empty third column; drop it.
data = pd.read_csv(TESTDATA, sep=",").drop("Unnamed: 2", axis=1)
data
#>    Index                                            reviews
0      0         Very user friendly interface and has 2F...
1      1         The trading page is great though with a...
2      2                                           Widge...
3      3         But it’s really only for serious trader...
4      4         The KYC and AML process is painful - it...
5    937                                        Legit pl...
6    938       Horrible customer service won’t get back ...
7    939                               App is fast and r...
8    940                 I wish it had a portfolio chart...
9    941      The app isn’t as user friendly as it need ...

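# features is the list of word stems from the question.
pattern = '|'.join(features)
# Find every feature that occurs in each review, then join the matches into one string.
data['words'] = [", ".join(re.findall(pattern, review)) for review in data.reviews]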
data
#>    Index                                            reviews           words
0      0         Very user friendly interface and has 2F...   user, support
1      1         The trading page is great though with a...                
2      2                                           Widge...         support
3      3         But it’s really only for serious trader...                
4      4         The KYC and AML process is painful - it...                
5    937                                        Legit pl...        platform
6    938       Horrible customer service won’t get back ...  custom, servic
7    939                               App is fast and r...            fast
8    940                 I wish it had a portfolio chart...       portfolio
9    941      The app isn’t as user friendly as it need ...            user
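
The same extraction can also be written with pandas string methods instead of a Python-level loop. This is a sketch under assumptions: the sample reviews, the shortened feature list, and the has_ column names are all made up for illustration. The second half shows one possible reading of "multiple new columns" from the title, with one True/False column per feature.

```python
import pandas as pd

# Hypothetical sample; the real data has 942 reviews.
data = pd.DataFrame({
    "reviews": [
        "Very user friendly interface and has 2FA support",
        "Horrible customer service",
        "Nice colours",
    ]
})
features = ["support", "user", "servic", "custom"]  # shortened list
pattern = "|".join(features)

# Series.str.findall returns a list of matches per row;
# Series.str.join then glues each list into one string.
data["words"] = data["reviews"].str.findall(pattern).str.join(", ")

# Optional: one boolean column per feature (the "has_" prefix is an assumption).
for feature in features:
    data["has_" + feature] = data["reviews"].str.contains(feature, regex=False)

print(data["words"].tolist())
# ['user, support', 'custom, servic', '']
```

Matches are listed in the order they appear in the review text, and a review with no matching feature gets an empty string. If a feature could contain regex metacharacters, wrapping each one in re.escape before joining would be a reasonable precaution.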