Unable to run string functions on pandas Series values

Time:07-04

I am trying to filter rows from a DataFrame.

import numpy as np
import pandas as pd

students = [('jack',  34, 'Sydney',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]

df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

I am trying to select the records whose country starts with 'I'. When I run this:

print(df.loc[lambda x:np.char.startswith(x['Country'],'I')])

I get this error:

string operation on non-string array

I even tried converting the column to string with

df.astype({'Country':str})

Please tell me what mistake I am making.

Answer:

Use the str accessor:

>>> df[df['Country'].str.startswith('I')]
    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India

# OR df[df['Country'].str[0] == 'I']

For more, see the section "Testing for strings that match or contain a pattern" in the pandas user guide on working with text data.
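As a minimal self-contained sketch reusing the question's data, the callable `.loc` form from the question also works once `np.char.startswith` is swapped for the `.str` accessor, and the `str[0]` alternative selects the same rows. The variable names `by_callable` and `by_first_char` are only illustrative.

```python
import pandas as pd

students = [('jack',  34, 'Sydney',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]

df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# The question's callable form, with the .str accessor doing the string test
by_callable = df.loc[lambda x: x['Country'].str.startswith('I')]

# Alternative from the answer: compare the first character of each value
by_first_char = df[df['Country'].str[0] == 'I']

print(by_callable)
```

Both expressions build a boolean Series aligned on the DataFrame index, so no conversion to a NumPy string array is needed.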

Update

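To fix your original code, convert the Country Series to a list or to a NumPy array with a string (unicode) dtype rather than object dtype. The np.char functions only operate on string arrays, and pandas hands text columns to NumPy as object arrays by default: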

>>> df[np.char.startswith(df['Country'].to_numpy(str), 'I')]
    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India
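The sketch below, assuming a small two-row DataFrame made up for illustration, checks the dtypes involved: the column comes out of pandas as an object array, the kind of array that (per the error in the question) `np.char.startswith` rejects, while `to_numpy(str)` produces a fixed-width unicode array that it accepts. It also shows that `astype` returns a new DataFrame, so calling it without assigning the result, as in the question, leaves `df` unchanged.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the question's full data)
df = pd.DataFrame({'Name': ['Riti', 'John'], 'Country': ['India', 'US']})

# With pandas' default text handling, the values come out as an object array
obj_values = df['Country'].to_numpy()
print(obj_values.dtype)  # object

# astype returns a new DataFrame; without assignment, df is not modified
converted = df.astype({'Country': str})
print(converted is df)  # False

# Requesting str explicitly gives a fixed-width unicode ('U') array
str_values = df['Country'].to_numpy(str)
print(str_values.dtype.kind)  # U

# np.char.startswith accepts the unicode array and returns a boolean mask
mask = np.char.startswith(str_values, 'I')
print(df[mask])
```

The boolean mask has one entry per row, so it can index the DataFrame directly, exactly as in the answer's one-liner.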