I am trying to filter some rows from a DataFrame.
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])
I am trying to filter the records whose country starts with 'I'. When I run this:
print(df.loc[lambda x:np.char.startswith(x['Country'],'I')])
it fails with:
string operation on non-string array
I even tried converting the column to string with
df.astype({'Country':str})
Please tell me what mistake I am making.
CodePudding user response:
Use the str accessor:
>>> df[df['Country'].str.startswith('I')]
    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India
# OR df[df['Country'].str[0] == 'I']
To learn more, read "Testing for strings that match or contain a pattern" in the pandas user guide.
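If the column can contain missing values, the str accessor approach may need one extra argument. The sketch below uses a small hypothetical DataFrame (the row 'Asha' with a missing country is not in the question's data) and passes na=False so that missing entries count as "does not start with 'I'". Depending on the pandas version, leaving na unset can put missing values into the mask, which boolean indexing may reject.

```python
import pandas as pd

# Hypothetical data: the last row has no country, to show how missing values behave.
df = pd.DataFrame(
    {'Name': ['jack', 'Riti', 'Vikas', 'Asha'],
     'Country': ['Australia', 'India', 'India', None]},
    index=['a', 'b', 'c', 'g'])

# na=False makes missing entries evaluate to False in the mask.
mask = df['Country'].str.startswith('I', na=False)

result = df[mask]
print(result)
```

Only rows 'b' and 'c' are kept here; the row with the missing country is dropped instead of causing an error.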
Update
To fix your code, you have to convert the Country Series to a list or array with a string or unicode dtype (not object):
>>> df[np.char.startswith(df['Country'].to_numpy(str), 'I')]
    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India
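Here is a rough sketch of why the original attempt did not help. It makes two assumptions worth checking on your version. First, astype returns a new DataFrame rather than changing df in place, so its result has to be assigned. Second, in pandas versions that store strings as object dtype, the column stays object even after astype(str). The np.char functions want a NumPy string array, which to_numpy(str) provides. The DataFrame is a cut-down version of the one in the question, with only the Country column.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's data: only the Country column.
df = pd.DataFrame(
    {'Country': ['Australia', 'India', 'India', 'India', 'US', 'US']},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

# astype returns a new DataFrame; without assignment, df is left unchanged.
converted = df.astype({'Country': str})
print(converted['Country'].dtype)  # pandas-level dtype, not a NumPy unicode dtype

# to_numpy(str) asks for a NumPy unicode array, which np.char can work with.
str_arr = df['Country'].to_numpy(str)
print(str_arr.dtype.kind)  # 'U' means a NumPy unicode string array

mask = np.char.startswith(str_arr, 'I')
result = df[mask]
print(result)
```

The str accessor is usually the simpler choice; the NumPy route is shown only to explain the error.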