I was practicing data wrangling and ended up with this simple dataset. But when I started filtering and selecting information from it, it did not work.
Here is the dataset:
https://drive.google.com/file/d/1d1FMWhh3U1KnfVFYyC5R5USuB2BbcN6S/view?usp=sharing
df['campany_name']
0 TCS
1 Accenture
2 Cognizant
3 ICICI Bank
4 HDFC Bank
...
8996 Bitla Software
8997 Kern Liebers
8998 ANAAMALAIS TOYOTA
8999 Elsevier
9000 Samsung Heavy Industries
Name: campany_name, Length: 9001, dtype: object
We can see here that Accenture is in the second row (index 1), but when I try to select it, the comparison does not match:
df['campany_name'] == 'Accenture'
0 False
1 False
2 False
3 False
4 False
...
8996 False
8997 False
8998 False
8999 False
9000 False
I don't really want a different way of doing it. I just want to understand what is happening under the hood, and what is different about this dataset that stops me from doing it the way I normally do, which is df['campany_name'] == 'Accenture'. I should get booleans, and with those I'd be able to get the row by doing df[df['campany_name'] == 'Accenture'].
Something must be wrong at the index or format level, but I'm new to Python.
CodePudding user response:
Try converting the column to strings first:
df['campany_name'] = df['campany_name'].astype(str)
and then you can try:
df.query('campany_name == "Accenture"')
or
df[df['campany_name'] == 'Accenture']
If you know the row label and the column and you only want to retrieve a single value, you can do:
df.at[1, 'campany_name']
Also, remember that you are only printing the result. If you need to keep it, assign it to a variable, e.g.:
acc_row = df.query('campany_name == "Accenture"')
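A note on quoting inside query: the string literal needs its own quotes within the expression, otherwise pandas tries to resolve Accenture as a column or variable name. A minimal sketch, assuming a small made-up DataFrame in place of the real file (the data and the variable name target are only for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (clean values, no stray characters).
df = pd.DataFrame({"campany_name": ["TCS", "Accenture", "Cognizant"]})

# The string literal is quoted inside the query expression.
acc_row = df.query('campany_name == "Accenture"')

# Alternatively, reference a Python variable with the @ prefix.
target = "Accenture"
same_row = df.query("campany_name == @target")

print(acc_row)
```

Both forms do the same exact comparison as df[df['campany_name'] == 'Accenture'], so they will also return nothing if the stored value is not exactly 'Accenture'.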
CodePudding user response:
As you are trying to filter the dataframe given only a string, you can use Series.str.contains (aaa below is my name for the DataFrame, df in your code):
aaa[aaa['campany_name'].str.contains('Accenture')]
campany_name ... jobs interviews
1 Accenture ... 4600.0 2500.0
5814 Accenture Federal Services ... NaN 20.0
[2 rows x 10 columns]
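Since str.contains finds the row while == does not, one plausible explanation (an assumption, I have not verified it against the file) is that the stored value has extra characters around it, such as a trailing space. A minimal sketch with made-up data showing how to check for that and how stripping whitespace would fix the exact comparison:

```python
import pandas as pd

# Hypothetical stand-in for the real file: the second value has a trailing space.
df = pd.DataFrame({"campany_name": ["TCS", "Accenture ", "Cognizant"]})

# The exact comparison fails because "Accenture " != "Accenture".
exact = df["campany_name"] == "Accenture"

# repr() shows the quotes, which makes hidden whitespace visible.
raw_value = repr(df.at[1, "campany_name"])
print(raw_value)

# Stripping surrounding whitespace lets the exact comparison match.
cleaned = df["campany_name"].str.strip()
match = df[cleaned == "Accenture"]
print(match)
```

If repr() on the real value shows something other than plain 'Accenture', that difference is what makes == return False for every row.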