Using the pandas .loc indexer to select specific data within a column

Time:05-24

I have a dataset which can be found on this website: http://tennis-data.co.uk/alldata.php. It collects the results of both WTA and ATP tennis matches over several years.

I would like to find how many sets the player "Federer R." won during 2016 and 2017, and for this I used .loc as shown below:

df = df_atp.loc[df_atp["Date"].between("01/01/2016", "31/12/2017"), ['Winner', 'Wsets']]
print(df)
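One thing worth checking in the code above: if Date was loaded as text (for example from a CSV export) rather than as datetimes, between compares the strings character by character, so dates outside 2016 and 2017 can slip into the range. A minimal sketch with made-up rows, assuming the dates are written as dd/mm/yyyy, that parses the column with pd.to_datetime before filtering:

```python
import pandas as pd

# Hypothetical rows standing in for df_atp; the real file has many more columns.
df_atp = pd.DataFrame({
    "Date": ["15/06/2015", "10/07/2016", "02/03/2017", "05/01/2018"],
    "Winner": ["Federer R.", "Murray A.", "Federer R.", "Federer R."],
    "Wsets": [2, 3, 3, 2],
})

# On text, the comparison is lexicographic: every row passes, even 2015 and 2018.
as_text = df_atp["Date"].between("01/01/2016", "31/12/2017")
print(as_text.tolist())

# Parse the day-first strings, then compare against real Timestamps.
df_atp["Date"] = pd.to_datetime(df_atp["Date"], format="%d/%m/%Y")
as_dates = df_atp["Date"].between(pd.Timestamp("2016-01-01"), pd.Timestamp("2017-12-31"))
print(as_dates.tolist())
```

If the file was read with read_excel and Date already has a datetime64 dtype, the conversion step is unnecessary.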

When I print df, here is the result (only part of it, since the whole output was very long):

[Image: partial output of print(df)]

I think I'm on the right path, but I want only Federer in my results; as the image shows, I get every other player as well. I tried adding ["Federer R."] at the end of the .loc call, but it only gives me an error.
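Assuming the attempt looked like df_atp.loc[...]["Federer R."], the error is expected: square brackets after a DataFrame select a column by its label, and there is no column named "Federer R.", so pandas raises a KeyError. Selecting rows by a value needs a boolean condition on the Winner column instead. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical frame standing in for the filtered result of .loc.
df = pd.DataFrame({"Winner": ["Federer R.", "Nadal R."], "Wsets": [3, 2]})

try:
    df["Federer R."]  # looks for a column labelled "Federer R."
except KeyError as err:
    print("KeyError:", err)

# A boolean mask on the Winner column keeps only the matching rows.
only_federer = df[df["Winner"] == "Federer R."]
print(only_federer)
```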

What could I add to the .loc call so that only Federer appears in the results?

Thank you in advance! :D

Answer:

df = df_atp.loc[df_atp["Date"].between("01/01/2016", "31/12/2017"), ['Winner', 'Wsets']]
df = df[df['Winner'] == 'Federer R.']
print(df)
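To get the number the question actually asks for, the Wsets column of the filtered rows can be summed. A self-contained sketch with hypothetical rows follows. Note that summing Wsets only counts sets from matches Federer won; sets he took in matches he lost would be in the loser-side columns, assumed here to be named Loser and Lsets (check the column names in your copy of the file).

```python
import pandas as pd

# Hypothetical rows; Date is already parsed to datetimes here.
df_atp = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-10", "2016-06-20", "2017-07-16", "2018-01-20"]),
    "Winner": ["Federer R.", "Djokovic N.", "Federer R.", "Federer R."],
    "Loser": ["Nadal R.", "Federer R.", "Cilic M.", "Murray A."],
    "Wsets": [2, 3, 3, 3],
    "Lsets": [1, 1, 0, 2],
})

# Keep only matches played in 2016 and 2017.
in_range = df_atp["Date"].between(pd.Timestamp("2016-01-01"), pd.Timestamp("2017-12-31"))
period = df_atp.loc[in_range]

# Sets won in matches Federer won.
sets_in_wins = period.loc[period["Winner"] == "Federer R.", "Wsets"].sum()
# Sets won in matches Federer lost (assumes the Loser/Lsets column pair exists).
sets_in_losses = period.loc[period["Loser"] == "Federer R.", "Lsets"].sum()

print(sets_in_wins, sets_in_losses, sets_in_wins + sets_in_losses)
```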

is the most readable way to do it. You could also do

df = df_atp.loc[df_atp["Date"].between("01/01/2016", "31/12/2017") & (df_atp['Winner'] == 'Federer R.'), ['Winner', 'Wsets']]

to do it in one line, but I'd favor the first approach for legibility.
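A middle ground between the two approaches is to name each condition and still use a single .loc call. In the sketch below (hypothetical data), both masks are built from df_atp so they share its index. When the conditions are written inline instead, each comparison needs its own parentheses, because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical rows standing in for df_atp.
df_atp = pd.DataFrame({
    "Date": pd.to_datetime(["2015-05-01", "2016-03-12", "2017-09-03"]),
    "Winner": ["Federer R.", "Federer R.", "Nadal R."],
    "Wsets": [2, 3, 3],
})

# Each condition is a boolean Series aligned on df_atp's index.
in_range = df_atp["Date"].between(pd.Timestamp("2016-01-01"), pd.Timestamp("2017-12-31"))
is_federer = df_atp["Winner"] == "Federer R."

# & combines the masks element-wise; .loc keeps rows where both are True.
df = df_atp.loc[in_range & is_federer, ["Winner", "Wsets"]]
print(df)
```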

Answer:

When querying a DataFrame, consider DataFrame.query: it expresses the filter as a compact string, and on large frames it can be faster than plain boolean indexing when the optional numexpr package is installed.
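A minimal sketch of the same filter with query, using hypothetical rows and assuming Date has already been parsed to datetimes. Local Python variables are referenced inside the query string with the @ prefix:

```python
import pandas as pd

# Hypothetical rows standing in for df_atp.
df_atp = pd.DataFrame({
    "Date": pd.to_datetime(["2015-05-01", "2016-03-12", "2017-09-03"]),
    "Winner": ["Federer R.", "Federer R.", "Nadal R."],
    "Wsets": [2, 3, 3],
})

player = "Federer R."
start, end = pd.Timestamp("2016-01-01"), pd.Timestamp("2017-12-31")

# @player, @start and @end refer to the local variables defined above.
df = df_atp.query("Winner == @player and Date >= @start and Date <= @end")[["Winner", "Wsets"]]
print(df)
```

For a frame the size of a few years of ATP results, any speed difference is likely to be small; the main gain is readability.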
