I have a dataset which can be found on this website: http://tennis-data.co.uk/alldata.php. It gathers the results of both WTA and ATP tennis matches over several years.
I would like to find how many sets the player "Federer R." won during 2016 and 2017, and for this I used the .loc indexer as shown below:
df = df_atp.loc[df_atp["Date"].between("01/01/2016", "31/12/2017"), ['Winner', 'Wsets']]
print(df)
When I print df, I get a very long result listing every winner in 2016 and 2017 together with the number of sets they won.
I think I'm on the right path, but I want only Federer in my results, and as the output shows, I get every other player as well. I tried adding ["Federer R."] at the end of the .loc call, but that only raises an error.
What could I add to the .loc call so that only Federer appears in the results?
Thank you in advance! :D
Answer:
df = df_atp.loc[df_atp["Date"].between("01/01/2016", "31/12/2017"), ['Winner', 'Wsets']]
df = df[df['Winner'] == 'Federer R.']
print(df)
is the most readable way to do it. You could also do
df = df_atp.loc[df_atp["Date"].between("01/01/2016", "31/12/2017") & (df_atp['Winner'] == 'Federer R.'), ['Winner', 'Wsets']]
to do it in one line, but I'd favor the first approach for legibility.
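Filtering only lists Federer's wins; to get the number the question asks for, you still have to sum the sets. Two details are worth checking. First, if Date was read as text (as it would be from a CSV with day/month/year strings), between() compares strings alphabetically, so "15/06/2015" falls between "01/01/2016" and "31/12/2017"; parsing the column first avoids that. Second, Wsets only covers matches Federer won, while the sets he took in matches he lost sit in Lsets. The sketch below assumes the tennis-data.co.uk column names Loser and Lsets, and uses a small made-up sample in place of the real df_atp:

```python
import pandas as pd

# Hypothetical sample standing in for df_atp; Date holds day/month/year
# strings, as a CSV export might (an assumption about how the data was loaded).
df_atp = pd.DataFrame({
    "Date": ["15/06/2015", "20/01/2016", "05/07/2017", "10/09/2017", "02/01/2018"],
    "Winner": ["Federer R.", "Federer R.", "Federer R.", "Nadal R.", "Federer R."],
    "Loser": ["Nadal R.", "Djokovic N.", "Cilic M.", "Federer R.", "Murray A."],
    "Wsets": [2, 3, 3, 3, 2],
    "Lsets": [0, 1, 0, 2, 1],
})

# Parse the dates explicitly so between() compares real dates, not strings.
df_atp["Date"] = pd.to_datetime(df_atp["Date"], format="%d/%m/%Y")

in_period = df_atp["Date"].between(pd.Timestamp("2016-01-01"), pd.Timestamp("2017-12-31"))
won = df_atp["Winner"] == "Federer R."
lost = df_atp["Loser"] == "Federer R."

# Sets Federer won in the matches he won...
sets_in_wins = df_atp.loc[in_period & won, "Wsets"].sum()
# ...plus the sets he took in the matches he lost (he is the Loser there).
sets_in_losses = df_atp.loc[in_period & lost, "Lsets"].sum()

print(sets_in_wins, sets_in_losses, sets_in_wins + sets_in_losses)  # 6 2 8
```

Keeping each condition in its own named mask also keeps the .loc call as readable as the two-step version.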
Answer:
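When "querying" a DataFrame, consider using DataFrame.query: it lets you write the whole filter as a single expression, and on large DataFrames (with the optional numexpr package installed) it can be faster than other options.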
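A minimal sketch of the query approach, assuming Date has already been parsed to datetimes and using a small made-up sample in place of the real df_atp. Local Python variables are referenced inside the expression with @; the variable names player, start and end are only illustrative. Any speed-up is not guaranteed: on small frames plain boolean indexing is often just as fast.

```python
import pandas as pd

# Hypothetical sample standing in for df_atp (Date already parsed to datetime).
df_atp = pd.DataFrame({
    "Date": pd.to_datetime(["2015-06-15", "2016-01-20", "2017-07-05", "2018-01-02"]),
    "Winner": ["Federer R.", "Federer R.", "Nadal R.", "Federer R."],
    "Wsets": [2, 3, 3, 2],
})

player = "Federer R."
start = pd.Timestamp("2016-01-01")
end = pd.Timestamp("2017-12-31")

# The whole filter is one expression; @name pulls in the local variables.
df = df_atp.query("Date >= @start and Date <= @end and Winner == @player")[["Winner", "Wsets"]]
print(df)
print(df["Wsets"].sum())  # 3
```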