I am trying to improve my Python. I have a good grip on data analysis itself, but I want to start writing functions that other people can run to get results, with informative messages for the user. Below is a simple dataset I am using to print the top 3 "Weather" descriptions for each city; as you can see, Los Angeles has only one description.
City Weather
0 New York Sunny
1 New York Rain
2 New York cloudy
3 New York Rain
4 New York Sunny
5 New York Sunny
6 New York partly cloudy
7 New York thunderstorm
8 New York Rain
9 New York cloudy
10 New York sunny
11 New York partly cloudy
12 New York partly cloudy
13 New York cloudy
14 New York sunny
15 New York sunny
16 New York rain
17 Austin rain
18 Austin rain
19 Austin cloudy
20 Austin sunny
21 Austin rain
22 Austin partly cloudy
23 Austin partly cloudy
24 Austin partly cloudy
25 Austin Sunny
26 Austin cloudy
27 Austin Sunny
28 Austin Sunny
29 Austin cloudy
30 Austin cloudy
31 Austin partly cloudy
32 Austin partly cloudy
33 Austin Sunny
34 Austin rain
35 Los Angeles Sunny
36 Los Angeles Sunny
37 Los Angeles Sunny
38 Los Angeles Sunny
39 Los Angeles Sunny
40 Los Angeles Sunny
41 Los Angeles Sunny
42 Los Angeles Sunny
43 Los Angeles Sunny
44 Los Angeles Sunny
45 Los Angeles Sunny
46 Los Angeles Sunny
47 Los Angeles Sunny
48 Los Angeles Sunny
49 Los Angeles Sunny
50 Los Angeles Sunny
51 Los Angeles Sunny
52 Los Angeles Sunny
I have created a function that outputs the values for each city. In my own work this would be fine, as I could run a few checks on the data, but other users would need to be told that the top 3 cannot be given for Los Angeles because it has only one weather description. I have tried using if statements with value counts, but I keep getting errors such as "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." and I do not think my approach is correct. It is very difficult to find examples for this kind of problem. Any guidance, or even links that could help, would be appreciated!
def weather_valuecount(df):
    # Count each Weather value per City, then keep the 3 most frequent per city.
    weather_valcount = df.groupby(['City']).Weather.value_counts().groupby(level=0, group_keys=False).head(3)
    return weather_valcount
When I run the above I get the following results:
City         Weather
Austin       partly cloudy     5
             Sunny             4
             cloudy            4
Los Angeles  Sunny            18
New York     Rain              3
             Sunny             3
             cloudy            3
Name: Weather, dtype: int64
This shows the top 3 description counts for each city, but Los Angeles shows only one. I would like the function to print a user message along the lines of "Cannot show top three unique Weather descriptions and counts for Los Angeles as there are not 3 unique values available".
CodePudding user response:
Check out this post, which explains why you are getting "Truth value of a Series is ambiguous".
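In short, the error appears when a whole Series is used where Python expects a single True or False. Here is a minimal sketch of the failing pattern and two ways to reduce it to one boolean. The three-row frame and the variable names are made up for illustration and are not the dataset from the question.

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the error.
df = pd.DataFrame({"City": ["A", "A", "B"],
                   "Weather": ["sunny", "rain", "sunny"]})

counts = df["Weather"].value_counts()  # a Series: sunny -> 2, rain -> 1

try:
    if counts < 3:  # "counts < 3" is a Series of booleans, not one bool
        pass
except ValueError as err:
    error_message = str(err)

# Reduce to a single boolean before using it in an if statement.
has_fewer_than_three = len(counts) < 3      # number of distinct values
all_below_three = bool((counts < 3).all())  # element-wise test, then reduce
```

Which reduction is right depends on the question being asked: len(counts) tests how many distinct values exist, while .all() or .any() tests the counts themselves.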
As for your question, I'm not sure I understand the expected output.
See the code and results below (assuming df is the dataframe that holds your dataset):
listOfCites = set(df['City'])

def show_top3_weather(df):
    # Note: head(3) keeps the first three rows of each city, not the three most frequent values.
    df1 = df.groupby('City').head(3).reset_index(drop=True)
    # Count the distinct Weather values among those rows, per city.
    df2 = df1.drop_duplicates().groupby('City', as_index=False).count().rename(columns={'Weather':'WeatherOccu'})
    df3 = df1.merge(df2, on='City', how='left').drop_duplicates()
    city_name = input("Choose a city: ")
    if city_name in listOfCites:
        if (df3.loc[df3.City == city_name]['WeatherOccu'] == 3).any():
            print(f"Below, the top three weathers of {city_name}:")
            print(df3[df3.City == city_name][['City', 'Weather']])
        else:
            print(f"{city_name} has not three different weathers!")
    else:
        print(f"{city_name} doesn't exist!")
>>> show_top3_weather(df)
With New York as input:
Choose a city: New York
Below, the top three weathers of New York:
       City Weather
0  New York   Sunny
1  New York    Rain
2  New York  cloudy

With Austin as input:
Choose a city: Austin
Austin has not three different weathers!

With Los Angeles as input:
Choose a city: Los Angeles
Los Angeles has not three different weathers!
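If the goal is the three most frequent descriptions per city plus a message for cities that have fewer than three, one possible sketch is below. It keeps the value_counts approach from the question and uses nunique to get a plain integer per city, which avoids the ambiguous-truth-value error. The function name, the returned (result, messages) pair, and the small sample frame are assumptions made for illustration.

```python
import pandas as pd


def top_weather_with_messages(df, n=3):
    """Return the n most frequent Weather values per City, plus user messages.

    Hypothetical helper: the name and the return shape are assumptions.
    """
    counts = df.groupby("City")["Weather"].value_counts()
    top = counts.groupby(level=0, group_keys=False).head(n)

    # Number of distinct Weather values per city (one integer per city).
    n_unique = df.groupby("City")["Weather"].nunique()

    messages = [
        f"Cannot show top {n} Weather descriptions for {city}: "
        f"only {k} unique value(s) available."
        for city, k in n_unique.items()
        if k < n
    ]
    return top, messages


# Small made-up sample in the same shape as the question's data.
sample = pd.DataFrame({
    "City": ["New York"] * 4 + ["Los Angeles"] * 2,
    "Weather": ["sunny", "rain", "cloudy", "sunny", "sunny", "sunny"],
})
top, messages = top_weather_with_messages(sample)
print(top)
for msg in messages:
    print(msg)
```

Returning the messages instead of printing them inside the function leaves it to the caller to decide whether to print, log, or raise.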
CodePudding user response:
We can print percentages to show that the weather in Los Angeles was always sunny. Optionally, we can also add an "other" row showing the share of the weather types that were left out.
Since some items may appear with equal frequency, I suggest trying this code:
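import pandas as pd

def weather_nlargest(df, n=3, keep='all'):
    "n, keep: see help('pandas.Series.nlargest')"
    return (
        df
        .groupby(['City'])['Weather']
        .apply(lambda x:
            pd.concat([
                # share of each weather type, limited to the n largest
                _ := x.value_counts(normalize=True).nlargest(n, keep),
                # remaining share of the types that were left out
                pd.Series({'other': 1 - _.sum()})
            ])
        )
    )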
def print_percentage(df):
    print(df.to_string(float_format='{:.0%}'.format))

df['Weather'] = df['Weather'].str.lower()  # sunny == Sunny, rain == Rain
print_percentage(weather_nlargest(df))
Output:
City
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             rain             22%
             other             0%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             partly cloudy    18%
             other             6%
Code to show no more than 3 weather types:
print_percentage(weather_nlargest(df, 3, 'first'))
Output:
City
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             other            22%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             other            24%
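The difference between the two outputs comes from the keep argument of Series.nlargest. A small sketch with made-up shares (not the dataset above) shows how ties at the cut-off are handled:

```python
import pandas as pd

# Made-up shares with a three-way tie for second place.
shares = pd.Series({"sunny": 0.4, "rain": 0.2, "cloudy": 0.2, "fog": 0.2})

# keep="all": every value tied with the last selected one is kept,
# so the result can have more than n rows.
top_all = shares.nlargest(2, keep="all")

# keep="first": exactly n rows, ties resolved by order of appearance.
top_first = shares.nlargest(2, keep="first")

print(top_all)
print(top_first)
```

With keep='all' the result may be longer than n, which is why Austin and New York show four weather types in the first output.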