Code outputs the top 3 most frequent weather descriptions for each city, but I want to add a user message

I am currently trying to improve my Python. I have a good grip on data analysis itself, but I want to start writing functions that other people can run to get results, with the code also giving informative messages to the user. Below is a simple dataset that I am using to print the top 3 "Weather" descriptions for each city; as you can see, Los Angeles has only one description.

          City        Weather
0      New York          Sunny
1      New York           Rain
2      New York         cloudy
3      New York           Rain
4      New York          Sunny
5      New York          Sunny
6      New York  partly cloudy
7      New York   thunderstorm
8      New York           Rain
9      New York         cloudy
10     New York          sunny
11     New York  partly cloudy
12     New York  partly cloudy
13     New York         cloudy
14     New York          sunny
15     New York          sunny
16     New York           rain
17       Austin           rain
18       Austin           rain
19       Austin         cloudy
20       Austin          sunny
21       Austin           rain
22       Austin  partly cloudy
23       Austin  partly cloudy
24       Austin  partly cloudy
25       Austin          Sunny
26       Austin         cloudy
27       Austin          Sunny
28       Austin          Sunny
29       Austin         cloudy
30       Austin         cloudy
31       Austin  partly cloudy
32       Austin  partly cloudy
33       Austin          Sunny
34       Austin           rain
35  Los Angeles          Sunny
36  Los Angeles          Sunny
37  Los Angeles          Sunny
38  Los Angeles          Sunny
39  Los Angeles          Sunny
40  Los Angeles          Sunny
41  Los Angeles          Sunny
42  Los Angeles          Sunny
43  Los Angeles          Sunny
44  Los Angeles          Sunny
45  Los Angeles          Sunny
46  Los Angeles          Sunny
47  Los Angeles          Sunny
48  Los Angeles          Sunny
49  Los Angeles          Sunny
50  Los Angeles          Sunny
51  Los Angeles          Sunny
52  Los Angeles          Sunny

I have created a function to output the values for each city. In my own line of work this would be fine, as I could run a few checks on the data, but other users would need to be told that the top 3 could not be given for Los Angeles because it has only one weather description. I have tried using if statements with value counts, but I keep getting error messages like "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." and I do not think my method is correct. It is very difficult to find examples for this kind of problem. Any guidance, or even links that could help, would be appreciated!

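def weather_valuecount(df):
    # Count each Weather string per city (sorted descending within a city),
    # then keep the first 3 rows of each city's counts
    weather_valcount = (
        df.groupby(['City']).Weather.value_counts()
        .groupby(level=0, group_keys=False).head(3)
    )
    return weather_valcount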

When I run the above I get the following results:

City         Weather      
Austin       partly cloudy     5
             Sunny             4
             cloudy            4
Los Angeles  Sunny            18
New York     Rain              3
             Sunny             3
             cloudy            3
Name: Weather, dtype: int64

This shows the top 3 description counts for each city, but Los Angeles shows only one. I'd like the function to include a user message along the lines of "Cannot show top three unique Weather descriptions and counts for Los Angeles as there are not 3 unique values available".

CodePudding user response:

Check out this post, which explains why you're getting "The truth value of a Series is ambiguous".
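
As a minimal sketch of what triggers that error (the counts Series here is made up for illustration, not taken from your data): comparing a Series gives back a boolean Series, and `if` cannot reduce that to a single True/False. Reducing it to a scalar first, for example with len(), .any() or .all(), avoids the exception.

```python
import pandas as pd

# Hypothetical value counts for one city
counts = pd.Series([5, 4, 4], index=["partly cloudy", "sunny", "cloudy"])

mask = counts >= 3   # element-wise comparison: a boolean Series, not a single bool

try:
    if mask:         # Python asks the Series for one truth value; pandas refuses
        print("unreachable")
except ValueError as err:
    error_text = str(err)
    print(error_text)

# Reduce to one scalar before branching
has_three = len(counts) >= 3       # at least three distinct descriptions
all_frequent = bool(mask.all())    # every count is at least 3
any_frequent = bool(mask.any())    # at least one count is at least 3

if has_three:
    print("Top three can be shown")
```

Which reduction is right depends on the question being asked; for "does this city have three distinct descriptions?" the length of the counts is probably what you want.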

And regarding your question, I'm not sure I understand the expected output.

See the code and results below (assuming df is the dataframe that holds your dataset):

listOfCities = set(df['City'])  # all city names present in the dataset

def show_top3_weather(df):
    # First three rows of each city, in dataset order (not ranked by frequency)
    df1 = df.groupby('City').head(3).reset_index(drop=True)
    # Number of distinct weather values among those rows, per city
    df2 = df1.drop_duplicates().groupby('City', as_index=False).count().rename(columns={'Weather':'WeatherOccu'})
    # Attach that number to each remaining row
    df3 = df1.merge(df2, on='City', how='left').drop_duplicates()

    city_name = input("Choose a city: ")

    if city_name in listOfCities:
        if (df3.loc[df3.City == city_name, 'WeatherOccu'] == 3).any():
            print(f"Below, the top three weathers of {city_name}:")
            print(df3[df3.City == city_name][['City', 'Weather']])
        else:
            print(f"{city_name} has not three different weathers!")
    else:
        print(f"{city_name} doesn't exist!")

>>> show_top3_weather(df)

With New York as input:

Choose a city:  New York
Below, the top three weathers of New York:
       City Weather
0  New York   Sunny
1  New York    Rain
2  New York  cloudy

With Austin as input:

Choose a city:  Austin
Austin has not three different weathers!

With Los Angeles as input:

Choose a city:  Los Angeles
Los Angeles has not three different weathers!
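
Note that the function above looks only at the first three rows of each city (head(3) on the raw data), which is why Austin is reported as lacking three weathers. If the goal is the message from the question, a minimal sketch could count distinct values per city with nunique() and build one message per city that falls short. The function name top_weather_with_messages and the small sample frame are my own, for illustration only.

```python
import pandas as pd


def top_weather_with_messages(df, n=3):
    """Return the top-n Weather counts per city and a list of user messages."""
    # Case-sensitive counts, sorted descending within each city
    counts = df.groupby("City")["Weather"].value_counts()
    top_n = counts.groupby(level="City", group_keys=False).head(n)

    # One scalar per city: the number of distinct Weather strings
    distinct = df.groupby("City")["Weather"].nunique()

    messages = [
        f"Cannot show top {n} unique Weather descriptions for {city}: "
        f"only {found} unique value(s) available."
        for city, found in distinct.items()
        if found < n  # found is a single integer, so a plain comparison is safe
    ]
    return top_n, messages


# Small stand-in dataset (hypothetical, not the full data from the question)
sample = pd.DataFrame({
    "City": ["Austin"] * 4 + ["Los Angeles"] * 3,
    "Weather": ["rain", "rain", "cloudy", "sunny", "Sunny", "Sunny", "Sunny"],
})

top_n, messages = top_weather_with_messages(sample)
print(top_n)
for message in messages:
    print(message)
```

Returning the messages alongside the result, rather than only printing them, leaves it to the caller to decide how to show them.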

CodePudding user response:

We can print percentages to show that the weather was always sunny in Los Angeles. Optionally, we could also add an "other" row to show the percentage of weather types that were left out.

Since some items may appear with equal frequency, I suggest trying this code:

import pandas as pd

def weather_nlargest(df, n=3, keep='all'):
    "n, keep: see help('pandas.Series.nlargest')"
    def top_shares(weather):
        # Share of each weather type within one city, limited to the n largest
        top = weather.value_counts(normalize=True).nlargest(n, keep=keep)
        # Whatever share is not covered by the kept rows
        return pd.concat([top, pd.Series({'other': 1 - top.sum()})])

    return df.groupby(['City'])['Weather'].apply(top_shares)


def print_percentage(df):
    print(df.to_string(float_format='{:.0%}'.format))


df['Weather'] = df['Weather'].str.lower()   # sunny == Sunny, rain == Rain
print_percentage(weather_nlargest(df))

Output:

City                      
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             rain             22%
             other             0%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             partly cloudy    18%
             other             6%

Code to show no more than 3 weather types:

print_percentage(weather_nlargest(df, 3, 'first'))

Output:

City                      
Austin       sunny            28%
             partly cloudy    28%
             cloudy           22%
             other            22%
Los Angeles  sunny           100%
             other             0%
New York     sunny            35%
             rain             24%
             cloudy           18%
             other            24%
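
The difference between the two outputs above comes down to how nlargest treats ties. A small sketch on a made-up Series (the counts are hypothetical, chosen so that two values tie for third place) may make this easier to see:

```python
import pandas as pd

# Hypothetical counts with a tie for third place
counts = pd.Series({"sunny": 5, "partly cloudy": 5, "cloudy": 4, "rain": 4, "fog": 1})

first = counts.nlargest(3, keep="first")    # exactly 3 rows; a tie is cut by order of appearance
all_ties = counts.nlargest(3, keep="all")   # also keeps every value tied with the 3rd largest

print(len(first), list(first.index))
print(len(all_ties), list(all_ties.index))
```

With keep='all' the result can therefore hold more than n rows, which is why Austin and New York show four weather types in the first output.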