I have a dataframe with temperature data for a certain period. With this data, I want to calculate the relative frequency of the month of August being warmer than 20° as well as January being colder than 2°. I have already managed to extract these two columns into a separate dataframe, get the count of each temperature value, and use value_counts(normalize=True) to get the frequency of each value in percent (see code).
df_temp1[df_temp1.aug >=20]
df_temp1[df_temp1.jan <= 2]
df_temp1['aug'].value_counts()
df_temp1['jan'].value_counts()
df_temp1['aug'].value_counts(normalize=True)*100
df_temp1['jan'].value_counts(normalize=True)*100
What I haven't managed is to calculate the relative frequency for aug>=20, jan<=2, as well as aug>=20 AND jan<=2 and aug>=20 OR jan<=2. Maybe someone could help me with this problem. Thanks.
CodePudding user response:
I would try something like this:
proportion_of_augusts_above_20 = (df_temp1['aug'] >= 20).mean()
proportion_of_januaries_below_2 = (df_temp1['jan'] <= 2).mean()
This calculates it in two steps. First, df_temp1['aug'] >= 20 creates a boolean Series, with True for each August at or above 20° and False for the rest.
Then, mean() treats True and False as 1 and 0, so the average is the fraction of rows that meet the condition. Multiply by 100 to get a percentage.
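The same idea should extend to the combined conditions from the question: boolean Series can be combined element-wise with & (AND) and | (OR) before taking the mean. Here is a minimal sketch. Since the real df_temp1 was not posted, the sample values are made up, and the names warm_aug and cold_jan are only illustrative.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df_temp1,
# assuming one row per year.
df_temp1 = pd.DataFrame({
    "aug": [21.0, 19.5, 22.3, 20.0, 18.7],
    "jan": [1.5, 3.0, 2.0, 4.2, -0.5],
})

# One boolean Series per condition.
warm_aug = df_temp1["aug"] >= 20
cold_jan = df_temp1["jan"] <= 2

# & and | combine the Series element-wise; mean() then gives the
# fraction of rows where the combined condition holds.
both = (warm_aug & cold_jan).mean()    # aug >= 20 AND jan <= 2
either = (warm_aug | cold_jan).mean()  # aug >= 20 OR jan <= 2

print(f"aug >= 20:        {warm_aug.mean() * 100:.1f}%")
print(f"jan <= 2:         {cold_jan.mean() * 100:.1f}%")
print(f"both conditions:  {both * 100:.1f}%")
print(f"either condition: {either * 100:.1f}%")
```

If you write the conditions inline rather than storing them first, wrap each comparison in parentheses, e.g. ((df_temp1['aug'] >= 20) & (df_temp1['jan'] <= 2)).mean(), because & and | bind more tightly than >= and <= in Python.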
As an aside, I would recommend posting a sample of your data in the question, so that people answering can check whether their solution works.