How to calculate relative frequency of an event from a dataframe?

Time:11-01

I have a dataframe with temperature data for a certain period. From this data, I want to calculate the relative frequency of August being warmer than 20° and of January being colder than 2°. I have already extracted these two columns into a separate dataframe, counted how often each temperature value occurs with value_counts(), and used its normalize=True option to get the frequency of each value in percent (see code).

df_temp1[df_temp1.aug >=20]
df_temp1[df_temp1.jan <= 2]

df_temp1['aug'].value_counts()
df_temp1['jan'].value_counts()

df_temp1['aug'].value_counts(normalize=True)*100
df_temp1['jan'].value_counts(normalize=True)*100

What I haven't managed is to calculate the relative frequency of aug >= 20, of jan <= 2, of aug >= 20 AND jan <= 2, and of aug >= 20 OR jan <= 2. Could someone help me with this problem? Thanks.

CodePudding user response:

I would try something like this:

proportion_of_augusts_above_20 = (df_temp1['aug'] >= 20).mean()
proportion_of_januaries_below_2 = (df_temp1['jan'] <= 2).mean()

This calculates it in two steps. First, df_temp1['aug'] >= 20 creates a boolean Series, with True for each August at or above 20° and False for each August that is not.

Then, mean() treats True and False as 1 and 0. The average is therefore the fraction of rows that fulfill the criterion; multiply it by 100 to get a percentage.
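The same idea should extend to the combined cases from the question: boolean Series can be combined element-wise with & (AND) and | (OR) before taking the mean. Below is a minimal sketch. The sample values are made up for illustration, since the real data was not posted, and the variable names (warm_aug, cold_jan, freq_both and so on) are my own.

```python
import pandas as pd

# Hypothetical sample data; the real df_temp1 was not posted in the question.
df_temp1 = pd.DataFrame({
    "aug": [21.0, 19.5, 20.0, 18.2, 22.3],
    "jan": [1.5, 3.0, 2.0, -0.5, 4.1],
})

# One boolean value per row: does the row meet the condition?
warm_aug = df_temp1["aug"] >= 20
cold_jan = df_temp1["jan"] <= 2

# mean() of a boolean Series is the fraction of True values.
freq_aug = warm_aug.mean()
freq_jan = cold_jan.mean()

# Combine the masks element-wise: & is AND, | is OR.
freq_both = (warm_aug & cold_jan).mean()
freq_either = (warm_aug | cold_jan).mean()

print(f"aug >= 20:             {freq_aug * 100:.1f}%")
print(f"jan <= 2:              {freq_jan * 100:.1f}%")
print(f"aug >= 20 AND jan <= 2: {freq_both * 100:.1f}%")
print(f"aug >= 20 OR jan <= 2:  {freq_either * 100:.1f}%")
```

Note that a comparison against a missing value (NaN) evaluates to False, so rows with missing temperatures count as not meeting the condition. If that is not what you want, drop those rows first, for example with dropna().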

As an aside, I would recommend posting your data in a question, which allows people answering to check whether their solution works.
