Home > Blockchain >  list mean values of all possible column combinations
list mean values of all possible column combinations

Time:02-16

I have a dataset that looks like this:

Value         Type       country   mean
-1.975767     Weather    Brazil
-0.540979     Fruits     China
-2.359127     Fruits     China
-2.815604     Corona     China
-0.712323     Weather    UK
-0.929755     Weather    Brazil

I want to calculate an overall mean value for all different combinations of Type & Country. For example:

the mean of Weather, Brazil would be = (-1.975767 -0.929755) / 2

Then I want to add these combinations to another table:

df2 = pd.DataFrame()
country    type       mean     count
Brazil     Weather             2
Brazil     Corona              
China      Corona              1
China      Fruits              2

I can calculate the mean like this:

print(df.groupby(["type", "country"])["value"].mean())

but how do I save these values in a new df in the desired format?

Edit: This works

df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()

but then if I try to add a count in the same way:

df_new = df.groupby(["type", "country"], as_index = False).count()

it transposes all the cols instead of adding a count col after the mean col

CodePudding user response:

You can use as_index parameter in your groupby:

df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()

Then the result is a standard dataframe:

    type    country value
0   Corona  China   -2.815604
1   Fruits  China   -1.450053
2   Weather Brazil  -1.452761
3   Weather UK  -0.712323

EDIT: How can we add another column with the count? You can simply append a new column with the result of a new groupby like this:

# original answer
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean().rename(columns={'value':'mean'})

# Add count also
df_new['count'] = df.groupby(["type", "country"])["value"].count().tolist()
df_new

Output:

type    country mean    count
0   Corona  China   -2.815604   1
1   Fruits  China   -1.450053   2
2   Weather Brazil  -1.452761   2
3   Weather UK  -0.712323   1

CodePudding user response:

Personally speaking, I prefer JANO's answer, but here is what I came up with:

import pandas as pd
dataframe = pd.DataFrame({"Value":[-1.23, -1.65, -0.123, -0.67, 2.456], "Type":["Weather", "Fruits", "Corona", "Corona", "Weather"], "country": ["Brazil", "China", "China", "Iran", "Iran"]})
resultDataframe = {"Value":[], "Type":[], "country":[]}
for country in dataframe["country"].unique():
  tempDataframe = dataframe[dataframe["country"] == country]
  a = tempDataframe.groupby(by="Type").mean().reset_index()
  for index, row in a.iterrows():
    resultDataframe["Value"].append(row["Value"])
    resultDataframe["Type"].append(row["Type"])
    resultDataframe["country"].append(country)
pd.DataFrame(resultDataframe)
Value Type country
0 -1.23 Weather Brazil
1 -0.123 Corona China
2 -1.65 Fruits China
3 -0.67 Corona Iran
4 2.456 Weather Iran
  • Related