I have a dataset that looks like this:
Value Type country mean
-1.975767 Weather Brazil
-0.540979 Fruits China
-2.359127 Fruits China
-2.815604 Corona China
-0.712323 Weather UK
-0.929755 Weather Brazil
I want to calculate an overall mean value for all different combinations of Type & Country. For example:
the mean of Weather, Brazil would be = (-1.975767 -0.929755) / 2
Then I want to add these combinations to another table:
df2 = pd.DataFrame()
country type mean count
Brazil Weather 2
Brazil Corona
China Corona 1
China Fruits 2
I can calculate the mean like this:
print(df.groupby(["type", "country"])["value"].mean())
but how do I save these values in a new df in the desired format?
Edit: This works
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()
but then if I try to add a count in the same way:
df_new = df.groupby(["type", "country"], as_index = False).count()
it transposes all the cols instead of adding a count col after the mean col
CodePudding user response:
You can use as_index
parameter in your groupby
:
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean()
Then the result is a standard dataframe:
type country value
0 Corona China -2.815604
1 Fruits China -1.450053
2 Weather Brazil -1.452761
3 Weather UK -0.712323
EDIT: How can we add another column with the count
? You can simply append a new column with the result of a new groupby
like this:
# original answer
df_new = df.groupby(["type", "country"], as_index = False)["value"].mean().rename(columns={'value':'mean'})
# Add count also
df_new['count'] = df.groupby(["type", "country"])["value"].count().tolist()
df_new
Output:
type country mean count
0 Corona China -2.815604 1
1 Fruits China -1.450053 2
2 Weather Brazil -1.452761 2
3 Weather UK -0.712323 1
CodePudding user response:
Personally speaking, I prefer JANO's answer, but here is what I came up with:
import pandas as pd
dataframe = pd.DataFrame({"Value":[-1.23, -1.65, -0.123, -0.67, 2.456], "Type":["Weather", "Fruits", "Corona", "Corona", "Weather"], "country": ["Brazil", "China", "China", "Iran", "Iran"]})
resultDataframe = {"Value":[], "Type":[], "country":[]}
for country in dataframe["country"].unique():
tempDataframe = dataframe[dataframe["country"] == country]
a = tempDataframe.groupby(by="Type").mean().reset_index()
for index, row in a.iterrows():
resultDataframe["Value"].append(row["Value"])
resultDataframe["Type"].append(row["Type"])
resultDataframe["country"].append(country)
pd.DataFrame(resultDataframe)
Value | Type | country | |
---|---|---|---|
0 | -1.23 | Weather | Brazil |
1 | -0.123 | Corona | China |
2 | -1.65 | Fruits | China |
3 | -0.67 | Corona | Iran |
4 | 2.456 | Weather | Iran |