I have a DataFrame with locations in column "A" and values in column "B". Locations occur multiple times in this DataFrame. Now I'd like to add a third column that stores, for each row, the average of the "B" values of all rows that share the same location in column "A".
- I know that .mean() can be used to get an average
- I know how to filter with .loc[]
I could make a list of all unique values in column A and compute the average for each of them in a for loop. However, this seems cumbersome to me. Any idea how this can be done more efficiently?
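For comparison, the loop approach described above might look roughly like this. It is only a sketch: the example data and the new column name B_mean are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: locations in "A", values in "B"
df = pd.DataFrame({"A": ["X", "Y", "Y", "X", "X"], "B": [1, 3, 7, 10, 20]})

# Start the new (hypothetical) column as NaN, then fill it one location at a time
df["B_mean"] = float("nan")
for loc in df["A"].unique():
    mask = df["A"] == loc                                 # rows belonging to this location
    df.loc[mask, "B_mean"] = df.loc[mask, "B"].mean()     # write the group mean into those rows

print(df)
```

This works, but it scans the whole frame once per unique location, which is what makes it feel cumbersome for many locations.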
Answer:
Sounds like what you need is GroupBy. Take a look at the pandas GroupBy documentation.
Given
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5],
                   'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])
You can use
df.groupby('A').mean()
to group the rows by their value in column "A" and compute the mean of every other column within each group (NaN values are skipped).
Output:
     B         C
A
1  3.0  1.333333
2  4.0  1.500000
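Note that groupby('A').mean() returns one row per location rather than a new column on the original frame. To get the third column asked for in the question, one option is GroupBy.transform, which returns a result aligned with the original rows. A minimal sketch, reusing the df from above; the column name B_mean is a hypothetical choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5],
                   'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])

# transform('mean') computes the mean of B for each group in A and broadcasts it
# back to every row of that group, so the result shares df's index
df['B_mean'] = df.groupby('A')['B'].transform('mean')
print(df)
```

Row 0 has NaN in B but still receives its group's mean (3.0), because the NaN is skipped when the mean is computed and the result is then broadcast to all rows of the group.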
Answer:
> I could make a list of all unique values in column A, and compute the average for all of them by making a for loop.

There is no need for a loop: this can be done with pandas.DataFrame.groupby.
Consider the following simple example:
import pandas as pd
df = pd.DataFrame({"A":["X","Y","Y","X","X"],"B":[1,3,7,10,20]})
means = df.groupby('A').agg('mean')
print(means)
which gives the output:
           B
A
X  10.333333
Y   5.000000
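If you want these means as a column on the original rows, you can look each row's location up in the means table with Series.map. A sketch under the same example data; the column name B_mean is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"A": ["X", "Y", "Y", "X", "X"], "B": [1, 3, 7, 10, 20]})
means = df.groupby('A').agg('mean')

# means['B'] is a Series indexed by location; map() replaces each value in
# df['A'] with the matching entry of that Series
df['B_mean'] = df['A'].map(means['B'])
print(df)
```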
Answer:
import pandas as pd

data = {'A': ['a', 'a', 'b', 'c'], 'B': [32, 61, 40, 45]}
df = pd.DataFrame(data)
# One row per location in A, holding the mean of B for that location
df2 = df.groupby(['A']).mean()
print(df2)
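To attach df2 back to the original rows as a new column, one possibility is DataFrame.join with on='A', which matches df's A column against df2's index. A sketch; the '_mean' suffix (and hence the column name B_mean) is an illustrative choice:

```python
import pandas as pd

data = {'A': ['a', 'a', 'b', 'c'], 'B': [32, 61, 40, 45]}
df = pd.DataFrame(data)
df2 = df.groupby(['A']).mean()

# join(on='A') looks up each row's A value in df2's index (a left join, so all
# rows of df are kept); rsuffix renames the overlapping B column coming from df2
result = df.join(df2, on='A', rsuffix='_mean')
print(result)
```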