python pandas column with averages

I have a DataFrame with locations in column "A" and values in column "B". Locations occur multiple times in this DataFrame. I'd like to add a third column that stores, for each row, the average of the column "B" values that share that row's location in column "A".

-I know the .mean() can be used to get an average

-I know how to filter with .loc[]

I could make a list of all unique values in column A and compute the average for each of them in a for loop. However, this seems cumbersome to me. Any idea how this can be done more efficiently?

CodePudding user response：

Sounds like what you need is GroupBy; the pandas documentation on groupby covers it.

Given

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5],
                   'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])

You can use

df.groupby('A').mean()

to group the values based on the common values in column "A" and find the mean.

Output:

     B         C
A
1  3.0  1.333333
2  4.0  1.500000
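
Note that this result has one row per value of "A", while the question asks for a third column with the group average repeated on every row. A minimal sketch of that, using transform on the same example data (the column name B_mean is my own choice for illustration, not something from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5],
                   'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])

# transform returns a result aligned with the original rows,
# so it can be assigned directly as a new column.
# 'B_mean' is a hypothetical column name chosen for illustration.
df['B_mean'] = df.groupby('A')['B'].transform('mean')
print(df)
```

The NaN in "B" is skipped when the mean is computed, but that row still receives the mean of its group.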

CodePudding user response：

"I could make a list of all unique values in column A, and compute the average for all of them by making a for loop."

This can be done using pandas.DataFrame.groupby. Consider the following simple example:

import pandas as pd
df = pd.DataFrame({"A":["X","Y","Y","X","X"],"B":[1,3,7,10,20]})
means = df.groupby('A').agg('mean')
print(means)

which gives the output

           B
A
X  10.333333
Y   5.000000
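
Since means is indexed by the values of "A", it can also be mapped back onto the original rows to get the extra column the question asks for. A sketch under the assumption that the new column is called B_mean (a name I made up):

```python
import pandas as pd

df = pd.DataFrame({"A": ["X", "Y", "Y", "X", "X"], "B": [1, 3, 7, 10, 20]})
means = df.groupby('A').agg('mean')

# means['B'] is a Series indexed by location; map looks up each row's
# location in that index. 'B_mean' is a hypothetical column name.
df['B_mean'] = df['A'].map(means['B'])
print(df)
```

This keeps the small table of means around, which can be handy if it is needed elsewhere too.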

CodePudding user response：

import pandas as pd

data = {'A': ['a', 'a', 'b', 'c'], 'B': [32, 61, 40, 45]}
df = pd.DataFrame(data)

df2 = df.groupby(['A']).mean()
print(df2)
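
df2 holds one row per location. To attach those averages to every row of df, one option is to merge the grouped result back on "A". A sketch only; the name B_mean is invented here to avoid clashing with the existing "B" column:

```python
import pandas as pd

data = {'A': ['a', 'a', 'b', 'c'], 'B': [32, 61, 40, 45]}
df = pd.DataFrame(data)

# as_index=False keeps 'A' as a regular column so it can serve as the merge key.
means = df.groupby('A', as_index=False)['B'].mean()
# 'B_mean' is a hypothetical column name, chosen to avoid clashing with 'B'.
means = means.rename(columns={'B': 'B_mean'})

# A left merge keeps every row of df and adds the matching group mean.
df = df.merge(means, on='A', how='left')
print(df)
```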