Sr. No. | A | B | C |
---|---|---|---|
0 | 84.3 | 18.3 | 1.138420e 00 |
1 | 84.3 | 95.8 | 8.501307e |
2 | 84.3 | 192.7 | 2.262742e-02 |
3 | 84.3 | 617.0 | 5.395847e-01 |
4 | 84.3 | 54.0 | 1.484681 |
5 | 18.3 | 95.8 | 9.612692e-01 |
6 | 18.3 | 192.7 | 9.600000e-01 |
7 | 18.3 | 617.0 | 1.706984e |
8 | 18.3 | 544.0 | 1.128933e 00 |
9 | 95.8 | 52.7 | 6.157143e-01 |
10 | 95.8 | 617.0 | 8.880000e 00 |
11 | 95.8 | 54.0 | 4.533847e-01 |
12 | 192.7 | 617.0 | 5.048742e |
13 | 192.7 | 544.0 | 1.838478e-02 |
14 | 617.0 | 544.0 | 7.360492e |
e.g. In the table above, I want to take an average of C values from rows 0,5,6,7,8 because all of these rows have 18.3 in some of the columns. Then I want to store this average in another data frame in a row corresponding to '18.3'. Then, I want to take an average of C values from rows 1,5,9,10,11 because all these rows have 95.8 in some of the columns. Then I want to store this average in another data frame in a row corresponding to '95.8'. Similarly, I want to repeat this operation for each unique value that appears in columns A and B. I am unable to figure out a way to do this. Any hits will be helpful!
CodePudding user response:
I believe I understand what your asking, you want to store the mean of column C for each unique value in A and B as a row in a new df.
The below code reads in the dataframe, which I created as data.csv, then finds the Unique values between the A and B columns calculates the mean of C where the rows of either A or B match the unique value.
We then create a new data frame with the mean and the unique value.
import pandas as pd
df = pd.read_csv("data.csv")
unique_a = df.A.unique().tolist()
unique_b = df.B.unique().tolist()
b_uniques = [ i for i in unique_b if i not in unique_a]
unique_a = b_uniques
output = []
value = []
for i in unique_a:
output.append( df[(df['A']==i) | (df['B']==i)]['C'].mean())
value.append(i)
out_df = pd.DataFrame({"mean":output, "Group Value": value})
output:
mean Group Value
0 2.336000 84.3
1 1.180000 18.3
2 3.882000 95.8
3 1.512500 192.7
4 4.708000 617.0
5 0.965000 54.0
6 2.836667 544.0
7 0.620000 52.7
CodePudding user response:
Try this:
import pandas as pd
s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a = [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3, 95.8, 95.8, 95.8, 192.7, 192.7, 617.0]
b = [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0, 52.7, 617.0, 54.0, 617.0, 544.0, 544.0]
c = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]
df = pd.DataFrame(s, columns=['Sr. No.'])
df['A'] = a
df['B'] = b
df['C'] = c
completeSet = set(list(df['B']) list(df['A']))
list_df_num = []
list_df_avg = []
for num in completeSet:
list_df_num.append(num)
tmp = df[(df['A'] == num) | (df['B'] == num)]
if len(tmp) > 0:
avg = sum(list(tmp['C'])) / len(list(tmp['C']))
list_df_avg.append(avg)
else:
list_df_avg.append(0)
result = pd.DataFrame(list_df_num, columns=['Number'])
result['Average'] = list_df_avg
print(result)