Is there any way to select certain rows in a table based on which ones have similar values in one or-CodePudding

Sr. No.	A	B	C
0	84.3	18.3	1.138420e 00
1	84.3	95.8	8.501307e
2	84.3	192.7	2.262742e-02
3	84.3	617.0	5.395847e-01
4	84.3	54.0	1.484681
5	18.3	95.8	9.612692e-01
6	18.3	192.7	9.600000e-01
7	18.3	617.0	1.706984e
8	18.3	544.0	1.128933e 00
9	95.8	52.7	6.157143e-01
10	95.8	617.0	8.880000e 00
11	95.8	54.0	4.533847e-01
12	192.7	617.0	5.048742e
13	192.7	544.0	1.838478e-02
14	617.0	544.0	7.360492e

e.g. In the table above, I want to take an average of C values from rows 0,5,6,7,8 because all of these rows have 18.3 in some of the columns. Then I want to store this average in another data frame in a row corresponding to '18.3'. Then, I want to take an average of C values from rows 1,5,9,10,11 because all these rows have 95.8 in some of the columns. Then I want to store this average in another data frame in a row corresponding to '95.8'. Similarly, I want to repeat this operation for each unique value that appears in columns A and B. I am unable to figure out a way to do this. Any hits will be helpful!

CodePudding user response：

I believe I understand what your asking, you want to store the mean of column C for each unique value in A and B as a row in a new df.

The below code reads in the dataframe, which I created as data.csv, then finds the Unique values between the A and B columns calculates the mean of C where the rows of either A or B match the unique value.

We then create a new data frame with the mean and the unique value.

    import pandas as pd


    df = pd.read_csv("data.csv")


    unique_a = df.A.unique().tolist()
    unique_b = df.B.unique().tolist()
    b_uniques = [ i for  i in unique_b if i not in unique_a]

    unique_a  = b_uniques

    output = []
    value = []
    for i in unique_a:
        output.append( df[(df['A']==i) | (df['B']==i)]['C'].mean())
        value.append(i)




    out_df = pd.DataFrame({"mean":output, "Group Value": value})


    output:
      mean  Group Value
0  2.336000         84.3
1  1.180000         18.3
2  3.882000         95.8
3  1.512500        192.7
4  4.708000        617.0
5  0.965000         54.0
6  2.836667        544.0
7  0.620000         52.7

CodePudding user response：

Try this:

import pandas as pd

s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a = [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3, 95.8, 95.8, 95.8, 192.7, 192.7, 617.0]
b = [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0, 52.7, 617.0, 54.0, 617.0, 544.0, 544.0]
c = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]

df = pd.DataFrame(s, columns=['Sr. No.'])
df['A'] = a
df['B'] = b
df['C'] = c

completeSet = set(list(df['B'])   list(df['A']))
list_df_num = []
list_df_avg = []

for num in completeSet:
    list_df_num.append(num)
    tmp = df[(df['A'] == num) | (df['B'] == num)]
    if len(tmp) > 0:
        avg = sum(list(tmp['C'])) / len(list(tmp['C']))
        list_df_avg.append(avg)
    else:
        list_df_avg.append(0)

result = pd.DataFrame(list_df_num, columns=['Number'])
result['Average'] = list_df_avg

print(result)