I have a dataframe with more approximately 200 rows. The dataframe looks like below
Rule A | Rule C | Rule N | Rule A.1 | Rule C.1 | Rule N.1 |
---|---|---|---|---|---|
4 | 4 | 4 | 1 | 1 | 1 |
5 | 5 | 5 | 1 | 1 | 1 |
6 | 6 | 6 | 0 | 1 | 1 |
I am trying to calculate averages for each rule(A,C...N) when values in Rule A.1/C.1...N.1 is 1. The output would look like:-
Rule | Average |
---|---|
A | 4.5 |
C | 5 |
............. | ...... |
N | 5 |
CodePudding user response:
here is one way to accomplish it
The solution is based on the provided sample, i.e. 6 columns. the number 3 here would need to expand/shrink based on the number of columns you'll have for value and its assumed that there will same number of columns for the rule
pd.DataFrame(np.multiply(
df.iloc[:,3:].to_numpy(),
df.iloc[:,0:3].to_numpy()),
columns=df.columns[:3]).replace(0,np.nan).mean()
average
Rule A 4.5
Rule C 5.0
Rule N 5.0
CodePudding user response:
Here is another solution. Albeit, not the most elegant, but it works.
I am assuming that the dataframe "df" here is the sample dataframe that you provided.
import pandas as pd
import numpy as np
df = pd.DataFrame(data=[[4,1,4,1,4,1], [5,1,5,1,5,1], [6,0,6,1,6,1]],
columns=['Rule A', 'Rule A.1', 'Rule C', 'Rule C.1', 'Rule N', 'Rule N.1'])
for col in df.columns:
if ".1" in col:
df[col] = df[col].replace(1, True)
df[col] = df[col].replace(0, False)
df_transposed = df.transpose().rename_axis("Rule").reset_index()
num_rules = len([r for r in df.columns if "Rule" in r])
rules = []
averages = []
for r in range(num_rules):
if r%2 == 0:
rule = df_transposed.iloc[r, 0]
mask = df_transposed.iloc[r 1, 1:]
filtered_vals = df_transposed.iloc[r, 1:].loc[mask]
avg = np.mean(filtered_vals.values)
rules.append(rule)
averages.append(avg)
df_new = pd.DataFrame(data=zip(rules, averages), columns=['Rules', 'Average'])
df_new
The other assumption in this solution is that the columns must be ordered in the way you suggested in your post, i.e. A, A.1, C, C.1 etc. (alternating between the rule and the boolean flag). This solution will work for any number of "Rules" as long as they follow this alternating column convention.