Group questions based on a regex pattern and aggregate scores using pandas

I'm trying to aggregate feedback scores according to the touch-point they belong to. For example, the code below creates a DataFrame that holds, for each individual, their feedback scores on questions about particular yearly touch-points.

import pandas as pd
import numpy as np
dummydf = pd.DataFrame({'ID': [2,15,32,4,9,12,16,10,3,7],
              '1-year feedback qs A': [3,2,3,4,3,2,1,3,4,5],
              '1-year feedback qs B': [1,1,2,4,np.nan,3,3,3,2,5],
              '2-year feedback qs A': [2,2,3,4,3,5,3,2,2,4],
              '2-year feedback qs B': [2,3,3,3,4,5,3,np.nan,5,5],
              'Gender': [0,0,0,1,0,1,1,0,0,1],
              'Location': ['py','py','py','va','jk','ce','ce','va','jk','jk']})
print(dummydf)

For each ID, I need to combine the 1-year questions into a single mean score, do the same for the 2-year questions, and so on, while keeping the rest of the variables intact. What is the best way to achieve this?

What I tried is -

groups = dummydf.groupby(by=['ID'])
groups.apply(lambda g: g[g.filter(regex='1-') == g.filter(regex='1-').mean()])

This does not give me the desired result.

CodePudding user response：

Since each ID is unique, there is exactly one row per ID, so you don't need to group at all. You can compute the row-wise mean directly:

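for i in range(1, 3):  # touch-point years 1 and 2
    # row-wise mean over the columns whose name starts with e.g. '1-year'
    cols = [x for x in dummydf.columns if x.startswith(str(i) + '-year')]
    dummydf['mean_year_' + str(i)] = dummydf[cols].mean(axis=1)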
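
If the number of touch-points is not known in advance, the year prefix can be read out of the column names with a regular expression instead of a hard-coded range. The following is only a sketch, assuming every feedback column begins with `<N>-year` as in the sample data; the names `pattern`, `year_cols` and `result` are illustrative and not part of the question.

```python
import re

import numpy as np
import pandas as pd

dummydf = pd.DataFrame({'ID': [2, 15, 32, 4, 9, 12, 16, 10, 3, 7],
                        '1-year feedback qs A': [3, 2, 3, 4, 3, 2, 1, 3, 4, 5],
                        '1-year feedback qs B': [1, 1, 2, 4, np.nan, 3, 3, 3, 2, 5],
                        '2-year feedback qs A': [2, 2, 3, 4, 3, 5, 3, 2, 2, 4],
                        '2-year feedback qs B': [2, 3, 3, 3, 4, 5, 3, np.nan, 5, 5],
                        'Gender': [0, 0, 0, 1, 0, 1, 1, 0, 0, 1],
                        'Location': ['py', 'py', 'py', 'va', 'jk', 'ce', 'ce', 'va', 'jk', 'jk']})

# Assumption: feedback columns start with "<digits>-year".
pattern = re.compile(r'^(\d+)-year')

# Collect the feedback columns belonging to each year prefix.
year_cols = {}
for col in dummydf.columns:
    match = pattern.match(col)
    if match:
        year_cols.setdefault(match.group(1), []).append(col)

# Add one row-wise mean column per year; all other columns are kept as-is.
result = dummydf.copy()
for year, cols in year_cols.items():
    result['mean_year_' + year] = result[cols].mean(axis=1)

print(result)
```

`DataFrame.mean` skips NaN by default, so a row with one missing answer gets the mean of the answers that are present rather than NaN.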