Context: I'm trying to get the sum of the groups created using a groupby using a list of patterns that are present on the dataframe columns.
For example, let's say we have this dataframe:
df = pd.DataFrame({'123_Pattern1_a':[0,1,2],'X_Y_Pattern2_X':[3,4,5],'Z_D_Pattern2_Y':[4,5,7],'312_Pattern1_Z':[8,2,4]})
I now would like to create a group by using the "Pattern" and get the sum of values for those columns for each row
If we have a list like this:
pattern = ['Pattern1','Pattern2']
With the dataframe above, the output should be another dataframe as such:
df_final = pd.DataFrame({'Pattern1':[8,3,6],'Pattern2':[7,9,12]})
Basically, "concatenating" all the columns that have a specific pattern on the given column name and get the sum of these values by row
I was trying something like this:
pattern = ['Pattern1','Pattern2','Pattern3',...]
grouped = pd.DataFrame(data_media.groupby(data_media.columns.str.extract(pattern, expand=False), axis=1))
But it doesn't work since extract is a regex and I'm using a list with the patterns. How could I create a regex that would work for this problem? Or is there another way to do this?
Thank you!
CodePudding user response:
Using melt
and pivot_table
:
pattern = ['Pattern1','Pattern2']
df_final = (df
.reset_index().melt('index')
.assign(variable=lambda d: d['variable'].str.extract(fr'({"|".join(pattern)})'))
.pivot_table(index='index', columns='variable', values='value', aggfunc='sum')
)
One option using wide_to_long
and groupby.sum
(works with previous example before OP update):
pattern = ['Pattern1','Pattern2']
df_final = (pd
.wide_to_long(df.reset_index(), stubnames=pattern, i='index', j='x',
sep='_', suffix='. ')
.groupby(level=0).sum()
)
output:
Pattern1 Pattern2
index
0 8.0 7.0
1 3.0 9.0
2 6.0 12.0