Grouping and printing the maximum in a dataframe in Python

Time:11-17

A dataframe has 3 columns:

A                     B           C
^0hand(%s)leg$        27;30       42;54
^-(%s)hand0leg        39;30       47;57
^0hand(%s)leg$        24;33       39;54
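For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the variable name `df` and the assumption that B and C are stored as plain strings are mine.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: B and C hold "x;y" strings rather than numeric values.
df = pd.DataFrame({
    "A": ["^0hand(%s)leg$", "^-(%s)hand0leg", "^0hand(%s)leg$"],
    "B": ["27;30", "39;30", "24;33"],
    "C": ["42;54", "47;57", "39;54"],
})
print(df)
```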

Column A holds regex patterns. When the patterns in two rows are similar (in this example, rows 1 and 3 have the same pattern), the rows should be merged into one, keeping the maximum of each number in columns B and C, as below:

Output:

A                     B           C
^0hand(%s)leg$        27;33       42;54
^-(%s)hand0leg        39;30       47;57
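To make the merge rule concrete, here is a small plain-Python sketch of what the expected output seems to imply: the maximum is taken separately for the number before and after the `;`. The helper name `pairwise_max` is hypothetical and only serves as an illustration.

```python
def pairwise_max(values):
    """Return the position-wise maximum of 'x;y' strings as one 'x;y' string."""
    # Split each string on ';' and convert the parts to integers.
    pairs = [tuple(int(part) for part in value.split(";")) for value in values]
    # zip(*pairs) groups all first numbers together and all second numbers together.
    return ";".join(str(max(column)) for column in zip(*pairs))


print(pairwise_max(["27;30", "24;33"]))  # column B of rows 1 and 3 -> 27;33
print(pairwise_max(["42;54", "39;54"]))  # column C of rows 1 and 3 -> 42;54
```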

Any leads would be helpful.

CodePudding user response:

You could use:

(df.set_index('A').stack()                       # long format: one 'x;y' string per (A, column)
   .str.extract(r'(\d+);(\d+)').astype(int)      # split into two integer columns, 0 and 1
   .groupby(level=[0,1]).agg('max').astype(str)  # maximum per (pattern, column) pair
   .assign(s=lambda d: d[0] + ';' + d[1])['s']   # OR # .apply(';'.join, axis=1)
   .unstack(1)
   .loc[df['A'].unique()]  ## only if the order of rows matters
   .reset_index()
)

output:

                A      B      C
0  ^0hand(%s)leg$  27;33  42;54
1  ^-(%s)hand0leg  39;30  47;57
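
If the one-liner is hard to follow, the same idea can be sketched step by step with named intermediates. The sample `df` is rebuilt from the question's table, and the variable names (`long`, `parts`, `maxima`, `joined`, `result`) are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed to be stored as strings).
df = pd.DataFrame({
    "A": ["^0hand(%s)leg$", "^-(%s)hand0leg", "^0hand(%s)leg$"],
    "B": ["27;30", "39;30", "24;33"],
    "C": ["42;54", "47;57", "39;54"],
})

# 1. Move B and C into one long Series indexed by (A, column name).
long = df.set_index("A").stack()

# 2. Split each "x;y" string into two integer columns named 0 and 1.
parts = long.str.extract(r"(\d+);(\d+)").astype(int)

# 3. Take the maximum of each number per (pattern, column) pair.
maxima = parts.groupby(level=[0, 1]).max()

# 4. Join the two numbers back into "x;y", pivot the column names back out,
#    and restore the order in which the patterns first appeared.
joined = maxima.astype(str).apply(";".join, axis=1)
result = joined.unstack(1).loc[df["A"].unique()].reset_index()
print(result)
```

Step 4 uses the `.apply(';'.join, axis=1)` variant mentioned in the comment above; it should give the same strings as concatenating the two columns by hand.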