Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of-CodePudding

I will try to be right to the point. So I have a df with 100 columns, and I want to groupby two of them while getting the sum of another 2 columns. For that matter I already used the groupby agg function. The problem is that while doing that I still want to keep the remaining 96 columns and by that I'd like to keep the first occurrence of each value for those 96. For that I was thinking something such as dropping duplicates and keep = 'first'.

I searched for a method that could do that in one action but at this point I'm open to any recommendations you might have.

Note: I didn't include a sample df and a desired output for a reason as I'd like to try it myself based on a recommendation, I'm not trying to get a solution directly from someone.

Thanks in advance,

CodePudding user response：

I think that using two separate operations on the groupby object and join them afterwards is clearer than a one-liner. Here is a minimal example, grouping on 1 column:

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0, 5.5, 1),
        ("bird", "Psittaciformes", 24.0, 4.5, 2),
        ("mammal", "Carnivora", 80.2, 33.3, 1),
        ("mammal", "Primates", np.nan, 33.7, 2),
        ("mammal", "Carnivora", 58, 23, 3),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")

grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg(sum)
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)

Output:

          class          family  max_speed  height  order
falcon     bird   Falconiformes      389.0     5.5      1
parrot     bird  Psittaciformes       24.0     4.5      2
lion     mammal       Carnivora       80.2    33.3      1
monkey   mammal        Primates        NaN    33.7      2
leopard  mammal       Carnivora       58.0    23.0      3 

        max_speed  height  order
class                           
bird        413.0    10.0      1
mammal      138.2    90.0      1