Home > Software engineering >  Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of
Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of

Time:03-24

I will try to be right to the point. So I have a df with 100 columns, and I want to groupby two of them while getting the sum of another 2 columns. For that matter I already used the groupby agg function. The problem is that while doing that I still want to keep the remaining 96 columns and by that I'd like to keep the first occurrence of each value for those 96. For that I was thinking something such as dropping duplicates and keep = 'first'.

I searched for a method that could do that in one action but at this point I'm open to any recommendations you might have.

Note: I didn't include a sample df and a desired output for a reason as I'd like to try it myself based on a recommendation, I'm not trying to get a solution directly from someone.

Thanks in advance,

CodePudding user response:

I think that using two separate operations on the groupby object and join them afterwards is clearer than a one-liner. Here is a minimal example, grouping on 1 column:

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0, 5.5, 1),
        ("bird", "Psittaciformes", 24.0, 4.5, 2),
        ("mammal", "Carnivora", 80.2, 33.3, 1),
        ("mammal", "Primates", np.nan, 33.7, 2),
        ("mammal", "Carnivora", 58, 23, 3),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")

grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg(sum)
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)

Output:

          class          family  max_speed  height  order
falcon     bird   Falconiformes      389.0     5.5      1
parrot     bird  Psittaciformes       24.0     4.5      2
lion     mammal       Carnivora       80.2    33.3      1
monkey   mammal        Primates        NaN    33.7      2
leopard  mammal       Carnivora       58.0    23.0      3 

        max_speed  height  order
class                           
bird        413.0    10.0      1
mammal      138.2    90.0      1

  • Related