After subsetting a dataframe, how to add a column that will sum only specific columns in the subset

Time:01-22

I've searched a lot, including the suggested past questions, and didn't find an answer. I'm coming from R and the tidyverse, and I'm new to Python and pandas.

I want to add a column to a subset of a dataframe that sums specific columns row-wise.

I understand how to do it in multiple steps, but I'm wondering if it's possible to do that in one "go", as close as possible to the tidyverse piping in R.

Here's what I've tried:

import pandas as pd

# Create data frame
df = pd.DataFrame({
    "first name": ["A", "B", "C"],
    "last name": ["X", "Y", "Z"],
    "age": [30, 40, 50],
    "Score1": [1, 2, 3],
    "Score2": [4, 5, 6]
})

# Subset and then sum only Scores columns
df.loc[~(df["first name"] == "C")]\
    .assign(Total = lambda x: x.sum(axis=1))

This sums all numeric columns into the Total column.

But how do I sum only the "Score1" and "Score2" columns while still keeping all the other columns that I didn't sum (even numeric ones, like the "age" column) in the output?

Thank you in advance.

CodePudding user response:

You could select your columns inside the lambda function:

df.loc[~(df["first name"] == "C")]\
    .assign(Total = lambda x: x[["Score1", "Score2"]].sum(axis=1))
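
As a self-contained sketch of the same idea, the chain below wraps the steps in parentheses so each one sits on its own line, which may feel closer to a tidyverse pipe. The names score_cols and result are only illustrative, and the callable passed to .loc is a stylistic alternative to the boolean mask above, not a requirement.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "first name": ["A", "B", "C"],
    "last name": ["X", "Y", "Z"],
    "age": [30, 40, 50],
    "Score1": [1, 2, 3],
    "Score2": [4, 5, 6],
})

# Illustrative name: the columns that should go into the row-wise total
score_cols = ["Score1", "Score2"]

result = (
    df
    # keep every row whose first name is not "C"
    .loc[lambda d: d["first name"] != "C"]
    # add Total from the score columns only; "age" is kept but not summed
    .assign(Total=lambda d: d[score_cols].sum(axis=1))
)

print(result)
```

Because assign returns a new dataframe, df itself is left without a Total column.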

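Of course, you can also use more than one line to filter and sum:

# .copy() makes df2 an independent dataframe and avoids pandas' SettingWithCopyWarning
df2 = df.loc[~(df["first name"] == "C")].copy()
df2['Total'] = df2[['Score1', 'Score2']].sum(axis=1)
# Alternative for exactly two columns:
# df2['Total'] = df2['Score1'].add(df2['Score2'])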

CodePudding user response:

You need to sum the two columns, and you can do so by creating a new column:

df['Total'] = df['Score1'] + df['Score2']
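
One thing to keep in mind when choosing between the variants in this thread: they treat missing values differently. The sketch below uses a small made-up frame (not the data from the question) with some NaN entries to show how .sum(axis=1), the + operator, and .add(..., fill_value=0) compare.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with missing values, only for illustration
scores = pd.DataFrame({
    "Score1": [1.0, 2.0, np.nan],
    "Score2": [4.0, np.nan, np.nan],
})

# sum(axis=1) skips NaN by default, so an all-NaN row sums to 0.0
via_sum = scores[["Score1", "Score2"]].sum(axis=1)

# + propagates NaN: any missing operand makes the result NaN
via_plus = scores["Score1"] + scores["Score2"]

# add(fill_value=0) treats a single missing operand as 0,
# but the result is still NaN when both operands are missing
via_add = scores["Score1"].add(scores["Score2"], fill_value=0)

print(pd.DataFrame({"sum": via_sum, "plus": via_plus, "add": via_add}))
```

With complete data, as in the question, all three give the same totals.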