Iterate dataframe and sum transactions by condition-CodePudding

I have the following sample of data:

    id   year  type  num
     1   1994   A     0
     2   1950   A  2333
     3   1977   B  4444
     4   1995   B   555
     1   1994   A     0
     6   1955   A   333
     7   2006   B  4123
     6   1975   A     0
     9   1999   B   123
     3   1950   A  1234

I'm looking for the easiest way how to sum column 'num' based on conditions of type == 'A' and year < 1999

I'm iterating through the dataframe df with the data:

    data = pd.read_csv('data.csv')
    df = pd.DataFrame(data)
    df_sum = pd.DataFrame
    
    for index, row in df.iterrows():
        if row['type'] == 'A' and row['year'] < 1999:
            df_sum = df_sum.append(row) //This doesn't work

and trying to store the rows that match the conditions into df_sum where I'd make the sumarized num by id. Have no idea how to iterate and store the data based on condition into new dataframe.

The desired output would be:

id num_sum
1   0
2   2333
6   333
.....

CodePudding user response：

Let's use where to select the rows in num where type equals A and year is less than 1999 then group the selected rows by id and agg with sum

m = df['type'].eq('A') & df['year'].lt(1999)
df['num'].where(m).groupby(df['id']).sum()

id
1       0.0
2    2333.0
3    1234.0
4       0.0
6     333.0
7       0.0
9       0.0
Name: num, dtype: float64

CodePudding user response：

You can use df.query() to accomplish that.

filtered_df = df.query('type == "A" and year < 1999')
sum_df = filtered_df.groupby("id")["sum_num"].sum().reset_index()
print(sum_df)

Output:

   id  sum_num
0   1        0
1   2     2333
2   3     1234
3   6      333