I have the following sample of data:
id year type num
1 1994 A 0
2 1950 A 2333
3 1977 B 4444
4 1995 B 555
1 1994 A 0
6 1955 A 333
7 2006 B 4123
6 1975 A 0
9 1999 B 123
3 1950 A 1234
I'm looking for the easiest way how to sum column 'num' based on conditions of type == 'A' and year < 1999
I'm iterating through the dataframe df with the data:
data = pd.read_csv('data.csv')
df = pd.DataFrame(data)
df_sum = pd.DataFrame
for index, row in df.iterrows():
if row['type'] == 'A' and row['year'] < 1999:
df_sum = df_sum.append(row) //This doesn't work
and trying to store the rows that match the conditions into df_sum where I'd make the sumarized num by id. Have no idea how to iterate and store the data based on condition into new dataframe.
The desired output would be:
id num_sum
1 0
2 2333
6 333
.....
CodePudding user response:
Let's use where
to select the rows in num
where type
equals A
and year is less than 1999
then group the selected rows by id
and agg with sum
m = df['type'].eq('A') & df['year'].lt(1999)
df['num'].where(m).groupby(df['id']).sum()
id
1 0.0
2 2333.0
3 1234.0
4 0.0
6 333.0
7 0.0
9 0.0
Name: num, dtype: float64
CodePudding user response:
You can use df.query()
to accomplish that.
filtered_df = df.query('type == "A" and year < 1999')
sum_df = filtered_df.groupby("id")["sum_num"].sum().reset_index()
print(sum_df)
Output:
id sum_num
0 1 0
1 2 2333
2 3 1234
3 6 333