My workflow is: first I filter the dataframe, then I groupby to get the dataset I want. For example, my df is built like this:
import numpy as np
import pandas as pd
test = pd.DataFrame([["Mon", 3, "A"], ["Tue", 6, "A"], ["Tue", 4, "A"], ["Wed", np.nan, "A"], ["Thu", 9, "A"], ["Fri", 1, "A"], ["Sat", 2, "A"], ["Sun", 3, "B"]], columns=["date", "value", "class"])
The dataframe "test" looks like this:
  date  value class
0  Mon    3.0     A
1  Tue    6.0     A
2  Tue    4.0     A
3  Wed    NaN     A
4  Thu    9.0     A
5  Fri    1.0     A
6  Sat    2.0     A
7  Sun    3.0     B
I first filter for class "A" rows with test = test[test["class"] == "A"]
Then I aggregate with new_df = test.groupby(["date"], dropna=False)["value"].sum()
As you can see, Sunday is filtered out in the first step, so the dataframe produced by the aggregation above has no Sunday row. But the final dataset needs to keep Sunday.
My goal after grouping by "date" is:
date value
0 Mon 3
1 Tue 10
2 Wed 0
3 Thu 9
4 Fri 1
5 Sat 2
6 Sun 0
This is only a simplified version. In my actual data, the "date" column holds real dates, like 1-Jan-2021, and the dataset spans a few years. The filtering step drops some dates (just as Sunday disappears in the example above), but I want every date in those years to appear in my result dataframe. How can I achieve it?
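For reference, a minimal, runnable reproduction of the problem described above (variable names other than test are my own):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(
    [["Mon", 3, "A"], ["Tue", 6, "A"], ["Tue", 4, "A"], ["Wed", np.nan, "A"],
     ["Thu", 9, "A"], ["Fri", 1, "A"], ["Sat", 2, "A"], ["Sun", 3, "B"]],
    columns=["date", "value", "class"],
)

# Step 1: filtering drops the only "Sun" row entirely.
filtered = test[test["class"] == "A"]

# Step 2: groupby can only see dates that survived the filter,
# so "Sun" is missing from the result's index.
result = filtered.groupby("date")["value"].sum()
```

Once the row is gone, no later aggregation can bring the date back, which is why the fix has to happen before (or instead of) the filter.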
CodePudding user response:
How about setting the values you don't want to keep to 0 first, instead of filtering those rows out? Every date then survives the groupby:
test.loc[test['class']!='A', 'value'] = 0
test.groupby('date')['value'].sum()
date
Fri 1.0
Mon 3.0
Sat 2.0
Sun 0.0
Thu 9.0
Tue 10.0
Wed 0.0
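Put together as a self-contained sketch (note that Wed's NaN also sums to 0.0, since sum skips NaN by default):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(
    [["Mon", 3, "A"], ["Tue", 6, "A"], ["Tue", 4, "A"], ["Wed", np.nan, "A"],
     ["Thu", 9, "A"], ["Fri", 1, "A"], ["Sat", 2, "A"], ["Sun", 3, "B"]],
    columns=["date", "value", "class"],
)

# Instead of dropping non-"A" rows, zero their values so the
# rows (and therefore their dates) survive the aggregation.
test.loc[test["class"] != "A", "value"] = 0
out = test.groupby("date")["value"].sum()
# "Sun" is still present, with value 0.0
```

This works for the real date column too, as long as every date you need occurs somewhere in the raw data; the trick keeps rows rather than restoring ones that were never there.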