the data frame is:
Circuit #case Month
NY,N 202 Jan
NY,E 413 Jan
NY,S 304 Jan
NY,W 106 Jan
VT 15 Jan
DE 56 Jan
NY,N 222 Feb
NY,E 13 Feb
NY,S 04 Feb
NY,W 06 Feb
I aim to write a python code to detect the US circuits that belong to the same state and return the sum of the cases for that state per month. How to do that for each month?
Circuit #case state #total case Month
NY,N 202 NY 1025 Jan
NY,E 413
NY,S 304
NY,W 106
VT 15 VT 15 Jan
NY,N 222 NY 245 Feb
NY,E 13
NY,S 04
NY,W 06
My code so far:
df2[['state', 'code']] = df['Circuit'].str.split(',', expand=True)
df2['#Cases'] = df['#Cases']
df2['total case'] = df2.groupby('state')['#Cases'].transform('sum')
It returns the "#total case" for all Circuits and not for Circuits per month.
CodePudding user response:
You're quite close! You mainly just need to group by both "state"
and "month"
, which can be done via a list: ['state', 'Month']
Note that I also had to change some of the column names in your code to match with the supplied sample data.
df2[["state", "code"]] = df["Circuit"].str.split(",", expand=True)
df2["#case"] = df["#case"]
df2["total case"] = df2.groupby(["state", "Month"])["#case"].transform("sum")
print(df2)
Circuit #case Month state code total case
0 NY,N 202 Jan NY N 1025
1 NY,E 413 Jan NY E 1025
2 NY,S 304 Jan NY S 1025
3 NY,W 106 Jan NY W 1025
4 VT 15 Jan VT None 15
5 DE 56 Jan DE None 56
6 NY,N 222 Feb NY N 245
7 NY,E 13 Feb NY E 245
8 NY,S 4 Feb NY S 245
9 NY,W 6 Feb NY W 245
As always, a refactorization to placate the method chaining addicts:
out = (
df["Circuit"].str.split(",", expand=True)
.set_axis(["state", "code"], axis="columns")
.join(df)
.assign(total_case=lambda d:
d.groupby(["state", "Month"])["#case"].transform("sum")
)
)
print(out)
state code Circuit #case Month total_cases
0 NY N NY,N 202 Jan 1025
1 NY E NY,E 413 Jan 1025
2 NY S NY,S 304 Jan 1025
3 NY W NY,W 106 Jan 1025
4 VT None VT 15 Jan 15
5 DE None DE 56 Jan 56
6 NY N NY,N 222 Feb 245
7 NY E NY,E 13 Feb 245
8 NY S NY,S 4 Feb 245
9 NY W NY,W 6 Feb 245