How to create condition on pandas rows-CodePudding

the data frame is:

Circuit          #case     Month
NY,N              202       Jan
NY,E              413       Jan
NY,S              304       Jan
NY,W              106       Jan
VT                15        Jan
DE                56        Jan
NY,N              222       Feb
NY,E              13        Feb
NY,S              04        Feb
NY,W              06        Feb

I aim to write a python code to detect the US circuits that belong to the same state and return the sum of the cases for that state per month. How to do that for each month?

Circuit         #case     state        #total case      Month
NY,N              202        NY            1025          Jan
NY,E              413
NY,S              304
NY,W              106
VT                15         VT            15            Jan
NY,N              222        NY            245           Feb
NY,E              13
NY,S              04
NY,W              06

My code so far:

df2[['state', 'code']] = df['Circuit'].str.split(',', expand=True)
df2['#Cases'] = df['#Cases']
df2['total case'] = df2.groupby('state')['#Cases'].transform('sum')

It returns the "#total case" for all Circuits and not for Circuits per month.

CodePudding user response：

You're quite close! You mainly just need to group by both "state" and "month", which can be done via a list: ['state', 'Month']

Note that I also had to change some of the column names in your code to match with the supplied sample data.

df2[["state", "code"]] = df["Circuit"].str.split(",", expand=True)
df2["#case"] = df["#case"]
df2["total case"] = df2.groupby(["state", "Month"])["#case"].transform("sum")

print(df2)
  Circuit  #case Month state  code  total case
0    NY,N    202   Jan    NY     N        1025
1    NY,E    413   Jan    NY     E        1025
2    NY,S    304   Jan    NY     S        1025
3    NY,W    106   Jan    NY     W        1025
4      VT     15   Jan    VT  None          15
5      DE     56   Jan    DE  None          56
6    NY,N    222   Feb    NY     N         245
7    NY,E     13   Feb    NY     E         245
8    NY,S      4   Feb    NY     S         245
9    NY,W      6   Feb    NY     W         245

As always, a refactorization to placate the method chaining addicts:

out = (
    df["Circuit"].str.split(",", expand=True)
    .set_axis(["state", "code"], axis="columns")
    .join(df)
    .assign(total_case=lambda d: 
        d.groupby(["state", "Month"])["#case"].transform("sum")
    )
)

print(out)
  state  code Circuit  #case Month  total_cases
0    NY     N    NY,N    202   Jan         1025
1    NY     E    NY,E    413   Jan         1025
2    NY     S    NY,S    304   Jan         1025
3    NY     W    NY,W    106   Jan         1025
4    VT  None      VT     15   Jan           15
5    DE  None      DE     56   Jan           56
6    NY     N    NY,N    222   Feb          245
7    NY     E    NY,E     13   Feb          245
8    NY     S    NY,S      4   Feb          245
9    NY     W    NY,W      6   Feb          245