the data frame is:
US Circuit #case
NY,N 202
NY,E 413
NY,S 304
NY,W 106
VT 15
DE 56
NJ 682
PA,E 147
PA,M 132
PA,W 209
VI 0
MD 453
NC,E 84
NC,M 60
NC,W 58
I aim to write a python code to detect the US circuits that belong to the same state and return the sum of the cases for that state.
US Circuit #case state #total case
NY,N 202 NY 1025
NY,E 413
NY,S 304
NY,W 106
VT 15 VT 15
DE 56 DE 56
NJ 682 NJ 682
PA,E 147 PA 488
PA,M 132
PA,W 209
VI 0 VI 0
CodePudding user response:
To get the sums, you can split on ","
:
In [4]: df.groupby(df["Circuit"].str.split(",").str[0]).sum().reset_index()
Out[4]:
Circuit #case
0 DE 56
1 MD 453
2 NC 202
3 NJ 682
4 NY 1025
5 PA 488
6 VI 0
7 VT 15
CodePudding user response:
If you don't need the empty rows you can use groupby transform
df[['state', 'code']] = df['US Circuit'].str.split(',', expand=True)
df['total case'] = df.groupby('state')['#case'].transform('sum')
US Circuit #case state code total case
0 NY,N 202 NY N 1025
1 NY,E 413 NY E 1025
2 NY,S 304 NY S 1025
3 NY,W 106 NY W 1025
4 VT 15 VT None 15
5 DE 56 DE None 56
6 NJ 682 NJ None 682
7 PA,E 147 PA E 488
8 PA,M 132 PA M 488
9 PA,W 209 PA W 488
10 VI 0 VI None 0
11 MD 453 MD None 453
12 NC,E 84 NC E 202
13 NC,M 60 NC M 202
14 NC,W 58 NC W 202