Sum and skip rows pandas-CodePudding

I'm trying to sum specific rows and columns on pandas but I have some issues. Below is my sample code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'STT':['1','2','2.1','2.1.1','2.2','2.3','2.4'],
                         'Number 1':[100,0,10,5,20,30,40],
                         'Number 2':[120,0,15,7,25,35,45]})
df.head(10)

Output as below:

STT Number 1    Number 2
0   1   100 120
1   2   0   0
2   2.1 10  15
3   2.1.1   5   7
4   2.2 20  25
5   2.3 30  35
6   2.4 40  45

I want to sum of 2 from 2.1, 2.2, 2.3, 2.4 and skip 2.1.1 I used below code to sum:

df.iloc[1,1] = df.iloc[2,1] ((df.iloc[4:7,1]).sum())
df.iloc[1,2] = df.iloc[2,2] ((df.iloc[4:7,2]).sum())
df.head()

Output as below:

    STT Number 1    Number 2
0   1   100 120
1   2   100 120
2   2.1 10  15
3   2.1.1   5   7
4   2.2 20  25
5   2.3 30  35
6   2.4 40  45

Actually it's output that I want, but I'm curious that is there anyway to make code shorter? If dataframe have 100 columns so I have to write 100 times of same code?

I tried this code: df.iloc[1,1:2] = df.iloc[2,1:2] ((df.iloc[4:7,1:2]).sum()) but it didn't work.

Thank you.

CodePudding user response：

IIUC, to have an programmatic way, you could do:

cols = ['Number 1', 'Number 2']
df.loc[df['STT'].eq('2'), cols] = df.loc[df['STT'].str.fullmatch(r'2\.\d'), cols].sum().values

output:

     STT  Number 1  Number 2
0      1     100.0     120.0
1      2     100.0     120.0
2    2.1      10.0      15.0
3  2.1.1       5.0       7.0
4    2.2      20.0      25.0
5    2.3      30.0      35.0
6    2.4      40.0      45.0

CodePudding user response：

You are close, only for select both columns need 1:3 instead 1:2:

df.iloc[1,1:3] = df.iloc[2,1:3]   (df.iloc[4:7,1:3]).sum()
print (df)
     STT  Number 1  Number 2
0      1       100       120
1      2       100       120
2    2.1        10        15
3  2.1.1         5         7
4    2.2        20        25
5    2.3        30        35
6    2.4        40        45

CodePudding user response：

I suggest you use regex as what follows:

df[df["STT"].str.contains("^2(\.\d)?$")]

Output

	STT	Number 1	Number 2
1	2	0	0
2	2.1	10	15
4	2.2	20	25
5	2.3	30	35
6	2.4	40	45

Then, if you are interested in summing over the Number1 and Number2 columns you can use:

df[df["STT"].str.contains("^2(\.\d)?$")][["Number 1", "Number 2"]].sum()

Output

	0
Number 1	100
Number 2	120