Home > Net >  sumif and countif on Python for multiple columns , On row level and not column level
sumif and countif on Python for multiple columns , On row level and not column level

Time:10-09

I'm trying to figure a way to do:
COUNTIF(Col2,Col4,Col6,Col8,Col10,Col12,Col14,Col16,Col18,">=0.05")
SUMIF(Col2,Col4,Col6,Col8,Col10,Col12,Col14,Col16,Col18,">=0.05")

My attempt:

import pandas as pd
df=pd.read_excel(r'C:\\Users\\Downloads\\Prepped.xls') #Please use: https://github.com/BeboGhattas/temp-repo/blob/main/Prepped.xls
df.iloc[:, [2,4,6,8,10,12,14,16,18]].astype(float) #changing dtype to float
#unconditional sum
df['sum']=df.iloc[:, [2,4,6,8,10,12,14,16,18]].astype(float).sum(axis=1)

whatever goes below won't work

#sum if
df['greater-than-0.05']=df.iloc[:, [2,4,6,8,10,12,14,16,18]].astype(float).sum([c for c in col if c >= 0.05])
|    |   # | word    |   B64684807 | B64684807Measure   |   B649845471 | B649845471Measure        |   B83344143 | B83344143Measure         |   B67400624 | B67400624Measure   |   B85229235 | B85229235Measure   |   B85630406 | B85630406Measure   |   B82615898 | B82615898Measure   |   B87558236 |   B87558236Measure |   B00000009 | B00000009Measure   |   有效竞品数 | 关键词抓取时间      |   搜索量排名 |   月搜索量 |   在售商品数 |   竞争度 |
|---:|----:|:--------|------------:|:-------------------|-------------:|:-------------------------|------------:|:-------------------------|------------:|:-------------------|------------:|:-------------------|------------:|:-------------------|------------:|:-------------------|------------:|-------------------:|------------:|:-------------------|-------------:|:--------------------|-------------:|-----------:|-------------:|---------:|
|  0 |   1 | word 1  |    0.055639 | [主要流量词]       |     0.049416 | nan                      |    0.072298 | [精准流量词, 主要流量词] |    0.00211  | nan                |    0.004251 | nan                |    0.007254 | nan                |    0.074409 | [主要流量词]       |    0.033597 |                nan |    0.000892 | nan                |            9 | 2022-10-06 00:53:56 |         5726 |     326188 |         3810 |     0.01 |
|  1 |   2 | word 2  |    0.045098 | nan                |     0.005472 | nan                      |    0.010791 | nan                      |    0.072859 | [主要流量词]       |    0.003423 | nan                |    0.012464 | nan                |    0.027396 | nan                |    0.002825 |                nan |    0.060989 | [主要流量词]       |            9 | 2022-10-07 01:16:21 |         9280 |     213477 |        40187 |     0.19 |
|  2 |   3 | word 3  |    0.02186  | nan                |     0.05039  | [主要流量词]             |    0.007842 | nan                      |    0.028832 | nan                |    0.044385 | [精准流量词]       |    0.001135 | nan                |    0.003866 | nan                |    0.021035 |                nan |    0.017202 | nan                |            9 | 2022-10-07 00:28:31 |        24024 |      81991 |         2275 |     0.03 |
|  3 |   4 | word 4  |    0.000699 | nan                |     0.01038  | nan                      |    0.001536 | nan                      |    0.021512 | nan                |    0.007658 | nan                |    5e-05    | nan                |    0.048682 | nan                |    0.001524 |                nan |    0.000118 | nan                |            9 | 2022-10-07 00:52:12 |        34975 |      53291 |        30970 |     0.58 |
|  4 |   5 | word 5  |    0.00984  | nan                |     0.030248 | nan                      |    0.003006 | nan                      |    0.014027 | nan                |    0.00904  | [精准流量词]       |    0.000348 | nan                |    0.000414 | nan                |    0.006721 |                nan |    0.00153  | nan                |            9 | 2022-10-07 02:36:05 |        43075 |      41336 |         2230 |     0.05 |
|  5 |   6 | word 6  |    0.010029 | [精准流量词]       |     0.120739 | [精准流量词, 主要流量词] |    0.014359 | nan                      |    0.002796 | nan                |    0.002883 | nan                |    0.028747 | [精准流量词]       |    0.007022 | nan                |    0.017803 |                nan |    0.001998 | nan                |            9 | 2022-10-07 00:44:51 |        49361 |      34791 |          517 |     0.01 |
|  6 |   7 | word 7  |    0.002735 | nan                |     0.002005 | nan                      |    0.005355 | nan                      |    6.3e-05  | nan                |    0.000772 | nan                |    0.000237 | nan                |    0.015149 | nan                |    2.1e-05  |                nan |    2.3e-05  | nan                |            9 | 2022-10-07 09:48:20 |        53703 |      31188 |          511 |     0.02 |
|  7 |   8 | word 8  |    0.003286 | [精准流量词]       |     0.058161 | [主要流量词]             |    0.013681 | [精准流量词]             |    0.000748 | [精准流量词]       |    0.002684 | [精准流量词]       |    0.013916 | [精准流量词]       |    0.029376 | nan                |    0.019792 |                nan |    0.005602 | nan                |            9 | 2022-10-06 01:51:53 |        58664 |      27751 |          625 |     0.02 |
|  8 |   9 | word 9  |    0.004273 | [精准流量词]       |     0.025581 | [精准流量词]             |    0.014784 | [精准流量词]             |    0.00321  | [精准流量词]       |    0.000892 | nan                |    0.00223  | nan                |    0.005315 | nan                |    0.02211  |                nan |    0.027008 | [精准流量词]       |            9 | 2022-10-07 01:34:28 |        73640 |      20326 |          279 |     0.01 |
|  9 |  10 | word 10 |    0.002341 | [精准流量词]       |     0.029604 | nan                      |    0.007817 | [精准流量词]             |    0.000515 | [精准流量词]       |    0.001865 | [精准流量词]       |    0.010128 | [精准流量词]       |    0.015378 | nan                |    0.019677 |                nan |    0.003673 | nan                |            9 | 2022-10-07 01:17:44 |        80919 |      17779 |          207 |     0.01 |

So my question is, How can i do the sumif and countif on the exact table (Should use col2,col4... etc, because every file will have the same format but different header, so using df['B64684807'] isn't helpful )

Sample file can be found at: https://github.com/BeboGhattas/temp-repo/blob/main/Prepped.xls

CodePudding user response:

IIUC, you can use a boolean mask:

df2 = df.iloc[:, [2,4,6,8,10,12,14,16,18]].astype(float)
m = df2.ge(0.05)
df['countif'] = m.sum(axis=1)
df['sumif'] = df2.where(m).sum(axis=1)

output (last 3 columns only):

        sum  countif     sumif
0  0.299866        3  0.202346
1  0.241317        2  0.133848
2  0.196547        1  0.050390
3  0.092159        0  0.000000
4  0.075174        0  0.000000
5  0.206376        1  0.120739
6  0.026360        0  0.000000
7  0.147246        1  0.058161
8  0.105403        0  0.000000
9  0.090998        0  0.000000
  • Related