Scripting a simple counter-CodePudding

I wanted to create a simple script, which counts values in one column, that are higher in another column:

d = {'a': [1, 3], 'b': [0, 2]}
df = pd.DataFrame(data=d, index=[1, 2])

print(df)

   a  b
1  1  0
2  3  2

My function:


def diff(dataframe):
    a_counter=0
    b_counter=0
    
    for i in dataframe["a"]:
        for ii in dataframe["b"]:
            if i>ii:
                a_counter =1
            elif ii>i:
                b_counter =1

    return a_counter, b_counter

However

diff(df)

returns (3, 1), instead of (2,0). I know the problem is that every single value of one column gets compared to every value of the other column (e.g. 1 gets compared to 0 and 2 of column b). There probably is a special function for my problem, but can you help me fix my script?

CodePudding user response：

I would suggest adding some helper columns in an intuitive way to help compute the sum of each condition a > b and b > a

A working example based on your code :

import numpy as np
import pandas as pd

d = {'a': [1, 3], 'b': [0, 2]}
df = pd.DataFrame(data=d, index=[1, 2])

def diff(dataframe):
    dataframe['a>b'] = np.where(dataframe['a']>dataframe['b'], 1, 0)
    dataframe['b>a'] = np.where(dataframe['b']>dataframe['a'], 1, 0)
    return dataframe['a>b'].sum(), dataframe['b>a'].sum()

print(diff(df))

>>> (2, 0)

Basically what np.where() does, the way I used it, is that it produces 1 if the condition is met and 0 otherwise. You can then add those columns up using a simple sum() function applied on the desired columns.

CodePudding user response：

Update

Maybe you can use:

>>> df['a'].gt(df['b']).sum(), df['b'].gt(df['a']).sum()
(2, 0)

IIUC, to fix your code:

def diff(dataframe):
    a_counter=0
    b_counter=0
    
    for i in dataframe["a"]:
        for ii in dataframe["b"]:
            if i>ii:
                a_counter =1
            elif ii>i:
                b_counter =1

    # Subtract the minimum of counters
    m = min(a_counter, b_counter)   
    return a_counter-m, b_counter-m

Output:

>>> diff(df)
(2, 0)

CodePudding user response：

IIUC, you can use the sign of the difference and count the values:

d = {1: 'a', -1: 'b', 0: 'equal'}

(np.sign(df['a'].sub(df['b']))
   .map(d)
   .value_counts()
   .reindex(list(d.values()), fill_value=0)
)

output:

a        2
b        0
equal    0
dtype: int64