how I can count signs columns?-CodePudding

How I can count the number of repetitive positive or negative elements in each row?

Suppose I have the following data:

ski      2020    2021      2022     2023       2024      2025
book      1.2     5.6       8.4      -2         -5         6
jar       4.2      -5        -8      2          4           6
kook       -4      -5.2      -2.3    -5.6        -7        8

The output is a list for each row that counts the number of similar signs. For example in the first row we have 3 positive elements and then 2 negative and again one positive. So the output is [3,-2,1]. and for 2 other rows the output is as follows:

 jar   [1,-2,3]
 kook   [-5,1]

CodePudding user response：

Let us try:

s = np.sign(df.set_index('ski').stack())
s.groupby([pd.Grouper(level=0), s.diff().ne(0).cumsum()]).sum().groupby(level=0).agg(list)

ski
book    [3.0, -2.0, 1.0]
jar     [1.0, -2.0, 3.0]
kook         [-5.0, 1.0]
dtype: object

CodePudding user response：

Use DataFrame.clip with custom lambda function for count consecutive values:

#if necessary
df = df.set_index('ski')
print (df)
      2020  2021  2022  2023  2024  2025
ski                                     
book   1.2   5.6   8.4  -2.0    -5     6
jar    4.2  -5.0  -8.0   2.0     4     6
kook  -4.0  -5.2  -2.3  -5.6    -7     8


from  itertools import groupby

f = lambda x: [ int(sum(key for _ in group)) for key, group in groupby( x )]
s = df.clip(upper=1, lower=-1).apply(f, 1)
print (s)
ski
book    [3, -2, 1]
jar     [1, -2, 3]
kook       [-5, 1]
dtype: object

CodePudding user response：

Use:

import pandas as pd
import numpy as np
data = '''ski      2020    2021      2022     2023       2024      2025
book      1.2     5.6       8.4      -2         -5         6
jar       4.2      -5        -8      2          4           6
kook       -4      -5.2      -2.3    -5.6        -7        8'''
data = np.array([x.split() for x in data.split('\n')])

import seaborn as sns

df = pd.DataFrame(data[1:,1:], columns = data[0,1:], index = data[1:,0])


output = []
import math
for i, row in df.iterrows():
    out = []
    c=0
    prev = math.copysign(1,float(row[0]))
    temp = row.append(pd.Series(-math.copysign(1,float(row[-1]))))
    for cell in temp:
        
        currrent_sign = math.copysign(1,float(cell))
        #print(prev, currrent_sign, c)
        if currrent_sign==prev:
            c =currrent_sign
        else:
            prev = currrent_sign
            out.append(c)
            c=currrent_sign
    output.append(out)

Output:

[[3.0, -2.0, 1.0], [1.0, -2.0, 3.0], [-5.0, 1.0]]

CodePudding user response：

I assume that ski is the index column. If not, set it as the index, dropping the current one.

Start from defining a function, to be applied to each row:

def myCounts(row):
    sgn = row.ge(0)
    return sgn.groupby(sgn.ne(sgn.shift()).cumsum()).apply(
        lambda grp: grp.count() * (1 if grp.iloc[0] else -1)).tolist()

Then apply it:

result = df.apply(myCounts, axis=1)

For your source data, I got:

ski
book    [3, -2, 1]
jar     [1, -2, 3]
kook       [-5, 1]
dtype: object

My solution is significantly shorter than the other.