How I can count the number of repetitive positive or negative elements in each row?
Suppose I have the following data:
ski 2020 2021 2022 2023 2024 2025
book 1.2 5.6 8.4 -2 -5 6
jar 4.2 -5 -8 2 4 6
kook -4 -5.2 -2.3 -5.6 -7 8
The output is a list for each row that counts the number of similar signs. For example in the first row we have 3 positive elements and then 2 negative and again one positive. So the output is [3,-2,1]. and for 2 other rows the output is as follows:
jar [1,-2,3]
kook [-5,1]
CodePudding user response:
Let us try:
s = np.sign(df.set_index('ski').stack())
s.groupby([pd.Grouper(level=0), s.diff().ne(0).cumsum()]).sum().groupby(level=0).agg(list)
ski
book [3.0, -2.0, 1.0]
jar [1.0, -2.0, 3.0]
kook [-5.0, 1.0]
dtype: object
CodePudding user response:
Use DataFrame.clip
with custom lambda function for count consecutive values:
#if necessary
df = df.set_index('ski')
print (df)
2020 2021 2022 2023 2024 2025
ski
book 1.2 5.6 8.4 -2.0 -5 6
jar 4.2 -5.0 -8.0 2.0 4 6
kook -4.0 -5.2 -2.3 -5.6 -7 8
from itertools import groupby
f = lambda x: [ int(sum(key for _ in group)) for key, group in groupby( x )]
s = df.clip(upper=1, lower=-1).apply(f, 1)
print (s)
ski
book [3, -2, 1]
jar [1, -2, 3]
kook [-5, 1]
dtype: object
CodePudding user response:
Use:
import pandas as pd
import numpy as np
data = '''ski 2020 2021 2022 2023 2024 2025
book 1.2 5.6 8.4 -2 -5 6
jar 4.2 -5 -8 2 4 6
kook -4 -5.2 -2.3 -5.6 -7 8'''
data = np.array([x.split() for x in data.split('\n')])
import seaborn as sns
df = pd.DataFrame(data[1:,1:], columns = data[0,1:], index = data[1:,0])
output = []
import math
for i, row in df.iterrows():
out = []
c=0
prev = math.copysign(1,float(row[0]))
temp = row.append(pd.Series(-math.copysign(1,float(row[-1]))))
for cell in temp:
currrent_sign = math.copysign(1,float(cell))
#print(prev, currrent_sign, c)
if currrent_sign==prev:
c =currrent_sign
else:
prev = currrent_sign
out.append(c)
c=currrent_sign
output.append(out)
Output:
[[3.0, -2.0, 1.0], [1.0, -2.0, 3.0], [-5.0, 1.0]]
CodePudding user response:
I assume that ski is the index column. If not, set it as the index, dropping the current one.
Start from defining a function, to be applied to each row:
def myCounts(row):
sgn = row.ge(0)
return sgn.groupby(sgn.ne(sgn.shift()).cumsum()).apply(
lambda grp: grp.count() * (1 if grp.iloc[0] else -1)).tolist()
Then apply it:
result = df.apply(myCounts, axis=1)
For your source data, I got:
ski
book [3, -2, 1]
jar [1, -2, 3]
kook [-5, 1]
dtype: object
My solution is significantly shorter than the other.