How to assign a conditional value when using pct_change on negative values?

I have a DataFrame that contains both negative and positive values.

I used the following code to get the percentage change across each row:

df_gp1 = df_gp1.pct_change(periods=4, axis=1) * 100

Now I want to assign a specific number depending on how the sign changes between the two values being compared. For example:

positive to negative: return -100

negative to positive: return 100

negative to negative: return -100

positive to positive: ordinary pct_change

For example, my current DataFrame could look like this:

DATA	D-4	D-3	D-2	D-1	D-0
A	-20	-15	-13	-10	-5
B	-30	-15	-10	10	25
C	40	25	30	41	30
D	25	25	10	15	-10

I want a new DataFrame that gives me the following result:

DATA	D-0
A	-100
B	100
C	-25
D	-100

As you can see, the result must be the 4-period pct_change (i.e. D-0 relative to D-4). If the value stays negative, return -100; if it turns from positive to negative, also return -100; if it turns from negative to positive, return 100; and if it changes from one positive value to another, apply the ordinary pct_change.

My original DataFrame is about 4000 rows by 300 columns, so my desired output will have 4000 rows and 296 columns (since the first four columns, like D-4, D-3, D-2 and D-1 here, are dropped).

I tried to build a condition list and a choice list and use np.select, but I don't know how to apply it across the whole DataFrame and create a new one that returns the percentage changes.

Any help is deeply appreciated.

CodePudding user response：

Use:

import numpy as np
import pandas as pd

# convert column DATA to the index if necessary
df = df.set_index('DATA')
# True where the current value is less than 0
m1 = df.lt(0)
# True where the value 4 columns earlier is less than 0
m2 = df.shift(4, axis=1).lt(0)

# pass the masks to np.select; the first matching condition wins
arr = np.select([m1, ~m1 & m2, ~m1 & ~m2],
                [-100, 100, df.pct_change(periods=4, axis=1) * 100])

# create a DataFrame and remove the first 4 columns
df = pd.DataFrame(arr, index=df.index, columns=df.columns).iloc[:, 4:].reset_index()
print(df)
  DATA    D-0
0    A -100.0
1    B  100.0
2    C  -25.0
3    D -100.0
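
If you want to try this without your own data, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the table in the question, and the helper name signed_pct_change is made up for illustration (it is not a pandas function). Since np.select takes the first matching condition, the sketch only needs two conditions and passes the plain percentage change as the default.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question, with DATA as the index
df = pd.DataFrame(
    {
        "D-4": [-20, -30, 40, 25],
        "D-3": [-15, -15, 25, 25],
        "D-2": [-13, -10, 30, 10],
        "D-1": [-10, 10, 41, 15],
        "D-0": [-5, 25, 30, -10],
    },
    index=pd.Index(["A", "B", "C", "D"], name="DATA"),
)


def signed_pct_change(frame, periods=4):
    """Hypothetical helper: sign-aware percentage change across columns."""
    now_neg = frame.lt(0).to_numpy()                          # current value < 0
    was_neg = frame.shift(periods, axis=1).lt(0).to_numpy()   # earlier value < 0
    pct = (frame.pct_change(periods=periods, axis=1) * 100).to_numpy()
    # The first matching condition wins: a negative current value gives -100,
    # otherwise a negative earlier value gives 100, otherwise the plain change.
    arr = np.select([now_neg, was_neg], [-100, 100], default=pct)
    out = pd.DataFrame(arr, index=frame.index, columns=frame.columns)
    # The first `periods` columns have nothing to compare against, so drop them
    return out.iloc[:, periods:]


result = signed_pct_change(df)
print(result)
```

On a frame with 300 columns the same call should return 296 columns, because only the first 4 are dropped.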

CodePudding user response：

Given:

      D-4  D-3  D-2  D-1  D-0
DATA
A     -20  -15  -13  -10   -5
B     -30  -15  -10   10   25
C      40   25   30   41   30
D      25   25   10   15  -10

Doing:

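def stuff(row):
    # current value is negative: always -100
    if row['D-0'] < 0:
        return -100
    # value 4 periods earlier was negative, current is not: 100
    elif row['D-4'] < 0:
        return 100
    # both non-negative: ordinary percentage change
    else:
        return (row.pct_change(periods=4) * 100)['D-0']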

print(df.apply(stuff, axis=1))

Output:

A   -100.0
B    100.0
C    -25.0
D   -100.0
dtype: float64
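
Note that the function above hard-codes the labels 'D-4' and 'D-0', so it only produces one column. A rough sketch of how the row-wise idea might be extended to every column is below. The helper name row_changes and the extra column "D+1" (with invented values) are assumptions added only to show a second output column; they are not from the question.

```python
import numpy as np
import pandas as pd


def row_changes(row, periods=4):
    """Hypothetical helper: apply the sign rules at every position of one row."""
    values = row.to_numpy(dtype=float)
    cur, prev = values[periods:], values[:-periods]
    # Plain percentage change; only kept where both values are non-negative.
    # A zero in prev gives inf or NaN here (NumPy also emits a warning).
    pct = (cur - prev) / prev * 100
    out = np.where(cur < 0, -100.0, np.where(prev < 0, 100.0, pct))
    return pd.Series(out, index=row.index[periods:])


# Sample data from the question plus one invented column, "D+1"
df = pd.DataFrame(
    {
        "D-4": [-20, -30, 40, 25],
        "D-3": [-15, -15, 25, 25],
        "D-2": [-13, -10, 30, 10],
        "D-1": [-10, 10, 41, 15],
        "D-0": [-5, 25, 30, -10],
        "D+1": [-2, 30, 50, 5],  # made-up values, for illustration only
    },
    index=pd.Index(["A", "B", "C", "D"], name="DATA"),
)

wide = df.apply(row_changes, axis=1)
print(wide)
```

For a 4000 x 300 frame, apply with axis=1 calls the function once per row, so the np.select answer is likely the faster option; this version is mainly easier to step through.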