Subtract one from the last nonzero value in each row, across multiple columns

Time:12-19

I have created a dataframe, derived from a dataframe of arrival and departure dates, that holds the number of days spent in each month. This dataframe can have n columns.

    2021-01 2021-02 2021-03
0   0       0       4
1   0       2       1
2   0       5       0
3   0       4       4
4   10      2       0

What I want to do is: for each row, select the last column whose value is nonzero and subtract 1 from it.

Basically looking like this afterwards:

    2021-01 2021-02 2021-03
0   0       0       3
1   0       2       0
2   0       4       0
3   0       4       3
4   10      1       0

I have tried using numpy with:

df2 = df2[np.arange(df2.shape[0]),(df2!=0).cumsum(1).argmax(1)] - 1

But this does not seem to work correctly with pandas.

CodePudding user response:

Do you mean always select the last column or the last column that has at least one nonzero value?

Edit: Alright, here is a possible solution. I do the computation on a NumPy array because indexing by (row, column) position is simpler there than on a DataFrame.

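import numpy as np
import pandas as pd

data = [[0, 0, 4], [0, 2, 1], [0, 5, 0], [0, 4, 4], [10, 2, 0]]
arr = np.array(data)

# Position of the last nonzero entry in each row (argmax on the reversed mask)
nonzero = arr != 0
last = arr.shape[1] - 1 - nonzero[:, ::-1].argmax(axis=1)
rows = np.arange(arr.shape[0])

# Subtract 1 at that position; rows containing only zeros are left unchanged
arr[rows, last] = np.where(nonzero.any(axis=1), arr[rows, last] - 1, arr[rows, last])

df = pd.DataFrame(arr, columns=['2021-01', '2021-02', '2021-03'])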

CodePudding user response:

Find the label of the last nonzero column in each row, then subtract 1 at that position:

nonzero = df.ne(0)
# idxmax on the column-reversed mask gives the last nonzero column label per row
last_col = nonzero.iloc[:, ::-1].idxmax(axis=1)
# Skip rows that contain only zeros
for row, col in last_col[nonzero.any(axis=1)].items():
    df.loc[row, col] -= 1

print(df)

    2021-01 2021-02 2021-03
0   0       0       3
1   0       2       0
2   0       4       0
3   0       4       3
4   10      1       0
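
The cumsum/argmax idea from the question can also be made to work. The original line fails because a DataFrame does not accept a pair of integer arrays inside `[]`, so the sketch below does the indexing on a NumPy copy instead. It assumes integer data and the sample values from the question; the names `arr`, `last`, `has_nonzero` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [[0, 0, 4], [0, 2, 1], [0, 5, 0], [0, 4, 4], [10, 2, 0]],
    columns=["2021-01", "2021-02", "2021-03"],
)

# Work on a copy of the underlying array so df2 itself is not modified
arr = df2.to_numpy(copy=True)
nonzero = arr != 0

# The running count of nonzero entries first reaches its maximum at the
# last nonzero column, and argmax returns that first occurrence
last = nonzero.cumsum(axis=1).argmax(axis=1)

# Rows with no nonzero value are skipped so they do not become -1
has_nonzero = nonzero.any(axis=1)
rows = np.arange(arr.shape[0])
arr[rows[has_nonzero], last[has_nonzero]] -= 1

result = pd.DataFrame(arr, index=df2.index, columns=df2.columns)
print(result)
```

Since only positions are used, this should not depend on the number of month columns.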