I have created a DataFrame holding the number of days in each month, derived from a DataFrame of arrival and departure dates. It can have any number of columns.
   2021-01  2021-02  2021-03
0        0        0        4
1        0        2        1
2        0        5        0
3        0        4        4
4       10        2        0
What I want to do is: for each row, select the last column with a nonzero value and subtract 1 from it.
The result should look like this:
   2021-01  2021-02  2021-03
0        0        0        3
1        0        2        0
2        0        4        0
3        0        4        3
4       10        1        0
I have tried using numpy with:
df2 = df2[np.arange(df2.shape[0]),(df2!=0).cumsum(1).argmax(1)] - 1
But this does not seem to work correctly with pandas.
CodePudding user response:
Do you mean always select the last column or the last column that has at least one nonzero value?
Edit: Alright, here is a possible solution. I did this on a NumPy array because positional, element-wise computation is more straightforward there than on a DataFrame.
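import numpy as np
import pandas as pd
# Sample data from the question (named `data` to avoid shadowing the built-in `list`)
data = [[0, 0, 4], [0, 2, 1], [0, 5, 0], [0, 4, 4], [10, 2, 0]]
arr = np.array(data)
# Subtract 1 from the final column wherever it is nonzero; zeros stay unchanged
lastcol = np.where(arr[:, -1] == 0, arr[:, -1], arr[:, -1] - 1)
arr[:, -1] = lastcol
# The dates are column labels, so pass them as `columns`, not `index`
df = pd.DataFrame(arr, columns=['2021-01', '2021-02', '2021-03'])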
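The snippet above only touches the final column. To decrement the last nonzero entry of each row instead, the cumsum/argmax idea from the question can be applied to the underlying array rather than to the DataFrame itself. The following is a minimal sketch under that assumption; the names `arr`, `nonzero`, `last_nonzero`, `rows` and `result` are illustrative, and rows containing only zeros are skipped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 4], [0, 2, 1], [0, 5, 0], [0, 4, 4], [10, 2, 0]],
    columns=["2021-01", "2021-02", "2021-03"],
)

# Work on a copy of the values so the original DataFrame is left untouched
arr = df.to_numpy().copy()
nonzero = arr != 0

# The running count of nonzero cells reaches its maximum at the last nonzero
# cell, and argmax returns the first position where that maximum occurs
last_nonzero = nonzero.cumsum(axis=1).argmax(axis=1)

# Only rows with at least one nonzero value are modified
rows = np.flatnonzero(nonzero.any(axis=1))
arr[rows, last_nonzero[rows]] -= 1

# Rebuild a DataFrame with the original labels
result = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(result)
```

The pair of index arrays `arr[rows, last_nonzero[rows]]` picks one cell per row, which is the positional lookup that does not carry over to indexing a DataFrame with square brackets.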
CodePudding user response:
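Flag the nonzero cells, keep only the last flagged cell in each row, and subtract 1 there:
nonzero = df.ne(0)
# Count nonzero cells from the right, then restore the original column order
from_right = nonzero.iloc[:, ::-1].astype(int).cumsum(axis=1).iloc[:, ::-1]
# The last nonzero cell of a row is the nonzero cell whose right-to-left count is 1
last_nonzero = nonzero & from_right.eq(1)
# Subtract 1 only at those cells; rows of all zeros are left unchanged
df = df - last_nonzero.astype(int)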
print(df)
   2021-01  2021-02  2021-03
0        0        0        3
1        0        2        0
2        0        4        0
3        0        4        3
4       10        1        0