Pandas - Replace Last Non-None Value with None Row-wise

Time:08-30

A common thing people want to do in pandas is to replace None values with the next or previous non-None value. This is easily done with .fillna. However, I want to do something similar but different.

I have a DataFrame, df, with some entries. Every row has a different number of entries, and they are all "left-adjusted": if df is 10 columns wide and a row has n < 10 entries, the first n columns hold the entries and the remaining columns are None.

What I want to do is find the last non-None entry in every row and set it to None as well. Depending on the row, this could be in any column from the first to the last.

I could of course do this with a for-loop, but my DataFrames can be quite large, so something faster would be preferable. Any ideas?

Thanks!
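
For reference, the for-loop baseline mentioned in the question might look like the sketch below. The helper name blank_last_entry_loop is hypothetical (not from the post), and it writes NaN, which is what None becomes in a float column. It is mainly useful for checking the vectorized answers on small inputs.

```python
import numpy as np
import pandas as pd


def blank_last_entry_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Slow reference version: set each row's last non-missing entry to NaN."""
    out = df.copy()
    for i in range(len(out)):
        # Column positions of the non-missing cells in row i
        valid = np.flatnonzero(out.iloc[i].notna().to_numpy())
        if valid.size:  # rows that are already all-missing are left alone
            out.iloc[i, valid[-1]] = np.nan
    return out


df = pd.DataFrame([[1.0, 2.0, 3.0],
                   [4.0, np.nan, np.nan],
                   [np.nan, np.nan, np.nan]])
print(blank_last_entry_loop(df))
```

The loop touches every row from Python, which is exactly the cost the answers below avoid.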

Answer:

With help from NumPy this is quite easy. Because the rows are left-adjusted, the number of missing values (None/NaN) in a row tells you the column of its last non-None value; NumPy indexing then sets that cell to None:

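import numpy as np
import pandas as pd

# 4 x 6 frame of random floats, matching the printed example below
data = np.random.random((4, 6))

df = pd.DataFrame(data)
df.iloc[0, 3:] = None  # row 0 keeps 3 entries
df.iloc[1, 5:] = None  # row 1 keeps 5
df.iloc[2, 3:] = None  # row 2 keeps 3
df.iloc[3, 4:] = None  # row 3 keeps 4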

The original DataFrame looks like this:

          0         1         2         3         4   5
0  0.992337  0.651785  0.521422       NaN       NaN NaN
1  0.912962  0.292458  0.620195  0.507071  0.010205 NaN
2  0.061320  0.565979  0.344755       NaN       NaN NaN
3  0.521936  0.057917  0.359699  0.484009       NaN NaN
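
Then count the missing cells per row and write NaN (what None becomes in a float column) at the computed column. Writing through df.values only works when it happens to be a writable view of the frame's data; with mixed dtypes it is a copy, and under Copy-on-Write it is read-only. So modify a copy and assign it back:

isnull = df.isnull()

# column of the last non-None value in each (left-adjusted) row
col = df.shape[1] - isnull.sum(axis=1).to_numpy() - 1

arr = df.to_numpy(copy=True)
arr[np.arange(len(df)), col] = np.nan
df = pd.DataFrame(arr, index=df.index, columns=df.columns)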

The updated DataFrame looks like this:

          0         1         2         3   4   5
0  0.992337  0.651785       NaN       NaN NaN NaN
1  0.912962  0.292458  0.620195  0.507071 NaN NaN
2  0.061320  0.565979       NaN       NaN NaN NaN
3  0.521936  0.057917  0.359699       NaN NaN NaN
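
A self-contained version of the counting approach might look like this. It is a sketch under a hypothetical helper name, blank_last_entry_count, and assumes numeric, left-adjusted rows. It works on a private float copy, so it does not depend on df.values being a writable view, and returns a new frame.

```python
import numpy as np
import pandas as pd


def blank_last_entry_count(df: pd.DataFrame) -> pd.DataFrame:
    """Set each row's last non-missing entry to NaN (assumes left-adjusted, numeric rows)."""
    arr = df.to_numpy(dtype=float, copy=True)  # private copy, safe to modify in place
    n_rows, n_cols = arr.shape
    # In a left-adjusted row the last entry sits at n_cols - (#missing) - 1.
    # An all-missing row gives -1, i.e. the last column, which is already NaN.
    col = n_cols - np.isnan(arr).sum(axis=1) - 1
    arr[np.arange(n_rows), col] = np.nan
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


# Rebuild a frame with the same missing-value pattern as the printed example above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 6)))
for row, start in enumerate([3, 5, 3, 4]):
    df.iloc[row, start:] = np.nan

res = blank_last_entry_count(df)
print(res.notna().sum(axis=1).tolist())  # → [2, 4, 2, 3]
```

Returning a new frame instead of mutating df keeps the behavior the same whether or not Copy-on-Write is enabled.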

Answer:

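You can also find the column to replace in each row with np.argmax(). On a boolean mask it returns the position of the first True, i.e. the first missing cell, so the entry just before it is the last non-None value. Using df.isna() instead of np.isnan() also works for object columns that hold None, and the result is written into a copy because df.to_numpy() is not guaranteed to return a writable view of the frame:

missing = df.isna().to_numpy()
indices = missing.argmax(axis=1) - 1  # a full row gives 0 - 1 = -1, i.e. its last column

arr = df.to_numpy(copy=True)
arr[np.arange(len(df)), indices] = np.nan
df = pd.DataFrame(arr, index=df.index, columns=df.columns)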
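
The same idea as a self-contained helper, under a hypothetical name (blank_last_entry_argmax) and assuming float, left-adjusted rows. The small example includes a full row: argmax finds no missing cell there, returns 0, and the - 1 wraps around to the last column, which is exactly that row's last entry.

```python
import numpy as np
import pandas as pd


def blank_last_entry_argmax(df: pd.DataFrame) -> pd.DataFrame:
    """Set each row's last non-missing entry to NaN (assumes left-adjusted, float rows)."""
    missing = df.isna().to_numpy()  # boolean mask; also handles None in object columns
    # argmax gives the first missing cell; the entry just before it is the last one.
    # Full rows: argmax is 0, so the index is -1, the last column.
    # All-missing rows: also -1, a cell that is already missing.
    idx = missing.argmax(axis=1) - 1
    arr = df.to_numpy(copy=True)  # write into a copy, never into the frame's buffer
    arr[np.arange(len(df)), idx] = np.nan
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


df = pd.DataFrame([[1.0, 2.0, 3.0],
                   [4.0, 5.0, np.nan],
                   [6.0, np.nan, np.nan]])
print(blank_last_entry_argmax(df))
```

If rows were not left-adjusted (gaps before the last entry), the first-missing trick no longer applies; one variant would be to take argmax over the reversed non-missing mask, n_cols - 1 - (~missing)[:, ::-1].argmax(axis=1), which gives the position of the last non-missing cell directly.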