One common thing people seem to want to do in pandas is to replace None values with the next or previous non-None value. This is easily done with .fillna. I, however, want to do something similar but different.
I have a dataframe, df, with some entries. Every row has a different number of entries, and they are all "left-adjusted": if the df is 10 columns wide and some row has n<10 entries, the first n columns hold the entries and the remaining columns are Nones.
What I want to do is find the last non-None entry in every row and change it to None as well. This entry could be in any of the columns, from the first to the last.
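To make the goal concrete, here is a minimal sketch of the slow row-by-row version on a small made-up frame. The frame contents and the helper name blank_last_entry_loop are hypothetical and only for illustration; the sketch assumes unique row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical left-adjusted frame: values first, missing entries at the end of each row.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, np.nan],
     [4.0, np.nan, np.nan, np.nan],
     [5.0, 6.0, 7.0, 8.0]]
)

def blank_last_entry_loop(frame):
    """Slow baseline: set the last non-missing entry of every row to NaN."""
    out = frame.copy()
    for row_label in out.index:
        # last_valid_index() returns the column label of the last non-missing
        # value in this row, or None if the whole row is missing.
        last_col = out.loc[row_label].last_valid_index()
        if last_col is not None:
            out.loc[row_label, last_col] = np.nan
    return out

result = blank_last_entry_loop(df)
print(result)
```

Row 0 loses its 3.0, row 1 becomes entirely missing, and row 2 loses its 8.0. This loop is the baseline that a vectorized solution should reproduce.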
I could of course do this with a for-loop, but my dfs can be quite large, so something quicker would be preferable. Any ideas?
Thanks!
CodePudding user response:
With help from NumPy, this is quite easy. Because the rows are left-adjusted, counting the number of missing values in each row tells you which column holds the last non-None value. NumPy indexing can then set that value to None. First, some example data:
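import numpy as np
import pandas as pd

# Example data: 6 rows, 10 columns, with the trailing entries of each row set to NaN
data = np.random.random((6, 10))
df = pd.DataFrame(data)
df.iloc[0, 7:] = None
df.iloc[1, 6:] = None
df.iloc[2, 5:] = None
df.iloc[3, 8:] = None
df.iloc[4, 5:] = None
df.iloc[5, 4:] = None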
The original dataframe looks like this:
          0         1         2         3         4    5
0  0.992337  0.651785  0.521422       NaN       NaN  NaN
1  0.912962  0.292458  0.620195  0.507071  0.010205  NaN
2  0.061320  0.565979  0.344755       NaN       NaN  NaN
3  0.521936  0.057917  0.359699  0.484009       NaN  NaN
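isnull = df.isnull()
col = (df.shape[1] - isnull.sum(axis=1) - 1).to_numpy()  # position of the last non-NaN entry per row
arr = df.to_numpy(copy=True)  # explicit copy: writing into df.values is not guaranteed to update df
arr[np.arange(len(df)), col] = np.nan
df = pd.DataFrame(arr, index=df.index, columns=df.columns)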
The updated dataframe looks like this:
          0         1         2         3    4    5
0  0.992337  0.651785       NaN       NaN  NaN  NaN
1  0.912962  0.292458  0.620195  0.507071  NaN  NaN
2  0.061320  0.565979       NaN       NaN  NaN  NaN
3  0.521936  0.057917  0.359699       NaN  NaN  NaN
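The counting approach above can also be wrapped in a small function. This is only a sketch under a few assumptions: the data is numeric (float), the rows are left-adjusted, and the function name blank_last_entry_by_count and the example frame are made up for illustration. It works on a copy of the underlying array and skips rows that are entirely NaN.

```python
import numpy as np
import pandas as pd

def blank_last_entry_by_count(frame):
    """Set the last non-missing entry of each row to NaN (rows must be left-adjusted)."""
    arr = frame.to_numpy(dtype=float, copy=True)   # work on a copy; assumes numeric data
    n_missing = np.isnan(arr).sum(axis=1)          # number of NaN entries per row
    last_col = arr.shape[1] - n_missing - 1        # position of the last non-missing entry
    has_entry = last_col >= 0                      # rows that are entirely NaN are skipped
    rows = np.arange(arr.shape[0])[has_entry]
    arr[rows, last_col[has_entry]] = np.nan
    return pd.DataFrame(arr, index=frame.index, columns=frame.columns)

# Hypothetical example: a partly filled row, a full row, and an empty row.
example = pd.DataFrame(
    [[0.1, 0.2, 0.3, np.nan],
     [0.4, 0.5, 0.6, 0.7],
     [np.nan, np.nan, np.nan, np.nan]]
)
cleaned = blank_last_entry_by_count(example)
print(cleaned)
```

Returning a new DataFrame instead of writing through df.values avoids relying on whether pandas hands back a view or a copy of the data.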
CodePudding user response:
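You can find the index of the element to replace in each row with np.argmax(). Applied to the NaN mask, it returns the position of the first NaN in each row, so subtracting 1 gives the last non-NaN entry (np.isnan requires a numeric dtype):
arr = df.to_numpy(copy=True)  # explicit copy, so the result does not depend on to_numpy() returning a view
indices = np.isnan(arr).argmax(axis=1) - 1  # becomes -1 (the last column) when a row has no NaN
arr[np.arange(len(df)), indices] = np.nan
df = pd.DataFrame(arr, index=df.index, columns=df.columns)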
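It may help to see how the argmax approach behaves on rows without any NaN. The sketch below uses a made-up frame and assumes float data; the variable names are for illustration only. For a full row, argmax returns 0, so the index becomes -1, which NumPy reads as the last column. For a row that is entirely NaN, the index is also -1, so an already missing value is overwritten and nothing changes.

```python
import numpy as np
import pandas as pd

# Hypothetical left-adjusted frame covering the interesting cases.
frame = pd.DataFrame(
    [[1.0, 2.0, np.nan, np.nan],     # first NaN at position 2, so position 1 is blanked
     [3.0, 4.0, 5.0, 6.0],           # no NaN: argmax gives 0, index -1 is the last column
     [7.0, np.nan, np.nan, np.nan]]  # first NaN at position 1, so position 0 is blanked
)

arr = frame.to_numpy(dtype=float, copy=True)  # copy, so the input frame is untouched
first_nan = np.isnan(arr).argmax(axis=1)      # argmax returns 0 when no entry is True
target = first_nan - 1                        # -1 wraps around to the last column
arr[np.arange(len(arr)), target] = np.nan
blanked = pd.DataFrame(arr, index=frame.index, columns=frame.columns)

print(target.tolist())
print(blanked)
```

As with the counting approach, this only gives the intended result when the rows are left-adjusted.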