I have the following DataFrame:
     A    B    C    D
0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0
2  NaN  2.0  2.0  2.0
3  NaN  3.0  3.0  3.0
4  NaN  4.0  4.0  NaN
5  NaN  NaN  5.0  NaN
6  NaN  NaN  6.0  NaN
I am working to generate visualizations with this data, and I need to fill the null values in a very specific way. I want to loop the existing values repeatedly for each column until the null values are all filled, so that the DataFrame looks like this:
     A    B    C    D
0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0
2  0.0  2.0  2.0  2.0
3  1.0  3.0  3.0  3.0
4  0.0  4.0  4.0  0.0
5  1.0  0.0  5.0  1.0
6  0.0  1.0  6.0  2.0
Is there any convenient way to do this in Pandas?
CodePudding user response:
You can apply a custom function to each column that collects the non-null values and then extends them to the full length of the DataFrame. This can be done using np.resize, which repeats the input as many times as needed, as follows:
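import numpy as np

def f(x):
    # keep only the non-null values of the column
    vals = x[~x.isnull()].values
    # repeat them until they span the full column length
    vals = np.resize(vals, len(x))
    return vals

df = df.apply(f, axis=0)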
Result:
     A    B    C    D
0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0
2  0.0  2.0  2.0  2.0
3  1.0  3.0  3.0  3.0
4  0.0  4.0  4.0  0.0
5  1.0  0.0  5.0  1.0
6  0.0  1.0  6.0  2.0
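For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same np.resize idea. The DataFrame construction, the name cycle_fill, and the guard for an all-NaN column are my own additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (NaNs trail the valid values).
nan = np.nan
df = pd.DataFrame(
    {
        "A": [0, 1, nan, nan, nan, nan, nan],
        "B": [0, 1, 2, 3, 4, nan, nan],
        "C": [0, 1, 2, 3, 4, 5, 6],
        "D": [0, 1, 2, 3, nan, nan, nan],
    },
    dtype=float,
)


def cycle_fill(col: pd.Series) -> pd.Series:
    """Repeat the non-null values of `col` until the column is full."""
    vals = col.dropna().to_numpy()
    if vals.size == 0:
        # Nothing to cycle through: leave an all-NaN column untouched.
        return col
    # np.resize repeats `vals` as often as needed to reach len(col).
    return pd.Series(np.resize(vals, len(col)), index=col.index)


filled = df.apply(cycle_fill)
print(filled)
```

Note that this rebuilds each column from its non-null values alone, so it matches the desired output only when the NaNs sit at the end of the column, as they do here.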
CodePudding user response:
One option is a for loop; the assumption is that the NaNs, if any, are at the end of each column. Use np.place to fill the nulls:
[np.place(df[col].to_numpy(),
          df[col].isna(),
          df[col].dropna().array)
 for col in df
 if df[col].hasnans]
[None, None, None]
df
     A    B    C    D
0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0
2  0.0  2.0  2.0  2.0
3  1.0  3.0  3.0  3.0
4  0.0  4.0  4.0  0.0
5  1.0  0.0  5.0  1.0
6  0.0  1.0  6.0  2.0
Note that np.place is an in-place operation, so no assignment is needed.
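The in-place trick depends on df[col].to_numpy() handing back a writable view of the column's data. That is not guaranteed in every setup; with pandas Copy-on-Write enabled, for instance, the returned array may be read-only. Below is a hedged sketch of the same np.place approach that works on an explicit copy and assigns the result back. The DataFrame construction and the name out are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
nan = np.nan
df = pd.DataFrame(
    {
        "A": [0, 1, nan, nan, nan, nan, nan],
        "B": [0, 1, 2, 3, 4, nan, nan],
        "C": [0, 1, 2, 3, 4, 5, 6],
        "D": [0, 1, 2, 3, nan, nan, nan],
    },
    dtype=float,
)

out = df.copy()  # leave the original frame untouched
for col in out.columns:
    mask = out[col].isna().to_numpy()    # True where a value is missing
    vals = out[col].dropna().to_numpy()  # values to cycle through
    if mask.any() and vals.size:
        arr = out[col].to_numpy(copy=True)  # private, writable copy
        # np.place writes `vals` into the masked slots in order,
        # repeating `vals` when it is shorter than the number of slots.
        np.place(arr, mask, vals)
        out[col] = arr

print(out)
```

Because np.place only writes into the masked positions, existing values stay where they are. With trailing NaNs this gives the same result as the np.resize answer; with NaNs in the middle of a column the two approaches would differ.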