I have the follwing df
:
df = pd.DataFrame({
'col1': [1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
'col2': [np.nan, 2, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 3, np.nan],
'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 4]
})
It has the following display:
col1 col2 col3 col4
0 1.0 NaN NaN NaN
1 NaN 2.0 NaN NaN
2 NaN NaN 3.0 NaN
3 NaN NaN NaN 4.0
4 5.0 NaN NaN NaN
5 NaN 6.0 NaN NaN
6 NaN NaN 7.0 NaN
7 NaN NaN NaN 8.0
My goal is to keep all rows begining with float
(not NaN value) and join to them the remaining ones.
The new_df
I want to get is:
col1 col2 col3 col4
0 1 2 3 4
4 5 6 7 8
Any help form your side will be highly appreciated (I upvote all answers).
Thank you!
CodePudding user response:
If need join first values per groups defined by non missing values in df['col1']
use:
df = (df.reset_index()
.groupby(df['col1'].notna().cumsum())
.first()
.set_index('index'))
CodePudding user response:
Try this:
df.apply(lambda x: x.dropna().to_numpy())
Output:
col1 col2 col3 col4
0 1.0 2.0 3.0 4.0
1 5.0 6.0 7.0 8.0
You can also, cast as integers:
df.apply(lambda x: x.dropna().to_numpy(dtype='int'))
Output:
col1 col2 col3 col4
0 1 2 3 4
1 5 6 7 8
CodePudding user response:
df.bfill()[df['col1'].notna()]
result:
col1 col2 col3 col4
0 1.0 2.0 3.0 4.0
4 1.0 2.0 3.0 4.0
your df has 12341234. not 12345678
And, I don't know exact logic you want. If structure of dataset is different, you cant use my code