How to format a dataframe having many NaN values, join all rows to those not starting with NaN-CodePudding

I have the follwing df:

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 3, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 4]
    })

It has the following display:

    col1 col2 col3 col4
0   1.0 NaN NaN NaN
1   NaN 2.0 NaN NaN
2   NaN NaN 3.0 NaN
3   NaN NaN NaN 4.0
4   5.0 NaN NaN NaN
5   NaN 6.0 NaN NaN
6   NaN NaN 7.0 NaN
7   NaN NaN NaN 8.0

My goal is to keep all rows begining with float (not NaN value) and join to them the remaining ones.

The new_df I want to get is:

    col1 col2 col3 col4
0   1   2   3   4
4   5   6   7   8

Any help form your side will be highly appreciated (I upvote all answers).

Thank you!

CodePudding user response：

If need join first values per groups defined by non missing values in df['col1'] use:

df = (df.reset_index()
        .groupby(df['col1'].notna().cumsum())
        .first()
        .set_index('index'))

CodePudding user response：

Try this:

df.apply(lambda x: x.dropna().to_numpy())

Output:

   col1  col2  col3  col4
0   1.0   2.0   3.0   4.0
1   5.0   6.0   7.0   8.0

You can also, cast as integers:

df.apply(lambda x: x.dropna().to_numpy(dtype='int'))

Output:

   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8

CodePudding user response：

df.bfill()[df['col1'].notna()]

result:

    col1    col2    col3    col4
0   1.0     2.0     3.0     4.0
4   1.0     2.0     3.0     4.0

your df has 12341234. not 12345678

And, I don't know exact logic you want. If structure of dataset is different, you cant use my code