Home > Software engineering >  How to format a dataframe having many NaN values, join all rows to those not starting with NaN
How to format a dataframe having many NaN values, join all rows to those not starting with NaN

Time:12-14

I have the follwing df:

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 3, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 4]
    })

It has the following display:

    col1 col2 col3 col4
0   1.0 NaN NaN NaN
1   NaN 2.0 NaN NaN
2   NaN NaN 3.0 NaN
3   NaN NaN NaN 4.0
4   5.0 NaN NaN NaN
5   NaN 6.0 NaN NaN
6   NaN NaN 7.0 NaN
7   NaN NaN NaN 8.0

My goal is to keep all rows begining with float (not NaN value) and join to them the remaining ones.

The new_df I want to get is:

    col1 col2 col3 col4
0   1   2   3   4
4   5   6   7   8

Any help form your side will be highly appreciated (I upvote all answers).

Thank you!

CodePudding user response:

If need join first values per groups defined by non missing values in df['col1'] use:

df = (df.reset_index()
        .groupby(df['col1'].notna().cumsum())
        .first()
        .set_index('index'))

CodePudding user response:

Try this:

df.apply(lambda x: x.dropna().to_numpy())

Output:

   col1  col2  col3  col4
0   1.0   2.0   3.0   4.0
1   5.0   6.0   7.0   8.0

You can also, cast as integers:

df.apply(lambda x: x.dropna().to_numpy(dtype='int'))

Output:

   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8

CodePudding user response:

df.bfill()[df['col1'].notna()]

result:

    col1    col2    col3    col4
0   1.0     2.0     3.0     4.0
4   1.0     2.0     3.0     4.0

your df has 12341234. not 12345678

And, I don't know exact logic you want. If structure of dataset is different, you cant use my code

  • Related