Home > Software design >  Create a new row in a dataframe based only for non NaN values from a column
Create a new row in a dataframe based only for non NaN values from a column

Time:12-16

Lets say i have i dataframe like this one:

      col1       col2         col3
0     data1   Completed       Fail
1     data2   Completed       NaN
2     data3   Completed    Completed
3     data4   Completed       NaN
4     data5      NaN          NaN

How can i add an extra row for each time the value in col3 is not NaN and have a dataframe like this:

      col1     status           
0     data1   Completed 
1     data1      Fail
2     data2   Completed     
3     data3   Completed    
4     data3   Completed
5     data4   Completed      
6     data5      NaN        

I tried this but im not getting the desirable output:

df  = df.melt(id_vars=['col1'],  
        value_name="status")

CodePudding user response:

IIUC, you can first start by using pd.melt() as you already did but also drop all the null values by chaining dropna(). This will get you close, but not exactly where you want to be:

new = df.melt(id_vars='col1',value_name='status').sort_values(by='col1').dropna().drop('variable',axis=1)

>>> print(new)

    col1     status
0  data1  Completed
5  data1       Fail
1  data2  Completed
2  data3  Completed
7  data3  Completed
3  data4  Completed

At this point, you will need to bring over the rows from your original df that were nan in col2. You can do that using isnull() and pd.concat() respectively:

col2_nan = df.loc[df.col2.isnull()].drop('col3',axis=1).rename(columns = {'col2':'status'})

>>> print(pd.concat([new,col2_nan]).reset_index(drop=True))


    col1     status
0  data1  Completed
1  data1       Fail
2  data2  Completed
3  data3  Completed
4  data3  Completed
5  data4  Completed
6  data5        NaN
  • Related