Lets say i have i dataframe like this one:
col1 col2 col3
0 data1 Completed Fail
1 data2 Completed NaN
2 data3 Completed Completed
3 data4 Completed NaN
4 data5 NaN NaN
How can i add an extra row for each time the value in col3 is not NaN and have a dataframe like this:
col1 status
0 data1 Completed
1 data1 Fail
2 data2 Completed
3 data3 Completed
4 data3 Completed
5 data4 Completed
6 data5 NaN
I tried this but im not getting the desirable output:
df = df.melt(id_vars=['col1'],
value_name="status")
CodePudding user response:
IIUC, you can first start by using pd.melt()
as you already did but also drop all the null values by chaining dropna()
. This will get you close, but not exactly where you want to be:
new = df.melt(id_vars='col1',value_name='status').sort_values(by='col1').dropna().drop('variable',axis=1)
>>> print(new)
col1 status
0 data1 Completed
5 data1 Fail
1 data2 Completed
2 data3 Completed
7 data3 Completed
3 data4 Completed
At this point, you will need to bring over the rows from your original df
that were nan
in col2. You can do that using isnull()
and pd.concat()
respectively:
col2_nan = df.loc[df.col2.isnull()].drop('col3',axis=1).rename(columns = {'col2':'status'})
>>> print(pd.concat([new,col2_nan]).reset_index(drop=True))
col1 status
0 data1 Completed
1 data1 Fail
2 data2 Completed
3 data3 Completed
4 data3 Completed
5 data4 Completed
6 data5 NaN