I have a CSV file with some messy data.
After loading it, I have the following DataFrame in pandas:
Name | Age | Sex | Salary | Status |
---|---|---|---|---|
John | 32 | Nan | NaN | NaN |
Nan | Male | 4000 | Single | NaN |
May | 20 | Female | 5000 | Married |
teresa | 45 |
Desired output:
Name Age Sex Salary Status
0 John 32 Male 4000 Single
1 May 20 Female 5000 Married
2 teresa 45
Does anyone know how to do this with pandas?
CodePudding user response:
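You can use a bit of NumPy to drop the NaNs and reshape the underlying array. This only works when the number of remaining values is a multiple of the number of columns:
import pandas as pd

# Treat the string 'Nan' as missing, then flatten the values row by row
a = df.replace({'Nan': float('nan')}).values.flatten()
# Drop the missing values and refill the rows from left to right
pd.DataFrame(a[~pd.isna(a)].reshape(-1, len(df.columns)),
             columns=df.columns)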
Output:
Name Age Sex Salary Status
0 John 32 Male 4000 Single
1 May 20 Female 5000 Married
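To make the approach above reproducible, here is a minimal sketch that rebuilds the sample frame and wraps the same flatten-and-reshape idea in a helper. The helper name `realign` and the padding step are assumptions, not part of the answer: the padding assumes that only the last record (the teresa row) can be short, and that its empty cells are NaN.

```python
import numpy as np
import pandas as pd


def realign(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop NaNs, then refill rows left to right."""
    # Treat the literal string 'Nan' as missing, then flatten row by row.
    flat = df.replace({"Nan": np.nan}).to_numpy(dtype=object).flatten()
    values = flat[~pd.isna(flat)]
    # Assumption: only the final record can be short, so pad the tail with NaN
    # until the number of values is a multiple of the number of columns.
    n_cols = len(df.columns)
    pad = -len(values) % n_cols
    values = np.concatenate([values, np.full(pad, np.nan, dtype=object)])
    return pd.DataFrame(values.reshape(-1, n_cols), columns=df.columns)


# Sample data modelled on the question (empty cells assumed to be NaN).
df = pd.DataFrame(
    [
        ["John", 32, "Nan", np.nan, np.nan],
        ["Nan", "Male", 4000, "Single", np.nan],
        ["May", 20, "Female", 5000, "Married"],
        ["teresa", 45, np.nan, np.nan, np.nan],
    ],
    columns=["Name", "Age", "Sex", "Salary", "Status"],
)

result = realign(df)
print(result)
```

If a record in the middle of the file is short, the padding lands in the wrong place and later rows shift, so this is only safe when the stated assumption holds.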
CodePudding user response:
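Try groupby, using a running count of the non-missing names as the group key and taking the first non-missing value of each column per group (this needs import numpy as np):
>>> df.groupby(df['Name'].notna().cumsum()).apply(lambda g: g.apply(lambda col: next(iter(col.dropna()), np.nan))).reset_index(drop=True)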
Name Age Sex Salary Status
0 John 32 4000 Single NaN
1 May 20 Female 5000 Married
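The grouping key in the groupby answer can be looked at in isolation. The sketch below is an illustration under assumptions: the missing cells are real NaN values rather than the string 'Nan', and `GroupBy.first()` (which returns the first non-null value per column) is swapped in for the nested `apply`.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question, with real NaN for the missing cells.
df = pd.DataFrame(
    [
        ["John", 32, np.nan, np.nan, np.nan],
        [np.nan, "Male", 4000, "Single", np.nan],
        ["May", 20, "Female", 5000, "Married"],
    ],
    columns=["Name", "Age", "Sex", "Salary", "Status"],
)

# Each non-missing Name starts a new record; continuation rows share its id.
key = df["Name"].notna().cumsum()
print(key.tolist())

# First non-null value of every column within each record.
merged = df.groupby(key).first().reset_index(drop=True)
print(merged)
```

Because each column is collapsed independently, values that were shifted into the wrong column stay there, which matches the output shown above where 4000 ends up under Sex and 'Male' is lost.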