df = pd.DataFrame([["Stress", "NaN"], ["NaN", "Pregnancy"], ["Alcohol", "Pregnancy"]], columns=['causes', 'causes.2'])
I have a sample dataset here. Technically, these columns should have been merged into one, but for some reason they weren't. I have been asked to make a pie chart, which I know how to do with a single column, so I want to merge these columns into one column with a distinct name.
I tried using df.stack().reset_index(), but that gives me an object I do not know how to manipulate:
   level_0   level_1          0
0        0    causes     Stress
1        0  causes.2        NaN
2        1    causes        NaN
3        1  causes.2  Pregnancy
4        2    causes    Alcohol
5        2  causes.2  Pregnancy
Does anyone know how I could achieve this?
For the pie chart, I plan on using:
import matplotlib.pyplot as plt
values = df["Cause of...."].value_counts()
ax = values.plot(kind="pie", autopct='%1.1f%%', shadow=True, legend=True, title="", ylabel='', labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
plt.show()
CodePudding user response:
You can flatten using the underlying numpy array and create a new Series:
pd.Series(df.to_numpy().ravel(), name='causes')
Output:
0       Stress
1          NaN
2          NaN
3    Pregnancy
4      Alcohol
5    Pregnancy
Name: causes, dtype: object
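To connect this to the pie chart, here is a minimal sketch of counting the flattened values. One assumption to flag: in the sample as posted, the missing entries are the literal string "NaN", not real missing values, so value_counts() would count them as a category. The mask step below is my addition to handle that; if your real data already holds true NaN values, you can skip it.

```python
import pandas as pd

df = pd.DataFrame(
    [["Stress", "NaN"], ["NaN", "Pregnancy"], ["Alcohol", "Pregnancy"]],
    columns=["causes", "causes.2"],
)

# Flatten both columns, row by row, into a single Series.
causes = pd.Series(df.to_numpy().ravel(), name="causes")

# Assumption: missing entries are the string "NaN" (as in the sample).
# mask() swaps them for real missing values, which value_counts() skips
# by default.
causes = causes.mask(causes == "NaN")

values = causes.value_counts()
print(values)
```

The resulting values Series should be usable in place of the value_counts() line in the plotting code from the question.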
If the DataFrame has other columns as well, select only the ones you want to flatten first, for example by name:
pd.Series(df.filter(like='causes').to_numpy().ravel(), name='causes')
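As a rough illustration of the column selection, here is a small sketch with an extra column. The "age" column and its values are hypothetical, added only to show that filter leaves unrelated columns out.

```python
import pandas as pd

# Hypothetical frame: the "age" column is an assumption for illustration.
df = pd.DataFrame(
    {
        "age": [31, 27, 45],
        "causes": ["Stress", "NaN", "Alcohol"],
        "causes.2": ["NaN", "Pregnancy", "Pregnancy"],
    }
)

# Keep only the columns whose name contains "causes".
selected = df.filter(like="causes")

# Flatten the selected columns, row by row, into one Series.
causes = pd.Series(selected.to_numpy().ravel(), name="causes")

print(list(selected.columns))
print(causes.tolist())
```

Note that like= matches any column whose name contains the substring, so if that could catch unwanted columns, passing an explicit list such as df[["causes", "causes.2"]] is the safer choice.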