Example data:
| alcoholism | diabetes | handicapped | hypertension | new col |
| ---------- | -------- | ----------- | ------------ | ----------------------- |
| 1 | 0 | 1 | 0 | alcoholism, handicapped |
| 0 | 1 | 0 | 1 | diabetes, hypertension |
| 0 | 1 | 0 | 0 | diabetes |
If any of the columns above has the value 1, I need the new column to contain the names of only those columns; if all of them are zero, it should return "no condition".
I tried to do it with the code below:
problems = ['alcoholism', 'diabetes','hypertension','handicap']
m1 = df[problems].isin([1])
mask = m1 | (m1.loc[~m1.any(axis=1)])
df['sp_name'] = mask.mul(problems).apply(lambda x: [i for i in x if i], axis=1)
But it returns the data with brackets, like [handicapped, alcoholism].
The issue is that I can't do value counts, because the all-zero rows show up as an empty [] and will not be plotted.
CodePudding user response:
I still don't understand your ultimate goal, or how this will be useful in plotting, but all you're really missing is str.join to combine each list into the string you want. That said, the way you've gotten there involves unnecessary steps. First, multiply the DataFrame by its own column names:
df * df.columns
   alcoholism  diabetes  handicapped  hypertension
0  alcoholism            handicapped
1              diabetes               hypertension
2              diabetes
Then you can use apply the same way you did:
(df * df.columns).apply(lambda row: [i for i in row if i], axis=1)
0 [alcoholism, handicapped]
1 [diabetes, hypertension]
2 [diabetes]
dtype: object
Then you just need to include a string join in the function you supply to apply. Here's a complete example:
import pandas as pd
df = pd.DataFrame({
    'alcoholism': [1, 0, 0],
    'diabetes': [0, 1, 1],
    'handicapped': [1, 0, 0],
    'hypertension': [0, 1, 0],
})
df['new_col'] = (
    (df * df.columns)
    .apply(lambda row: ', '.join([i for i in row if i]), axis=1)
)
print(df)
   alcoholism  diabetes  handicapped  hypertension                  new_col
0           1         0            1             0  alcoholism, handicapped
1           0         1            0             1   diabetes, hypertension
2           0         1            0             0                 diabetes
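The question also asks for a "no condition" label when every flag is zero, so that those rows still show up in value counts. Here is a minimal sketch of one way to get that. The fourth, all-zero row and the `problems` list are assumptions added for illustration. It also builds the joined string from a boolean mask instead of multiplying by the column names, which is a swapped-in technique and not the approach shown above.

```python
import pandas as pd

# Assumed sample data; the last row is a hypothetical patient with no condition.
df = pd.DataFrame({
    'alcoholism':   [1, 0, 0, 0],
    'diabetes':     [0, 1, 1, 0],
    'handicapped':  [1, 0, 0, 0],
    'hypertension': [0, 1, 0, 0],
})

# Hypothetical list of the flag columns to inspect.
problems = ['alcoholism', 'diabetes', 'handicapped', 'hypertension']

# Boolean frame: True where a condition flag equals 1.
flags = df[problems].eq(1)

# For each row, join the names of the columns whose flag is True.
names = flags.apply(lambda row: ', '.join(row.index[row.to_numpy()]), axis=1)

# Rows with no flag yield an empty string; give them an explicit label
# so they are counted (and can be plotted) like any other category.
df['new_col'] = names.where(names.ne(''), 'no condition')

counts = df['new_col'].value_counts()
print(counts)
```

With the label in place, `counts` has an entry for "no condition", so something like `counts.plot(kind='bar')` would include those rows.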
CodePudding user response:
df['new_col'] = df.iloc[:, :-1].dot(df.add_suffix(",").columns[:-1]).str[:-1]
I already found this solution, which worked for me.
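For context, the one-liner relies on a matrix product between the 0/1 flags and the column names. Multiplying an integer by a string repeats the string, so each row's product concatenates the names of its flagged columns. The sketch below reproduces the idea on assumed sample data. The `problems` list and the explicit object-dtype `labels` array are illustrative choices, not part of the original answer. The original's `iloc[:, :-1]` and `columns[:-1]` simply exclude the last column of the frame; this sketch selects the flag columns by name instead.

```python
import numpy as np
import pandas as pd

# Assumed sample data, mirroring the columns in the question.
df = pd.DataFrame({
    'alcoholism':   [1, 0, 0],
    'diabetes':     [0, 1, 1],
    'handicapped':  [1, 0, 0],
    'hypertension': [0, 1, 0],
})

# Hypothetical list of the flag columns to combine.
problems = ['alcoholism', 'diabetes', 'handicapped', 'hypertension']

# Column names with a trailing comma, stored as Python strings (object dtype)
# so the dot product falls back to Python's int * str and str + str.
labels = np.array([name + ',' for name in problems], dtype=object)

# Each row evaluates to 1*'alcoholism,' + 0*'diabetes,' + ... ;
# .str[:-1] then drops the trailing comma.
df['new_col'] = df[problems].dot(labels).str[:-1]
print(df['new_col'].tolist())
```

A row with all zeros would produce an empty string here, so it would still need a separate replacement step to read "no condition".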