How to concatenate column names into a new column where value = 1, otherwise return 0 (Python)

Example data:

| alcoholism | diabetes | handicapped | hypertensive | new col                 |
| ---------- | -------- | ----------- | ------------ | ----------------------- |
| 1          | 0        | 1           | 0            | alcoholism, handicapped |
| 0          | 1        | 0           | 1            | diabetes, hypertensive  |
| 0          | 1        | 0           | 0            | diabetes                |

If any of the columns above has a value of 1, the new column should contain the names of only those columns; if all of them are zero, it should return "no condition".

I tried to do it with the code below:

problems = ['alcoholism', 'diabetes','hypertension','handicap']

m1 = df[problems].isin([1]) 
mask = m1 | (m1.loc[~m1.any(axis=1)])

df['sp_name'] = mask.mul(problems).apply(lambda x: [i for i in x if i], axis=1)

But it returns the data in brackets, like [handicapped, alcoholism]. The problem is that I can't run value counts on it, because rows where every value is zero show up as an empty [] and will not be plotted.

CodePudding user response:

I still don't understand your ultimate goal, or how this will be useful in plotting, but all you're really missing is using str.join to combine each list into the string you want. That said, the way you've gotten there involves unnecessary steps. First, multiply the DataFrame by its own column names:

df * df.columns

   alcoholism  diabetes  handicapped  hypertension
0  alcoholism            handicapped              
1              diabetes               hypertension
2              diabetes
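
The multiplication works because Python repeats a string when it is multiplied by an integer, so a 1 keeps the column name and a 0 turns it into an empty string. Here is a minimal pure-Python sketch of the same idea; the `names` and `row` values are illustrative and mirror the first row of the example data:

```python
# Multiplying a string by an int repeats it: 1 keeps it, 0 empties it.
names = ['alcoholism', 'diabetes', 'handicapped', 'hypertension']
row = [1, 0, 1, 0]  # flags from the first row of the example data

products = [flag * name for flag, name in zip(row, names)]

# Empty strings are falsy, so filtering on truthiness drops the zero columns.
kept = [p for p in products if p]

print(products)
print(kept)
```

Note that this relies on the flags being strictly 0 or 1; a value of 2 would repeat the name twice.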

Then apply the same list comprehension you already used:

(df * df.columns).apply(lambda row: [i for i in row if i], axis=1)

0    [alcoholism, handicapped]
1     [diabetes, hypertension]
2                   [diabetes]
dtype: object

Then you just need to include a string join in the function you supply to apply. Here's a complete example:

import pandas as pd

df = pd.DataFrame({
    'alcoholism': [1, 0, 0],
    'diabetes': [0, 1, 1],
    'handicapped': [1, 0, 0],
    'hypertension': [0, 1, 0],
})

df['new_col'] = (
    (df * df.columns)
    .apply(lambda row: ', '.join([i for i in row if i]), axis=1)
)

print(df)

   alcoholism  diabetes  handicapped  hypertension                  new_col
0           1         0            1             0  alcoholism, handicapped
1           0         1            0             1   diabetes, hypertension
2           0         1            0             0                 diabetes
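
The question also asks for a placeholder when every flag is zero. With the approach above, such a row ends up as an empty string, which is easy to overlook in `value_counts`. One possible sketch follows, assuming the placeholder text should be 'no condition' and adding a hypothetical all-zero fourth row to the example data. It builds the labels with a plain list comprehension rather than `df * df.columns`:

```python
import pandas as pd

cols = ['alcoholism', 'diabetes', 'handicapped', 'hypertension']

# Same example data, plus an all-zero last row added for illustration.
df = pd.DataFrame({
    'alcoholism': [1, 0, 0, 0],
    'diabetes': [0, 1, 1, 0],
    'handicapped': [1, 0, 0, 0],
    'hypertension': [0, 1, 0, 0],
})

# Boolean array with one row per record, True where the flag equals 1.
flags = df[cols].eq(1).to_numpy()

# Join the names of the flagged columns; `or` swaps an empty result
# for the placeholder (the placeholder text is an assumption).
df['new_col'] = [
    ', '.join(name for name, flag in zip(cols, row) if flag) or 'no condition'
    for row in flags
]

counts = df['new_col'].value_counts()
print(counts)
```

Because every row now carries a non-empty label, the all-zero rows appear as their own category in the counts and can be plotted like any other.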

CodePudding user response:

df['new_col'] = df.iloc[:, :-1].dot(df.add_suffix(",").columns[:-1]).str[:-1]

I have already found this solution helpful.
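
The one-liner above slices off the last column with `iloc[:, :-1]` and `columns[:-1]`, so it appears to assume the frame already has a trailing column that should be left out of the product. Below is a self-contained sketch of the same `dot` idea, assuming 0/1 integer flags and an explicit list of flag columns (the `cols` and `labels` names are illustrative). The labels are built as a NumPy object array as a precaution, so that the string arithmetic runs on plain Python strings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'alcoholism': [1, 0, 0],
    'diabetes': [0, 1, 1],
    'handicapped': [1, 0, 0],
    'hypertension': [0, 1, 0],
})

cols = ['alcoholism', 'diabetes', 'handicapped', 'hypertension']

# Each label carries its own separator, e.g. 'alcoholism, '.
labels = np.array([c + ', ' for c in cols], dtype=object)

# The dot product multiplies each flag by its label and sums along the row;
# for strings that means "repeat 0 or 1 times, then concatenate".
joined = df[cols].dot(labels)

# Drop the trailing ', ' left by the last label that was kept.
df['new_col'] = joined.str[:-2]

print(df)
```

A row with all zeros would come out as an empty string here, so it would still need a placeholder if it is meant to show up in counts.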