How to create a list according to column values

I have this dataframe:

Text     feat1   feat2   feat3    feat4
string1    1       1       0        0
string2    0       0       0        1
string3    0       0       0        0

I want to create 2 other columns this way:

Text     feat1   feat2   feat3    feat4     all_feat            count_feat
string1    1       1       0        0       ["feat1","feat2"]       2
string2    0       0       0        1       ["feat4"]               1
string3    0       0       0        0       []                      0

What's the best approach to do it in Python?

The column names can be any string.

CodePudding user response:

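You can multiply the 0/1 flags by the column names, so that each cell becomes either the column name or an empty string, and then keep the non-empty strings in each row: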

import pandas as pd

# Assumes the flag columns all contain 'feat' and are exactly the columns after Text
df1 = (df.filter(like='feat').mul(df.columns[1:]).apply(lambda x: [i for i in x if i], axis=1)
         .to_frame('all_feat').assign(count_feat=lambda x: x['all_feat'].str.len()))
df = pd.concat([df, df1], axis=1)
print(df)

# Output
      Text  feat1  feat2  feat3  feat4        all_feat  count_feat
0  string1      1      1      0      0  [feat1, feat2]           2
1  string2      0      0      0      1         [feat4]           1
2  string3      0      0      0      0              []           0
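
If the flag columns do not share a common substring such as 'feat' (the question says the names can be any string), a boolean mask avoids filtering by name. The following is a minimal sketch, assuming every column except Text holds 0/1 flags. The names flag_cols and mask are illustrative, and the sample frame is rebuilt from the question:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Text": ["string1", "string2", "string3"],
    "feat1": [1, 0, 0],
    "feat2": [1, 0, 0],
    "feat3": [0, 0, 0],
    "feat4": [0, 1, 0],
})

# Assumption: every column other than "Text" is a 0/1 flag.
flag_cols = df.columns.drop("Text")
mask = df[flag_cols].eq(1)

# For each row, keep the names of the columns whose flag is set.
df["all_feat"] = [list(flag_cols[row]) for row in mask.to_numpy()]
df["count_feat"] = mask.sum(axis=1)
```

Because the mask is built before the new columns are added, the two assignments do not interfere with each other.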

CodePudding user response:

You can melt the feature columns into long format and then group by the original row index:

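# Long format: one row per (original row, feature) pair, keeping the original index
df2 = df.filter(like='feat').melt(ignore_index=False)
g = df2.groupby(level=0)

# Names of the features set to 1 in each original row
df['all_feat'] = g.apply(lambda grp: list(grp.loc[grp['value'].eq(1), 'variable']))
df['count_feat'] = g['value'].sum()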

Output:

      Text  feat1  feat2  feat3  feat4        all_feat  count_feat
0  string1      1      1      0      0  [feat1, feat2]           2
1  string2      0      0      0      1         [feat4]           1
2  string3      0      0      0      0              []           0
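
The melt step can also be combined with a filter before grouping. This is only a sketch of a variant, not this answer's exact code. It assumes every column except Text is a 0/1 flag, and long_df and names are hypothetical helper names. Rows with no flag set disappear after the filter, so they are restored with reindex:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Text": ["string1", "string2", "string3"],
    "feat1": [1, 0, 0],
    "feat2": [1, 0, 0],
    "feat3": [0, 0, 0],
    "feat4": [0, 1, 0],
})

# Long format: columns "variable" (old column name) and "value" (the flag),
# with the original row index preserved.
long_df = df.drop(columns="Text").melt(ignore_index=False)

# Keep only the set flags and collect the column names per original row.
names = long_df.loc[long_df["value"].eq(1), "variable"].groupby(level=0).agg(list)

# Rows without any set flag are missing from `names`; give them an empty list.
df["all_feat"] = names.reindex(df.index).apply(
    lambda v: v if isinstance(v, list) else []
)
df["count_feat"] = df["all_feat"].str.len()
```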

CodePudding user response:

Check out this link

df = pd.DataFrame(data)  # assuming this is your DataFrame
new_col = ["item1", "item2"]  # values for the new column, one per row

df["new_col"] = new_col  # add the new column
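
Applied to the question, plain list assignment works as long as the list has exactly one entry per row. Here is a small sketch with the per-row lists hard-coded from the expected output in the question (all_feat and length_error are illustrative names). A list of the wrong length raises a ValueError:

```python
import pandas as pd

df = pd.DataFrame({"Text": ["string1", "string2", "string3"]})

# One list per row, hard-coded here to mirror the expected result in the question.
all_feat = [["feat1", "feat2"], ["feat4"], []]
df["all_feat"] = all_feat
df["count_feat"] = [len(names) for names in all_feat]

# A list whose length differs from the number of rows raises ValueError.
try:
    df["too_short"] = ["item1", "item2"]
    length_error = None
except ValueError as exc:
    length_error = exc
```

In practice the per-row lists would be computed from the flag columns rather than typed in by hand.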