Python solution: present column values as a list

I have data like the following:

df = pd.DataFrame({'column1': ['Y', 'Y', 'Y'],
                   'value_5': ['N', 'Y', 'Y'],
                   'value_6': ['N', 'Y', 'N'],
                   'value_10': ['Y', 'N', 'N'],
                   'value_20': ['N', 'N', 'Y']},
                  index=['key1','key2','key4'])
print(df)
     column1 value_5 value_6 value_10 value_20
key1       Y       N       N        Y        N
key2       Y       Y       Y        N        N
key4       Y       Y       N        N        Y

From that data I would like to create a new last column that lists, for each row, the numbers of the value_ columns containing Y. However, the number of columns and their values may differ in each run.

Answer:

First select the columns containing the substring value_ with DataFrame.filter and extract the integers from their names, then compare the values with Y and, for each row, convert the names of the matching columns to a list:

f = lambda x: int(x.split('_')[-1])
df1 = df.filter(like='value_').rename(columns=f)

df['new'] = df1.eq('Y').agg(lambda x: x.index[x].tolist(), axis=1)
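Because the set of value_ columns can change between runs, it may help to wrap this approach in a function and rerun it on each new frame. The following is a minimal sketch under that assumption: the helper name `yes_columns` and its `prefix`/`flag` parameters are hypothetical, and it uses `row[row].index` (select the True cells, keep their labels) in place of `x.index[x]`, which should give the same lists.

```python
import pandas as pd


def yes_columns(df, prefix='value_', flag='Y'):
    """For each row, list the integer suffixes of the prefix columns equal to flag.

    Hypothetical helper: the name and the prefix/flag defaults are assumptions.
    """
    # keep only columns whose name contains the prefix, then rename 'value_10' -> 10
    sub = df.filter(like=prefix).rename(columns=lambda c: int(c.split('_')[-1]))
    # True where the cell equals the flag; per row, collect the labels of True cells
    return sub.eq(flag).apply(lambda row: row[row].index.tolist(), axis=1)


df = pd.DataFrame({'column1': ['Y', 'Y', 'Y'],
                   'value_5': ['N', 'Y', 'Y'],
                   'value_6': ['N', 'Y', 'N'],
                   'value_10': ['Y', 'N', 'N'],
                   'value_20': ['N', 'N', 'Y']},
                  index=['key1', 'key2', 'key4'])
df['new'] = yes_columns(df)
print(df['new'].tolist())  # [[10], [5, 6], [5, 20]]
```

With `apply(..., axis=1)` and a function that returns a list, pandas returns a Series of lists indexed like `df`, so the assignment aligns row by row.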

Another idea: extract the integers from the column names with a regex, and build the lists with a list comprehension:

import re
f = lambda x: next(map(int, re.findall(r'\d+', x)))
df1 = df.filter(like='value_').rename(columns=f)

df['new'] = [df1.columns[x].tolist() for x in df1.eq('Y').to_numpy()]
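To see what the two pieces of this variant do, here is a small sketch on the example column names. One difference from the first approach: the regex keeps the *first* run of digits, whereas `split('_')[-1]` keeps whatever follows the last underscore. The name `value_2024_7` below is hypothetical, added only to show that difference.

```python
import re

import numpy as np
import pandas as pd


def to_int(name):
    # first run of digits in the column name, as an int
    # (raises StopIteration if the name contains no digits)
    return next(map(int, re.findall(r'\d+', name)))


names = ['value_5', 'value_6', 'value_10', 'value_20']
print([to_int(n) for n in names])                                   # [5, 6, 10, 20]

# hypothetical name with two numbers: the regex and split('_') disagree
print(to_int('value_2024_7'), int('value_2024_7'.split('_')[-1]))   # 2024 7

# each row of df1.eq('Y').to_numpy() is a boolean array that selects column labels
cols = pd.Index([to_int(n) for n in names])
row_mask = np.array([True, False, False, True])  # the 'key4' row
print(cols[row_mask].tolist())                                      # [5, 20]
```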

Or use str.extract (as in Series.str.extract, also available on an Index) on the column names:

df1 = df.filter(like='value_')
df1.columns = df1.columns.str.extract(r'(\d+)', expand=False).astype(int)

df['new'] = [df1.columns[x].tolist() for x in df1.eq('Y').to_numpy()]
print (df)
     column1 value_5 value_6 value_10 value_20      new
key1       Y       N       N        Y        N     [10]
key2       Y       Y       Y        N        N   [5, 6]
key4       Y       Y       N        N        Y  [5, 20]
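A short sketch of the extraction step on its own. With a single capture group and `expand=False`, `str.extract` on an Index returns an Index of matched strings (not a DataFrame), which is why `.astype(int)` can follow directly. The `value_x` name is a hypothetical example showing that a name without digits gives NaN, in which case `astype(int)` would raise.

```python
import pandas as pd

cols = pd.Index(['value_5', 'value_6', 'value_10', 'value_20'])

# one capture group + expand=False -> an Index of matched strings
nums = cols.str.extract(r'(\d+)', expand=False)
print(nums.tolist())              # ['5', '6', '10', '20']
print(nums.astype(int).tolist())  # [5, 6, 10, 20]

# hypothetical column name without digits: the match is missing (NaN)
missing = pd.Index(['value_x']).str.extract(r'(\d+)', expand=False)
print(missing.isna().tolist())    # [True]
```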

Answer:

Assuming df is this:

import pandas as pd

df = pd.DataFrame({'column1': ['Y', 'Y', 'Y'],
                   'value_5': ['N', 'Y', 'Y'],
                   'value_6': ['N', 'Y', 'N'],
                   'value_10': ['Y', 'N', 'N'],
                   'value_20': ['N', 'N', 'Y']  })
print(df)
  column1 value_5 value_6 value_10 value_20
0       Y       N       N        Y        N
1       Y       Y       Y        N        N
2       Y       Y       N        N        Y

For each row, I extracted the names of the value columns where Y is present and kept the integer at the end of each name:

expected = []
df1 = df[df.columns[['value' in c for c in df.columns]]]
for i in range(len(df1)):
    idx = df1.iloc[i, :][df1.iloc[i, :]=='Y'].index
    expected.append([int(e.split('_')[-1]) for e in idx])
df['Expected'] = expected
print(df)

  column1 value_5 value_6 value_10 value_20 Expected
0       Y       N       N        Y        N     [10]
1       Y       Y       Y        N        N   [5, 6]
2       Y       Y       N        N        Y  [5, 20]
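Since the question says the columns may differ in each run, the loop can be wrapped in a function and rerun on any frame. This is a sketch only: `expected_lists`, its `prefix` parameter, and the small `other` frame are hypothetical, introduced for illustration. Note that `'value' in c` matches any column whose name merely contains `value`; a stricter `c.startswith('value_')` may be safer if other columns could contain that word.

```python
import pandas as pd


def expected_lists(df, prefix='value'):
    """Row-by-row version of the loop above (hypothetical helper name)."""
    # same column selection as the answer: every column whose name contains prefix
    df1 = df[df.columns[[prefix in c for c in df.columns]]]
    expected = []
    for i in range(len(df1)):
        row = df1.iloc[i, :]
        idx = row[row == 'Y'].index  # names of the columns holding 'Y'
        expected.append([int(e.split('_')[-1]) for e in idx])
    return expected


# hypothetical frame with a different set of value_ columns
other = pd.DataFrame({'column1': ['Y', 'N'],
                      'value_1': ['Y', 'N'],
                      'value_3': ['Y', 'Y']})
other['Expected'] = expected_lists(other)
print(other['Expected'].tolist())  # [[1, 3], [3]]
```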