I have a dataframe with x number of columns, for example-
|a |b |c |d |e |f |g |h |i |median|
|----|----|----|----|----|----|----|----|----|------|
|0.10|0.08|0.29|0.13|0.03|0.29|0.06|0.03|0.06|0.08 |
|0.82|0.18|0.14|0.12|0.08|0.12|0.08|0.06|0.10|0.12 |
|0.57|0.17|0.13|0.14|0.05|0.16|0.05|0.04|0.09|0.13 |
I want another column named "possible_labels" where I have a list of all possible column names in sorted way where the condition fulfils
dataframe[column_name] >= dataframe['median']
for example-
|a |b |c |d |e |f |g |h |i |median|possible_labels|
|----|----|----|----|----|----|----|----|----|------|-------------|
|0.10|0.08|0.29|0.13|0.03|0.29|0.06|0.03|0.06|0.08 |[c,f,d,a,b] |
|0.82|0.18|0.14|0.12|0.08|0.12|0.08|0.06|0.10|0.12 |[a,b,c,d,f] |
|0.57|0.17|0.13|0.14|0.05|0.16|0.05|0.04|0.09|0.13 |[a,b,f,d,c] |
in "possible_labels" columns, top 5 column names are not there, they are the column name which satisfy the condition
dataframe[column_name] >= dataframe['median']
where:
column_name = ['a','b','c','d','e','f','g','h','i']
CodePudding user response:
Use custom function with sorting and filtering:
def f(x):
y = x[:-1].sort_values(ascending=False)
return y.index[y >= x['median']].tolist()
df['possible_labels'] = df.apply(f, axis=1)
print (df)
a b c d e f g h i median \
0 0.10 0.08 0.29 0.13 0.03 0.29 0.06 0.03 0.06 0.08
1 0.82 0.18 0.14 0.12 0.08 0.12 0.08 0.06 0.10 0.12
2 0.57 0.17 0.13 0.14 0.05 0.16 0.05 0.04 0.09 0.13
possible_labels
0 [c, f, d, a, b]
1 [a, b, c, d, f]
2 [a, b, f, d, c]
Another solution with pandas functions - use DataFrame.melt
for unpivot, filter by DataFrame.query
, sorting by DataFrame.sort_values
and last aggregate list
by index:
column_name = ['a','b','c','d','e','f','g','h','i']
df['possible_labels'] =(df.melt(id_vars=['median'],value_vars=value_vars,ignore_index=False)
.query('value >= median')
.sort_values('value',ascending=False)
.groupby(level=0)['variable']
.agg(list))
print (df)
a b c d e f g h i median \
0 0.10 0.08 0.29 0.13 0.03 0.29 0.06 0.03 0.06 0.08
1 0.82 0.18 0.14 0.12 0.08 0.12 0.08 0.06 0.10 0.12
2 0.57 0.17 0.13 0.14 0.05 0.16 0.05 0.04 0.09 0.13
possible_labels
0 [c, f, d, a, b]
1 [a, b, c, d, f]
2 [a, b, f, d, c]