Score and Score2 column ranges are not working with df.eq(s) because of the input as ranges.
Default values:-
import pandas as pd
df = pd.DataFrame({'Score': ['>285', '285-280', '280-275', '275-260', '<260'],'Grade1': ['A1', 'A2', 'A3', 'A4', 'A5'],'Score2': ['>270', '270-260', '260-250', '250-200', '<200'],'Grade3': ['D1', 'D2', 'D3', 'D4', 'D5'],'Grade4': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']})
Input:-
df_input = pd.DataFrame({'Score': [290], 'Grade1': ['A1'], 'Score2': [190], 'Grade3': ['D3'], 'Grade4': ['ID2']})
Expected output:-
df_output = {'Tiers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Score': {0: 290, 1: '', 2: '', 3: '', 4: ''}, 'Grade1': {0: '', 1: '', 2: '', 3: '', 4: 'A1'}, 'Score2': {0: '', 1: '', 2: '', 3: '', 4: 190}, 'Grade3': {0: '', 1: '', 2: 'D3', 3: '', 4: ''}, 'Grade4': {0: '', 1: 'ID2', 2: '', 3: '', 4: ''}}
s = df_input.loc[0]
mask = df.eq(s)
mask.to_dict() = {'Score': {0: False, 1: False, 2: False, 3: False, 4: True}, 'Grade1': {0: True, 1: False, 2: False, 3: False, 4: False}, 'Score2': {0: True, 1: False, 2: False, 3: False, 4: False}, 'Grade3': {0: False, 1: False, 2: True, 3: False, 4: False}, 'Grade4': {0: False, 1: True, 2: False, 3: False, 4: False}}
CodePudding user response:
Modification of my previous answer to handle missing values and arbitrary order:
# get df_input as Series
s = df_input.loc[0]
# find identical values
mask = df.eq(s)
# match ranges
def match_range(val, ref):
# extract the upper value, for the last replace by np.inf
s = pd.to_numeric(ref.str.extract('[<-](\d )', expand=False)).fillna(np.inf)
# define default output as False
out = np.zeros(len(ref), dtype=bool)
# find matching position
order = np.argsort(s)
match = np.searchsorted(s[order], val)
if match < len(s):
out[order[match]] = True
return out
# apply match_range on columns for which a direct match failed
m = ~mask.any()
mask.loc[:, m] = df.loc[:, m].apply(lambda x: match_range(s[x.name], x))
# generate output
out = (pd.DataFrame(np.where(mask, s, ''),
index=np.arange(len(df)) 1,
columns=df.columns)
.rename_axis('Tiers').reset_index()
)
out
Output:
Tiers Score Grade1 Score2 Grade3 Grade4
0 1 290 A1
1 2 ID2
2 3 D3
3 4 190
4 5