Home > Software design >  Convert DataFrame with 5*5 grid of choice with string of ranges
Convert DataFrame with 5*5 grid of choice with string of ranges

Time:01-04

Score and Score2 column ranges are not working with df.eq(s) because of the input as ranges.

Default values:-

import pandas as pd

df = pd.DataFrame({'Score': ['>285', '285-280', '280-275', '275-260', '<260'],'Grade1': ['A1', 'A2', 'A3', 'A4', 'A5'],'Score2': ['>270', '270-260', '260-250', '250-200', '<200'],'Grade3': ['D1', 'D2', 'D3', 'D4', 'D5'],'Grade4': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']})

Input:-

df_input = pd.DataFrame({'Score': [290], 'Grade1': ['A1'], 'Score2': [190], 'Grade3': ['D3'], 'Grade4': ['ID2']})

Expected output:-

df_output = {'Tiers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Score': {0: 290, 1: '', 2: '', 3: '', 4: ''}, 'Grade1': {0: '', 1: '', 2: '', 3: '', 4: 'A1'}, 'Score2': {0: '', 1: '', 2: '', 3: '', 4: 190}, 'Grade3': {0: '', 1: '', 2: 'D3', 3: '', 4: ''}, 'Grade4': {0: '', 1: 'ID2', 2: '', 3: '', 4: ''}}

s = df_input.loc[0]

mask = df.eq(s)

mask.to_dict() = {'Score': {0: False, 1: False, 2: False, 3: False, 4: True}, 'Grade1': {0: True, 1: False, 2: False, 3: False, 4: False}, 'Score2': {0: True, 1: False, 2: False, 3: False, 4: False}, 'Grade3': {0: False, 1: False, 2: True, 3: False, 4: False}, 'Grade4': {0: False, 1: True, 2: False, 3: False, 4: False}}

CodePudding user response:

Modification of my previous answer to handle missing values and arbitrary order:

# get df_input as Series
s = df_input.loc[0]

# find identical values
mask = df.eq(s)

# match ranges
def match_range(val, ref):
    # extract the upper value, for the last replace by np.inf
    s = pd.to_numeric(ref.str.extract('[<-](\d )', expand=False)).fillna(np.inf)
    # define default output as False
    out = np.zeros(len(ref), dtype=bool)
    # find matching position
    order = np.argsort(s)
    match = np.searchsorted(s[order], val)
    if match < len(s):
        out[order[match]] = True
    return out

# apply match_range on columns for which a direct match failed
m = ~mask.any()
mask.loc[:, m] = df.loc[:, m].apply(lambda x: match_range(s[x.name], x))

# generate output
out = (pd.DataFrame(np.where(mask, s, ''),
                    index=np.arange(len(df)) 1,
                    columns=df.columns)
         .rename_axis('Tiers').reset_index()
      )

out

Output:

   Tiers Score Grade1 Score2 Grade3 Grade4
0      1   290     A1                     
1      2                               ID2
2      3                         D3       
3      4                 190              
4      5                                  
  • Related