Home > Mobile >  Identify the matching range from a list of valid range
Identify the matching range from a list of valid range

Time:02-11

I am having a csv file , which contains a range of numbers in a column. I also have list of valid range. I need identify the possible range for the matching numbers in csv.

valid_range = ["0-1", "1-2", "0-5", "5-10", "10-15", "15-20", ">20","mixed"]

csv data :

What I have:

enter image description here

What I need:

enter image description here

Anything greater than 20 will have ">20" and mixed will have the same.

df = {'Gender': {0: 'male', 1: 'male', 2: 'male', 3: 'female', 4: 'male', 5: 'male', 6: 'male', 7: 'male', 8: 'male', 9: 'mixed', 10: 'female'}, 'Range': {0: 'mixed', 1: 'mixed', 2: '5-6', 3: '4-5', 4: '5-6', 5: '12-14', 6: '9-10', 7: 'mixed', 8: '17-18', 9: 'mixed', 10: '>20'}}

CodePudding user response:

If you don't have overlap, you can rely on the upper bound.

Then use of a bit or regex magic and pandas.cut:

# non overlapping ranges
valid_range = ["0-5", "5-10", "10-15", "15-20", ">20", "mixed"]

# define bins (could also be coded as list directly!)
bins = (pd.Series(valid_range[:-2]).str.extract('(\d )$', expand=False)
          .astype(int).to_list()
        )
bins = [0] bins [float('inf')]
# [0, 5, 10, 15, 20, inf]

df['Valid_Range'] = (           # replace ">x" with upper bound   1
 pd.cut(df['Range'].str.replace('>.*', f'{bins[-1] 1}', regex=True)
                  # extract trailing digits 
                  .str.extract('(\d )$', expand=False).astype(float),
        bins=bins, labels=valid_range[:-1])
    .values.add_categories(valid_range[-1])  # cut output is Categorical
    .fillna(valid_range[-1])  # fill nans that correspond to initial "mixed"
)

output:

    Gender  Range Valid_Range
0     male  mixed       mixed
1     male  mixed       mixed
2     male    5-6        5-10
3   female    4-5         0-5
4     male    5-6        5-10
5     male  12-14       10-15
6     male   9-10        5-10
7     male  mixed       mixed
8     male  17-18       15-20
9    mixed  mixed       mixed
10  female    >20         >20
  • Related