Separate string number ranges in a pandas DataFrame

I have a df which looks like this

Type    range
Mike    10..13|7|8|
Ni      3..4
NANA    2|1|6

and desired output should look like this

Type    range
Mike    10
Mike    11
Mike    12
Mike    13
Mike    7
Mike    8
Ni      3
Ni      4
NANA    2
NANA    1
NANA    6

So the range column holds multiple values per Type. A range is written as two numbers separated by two dots (..), and a single value (not part of a range) is separated from its neighbours by |.
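For anyone who wants to reproduce this, here is a minimal sketch that builds the example frame. The variable name df is an assumption taken from the wording above, and the values are copied from the table.

```python
import pandas as pd

# Example input copied from the table above; note the trailing '|' on the first row.
df = pd.DataFrame({
    "Type": ["Mike", "Ni", "NANA"],
    "range": ["10..13|7|8|", "3..4", "2|1|6"],
})

print(df)
```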

CodePudding user response：

Assuming that your ranges are inclusive, which I infer because your '3..4' translates to a row with 3 and a row with 4, I found the following solution:

import pandas as pd

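def parse_str(s):
    """Expand a string like '10..13|7|8|' into a Series of integers."""
    numbers = []
    # drop the trailing '|' so that split() does not yield an empty item
    for v in s.rstrip('|').split('|'):
        if v.isdigit():
            # a single value
            numbers.append(int(v))
        else:
            # an inclusive range written as 'start..end'
            start, end = v.split('..')
            numbers.extend(range(int(start), int(end) + 1))
    return pd.Series(numbers)

# keep Type as the index so that it survives apply() and stack()
df.index = df['Type']
# apply() pads shorter rows with NaN (which makes the values floats),
# so drop any leftover NaN and cast back to int after stacking
dfnew = df['range'].apply(parse_str).stack().dropna().astype(int).reset_index(level=0).rename(columns={0: 'range'})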

We write a function that parses the string: it strips the trailing |, splits the string on |, and converts each piece to an integer if it is already a plain number. Otherwise the piece is a range, so we split it again on .. and add every number in that range, including both endpoints. In the end, we return a pd.Series containing all the numbers from the string.
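To see the parsing step on its own, here is a small pandas-free sketch of the same idea. The name expand_ranges is hypothetical, and it assumes that ranges are inclusive on both ends and that empty pieces (such as the one produced by a trailing |) should be skipped.

```python
def expand_ranges(s):
    """Expand a string such as '10..13|7|8|' into a list of ints.

    Assumes ranges written as 'a..b' are inclusive on both ends.
    """
    numbers = []
    for token in s.split("|"):
        token = token.strip()
        if not token:
            # a leading or trailing '|' produces an empty token; skip it
            continue
        if ".." in token:
            start, end = token.split("..")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(token))
    return numbers


print(expand_ranges("10..13|7|8|"))  # [10, 11, 12, 13, 7, 8]
print(expand_ranges("3..4"))         # [3, 4]
print(expand_ranges("2|1|6"))        # [2, 1, 6]
```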

Then we apply that function to the column with df['range'].apply and stack the result. To ensure we keep the names, we first have to set Type as the index of the dataframe.
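The apply-then-stack step can be sketched end to end as follows. This is only an illustration under the assumptions already stated (inclusive ranges, the example frame from the question). The intermediate names wide, long and result are made up here, and the dropna() and astype(int) calls are additions that remove the NaN padding and restore integer values.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["Mike", "Ni", "NANA"],
    "range": ["10..13|7|8|", "3..4", "2|1|6"],
})


def parse_str(s):
    """Expand '10..13|7|8|' into a Series of ints (ranges are inclusive)."""
    numbers = []
    for v in s.rstrip("|").split("|"):
        if v.isdigit():
            numbers.append(int(v))
        else:
            start, end = v.split("..")
            numbers.extend(range(int(start), int(end) + 1))
    return pd.Series(numbers)


# One row per Type, one column per parsed number; shorter rows are padded with NaN.
wide = df.set_index("Type")["range"].apply(parse_str)

# stack() turns the columns into rows. dropna() removes the padding however the
# installed pandas version treats NaN in stack(), and astype(int) undoes the
# float conversion that the padding caused.
long = wide.stack().dropna().astype(int)

# Move Type back into a column and name the value column 'range'.
result = long.reset_index(level=0).rename(columns={0: "range"}).reset_index(drop=True)
print(result)
```

Printing wide before stacking shows why the index matters: Type is the only thing that ties each padded row back to its name.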

CodePudding user response：

You can do

# strip the trailing '|', split by '|' and explode
df = df.assign(range=df['range'].str.strip('|').str.split('|')).explode('range')

# get the inclusive range(i, j + 1) if the string has '..', otherwise the single number
df['range'] = df['range'].apply(lambda r: range(int(r.split('..')[0]), int(r.split('..')[1]) + 1) if '..' in r else int(r))

# explode
df = df.explode('range')
df

    Type    range
0   Mike       10
0   Mike       11
0   Mike       12
0   Mike       13
0   Mike        7
0   Mike        8
1     Ni        3
1     Ni        4
2   NANA        2
2   NANA        1
2   NANA        6