I have a df which looks like this
Type range
Mike 10..13|7|8|
Ni 3..4
NANA 2|1|6
and desired output should look like this
Type range
Mike 10
Mike 11
Mike 12
Mike 13
Mike 7
Mike 8
Nico 3
Nico 4
NANA 2
NANA 1
NANA 6
so, Totaling column presenet the multiple values per Type. range values are presnted with two number
seperated by two ..
and one value (with no range) is presented between two | |
CodePudding user response:
Assuming that your ranges are inclusive, which I assume because your '3..4' translates to a row with 3 and a row with 4, and assuming that you forgot to put Mike 14 and Mike 15 in your example output, I found the following solution:
import pandas as pd
def parse_str(s):
numbers = []
for v in s.rstrip('|').split('|'):
if v.isdigit():
numbers.append(int(v))
else:
start, end = v.split('..')
numbers.extend(list(range(int(start), int(end) 1)))
return pd.Series(numbers)
df.index = df['Type']
dfnew = df['range'].apply(parse_str).stack().reset_index(level=0).rename(columns={0: 'range'})
We write a function that parses the string, which means splitting the string by |
and converting the numbers to integers if the string is already a number. Otherwise, it's a range so we split again by ..
and create a list with all the numbers in the range. In the end, we return a pd.Series containing all the numbers from the string.
Then, we apply that function to the column with df['range'].apply
and stack the result. To assure we still keep the names, we have to first set it as the index of the dataframe.
CodePudding user response:
You can do
# split by '|' and explode
df = df.assign(range=df['range'].str.split('|')).explode('range')
# get the range(i, j) if the string has '..'
df['range'] = df['range'].apply(lambda r: range(int(r.split('..')[0]), int(r.split('..')[1])) if (len(r.split('..')) == 2) else r)
# explode
df = df.explode('range')
df
Type range
0 Mike 10
0 Mike 11
0 Mike 12
0 Mike 13
0 Mike 14
0 Mike 7
0 Mike 8
1 Ni 3
2 NANA 2
2 NANA 1
2 NANA 6