I have a DataFrame with a column in which some values contain `...`, some contain `..`, and some have no dots at all.
Type range
Mike 10..13
Ni 3..4
NANA 2...3
Gi 2
The desired output should look like this:
Type range
Mike 10
Mike 11
Mike 12
Mike 13
Ni 3
Ni 4
NANA 2
NANA 3
Gi 2
So the dots represent the range between two numbers (including the end number).
How am I supposed to do this in pandas?
CodePudding user response:
Parse each string into a list first, then explode:
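import re

def str_to_list(s):
    # An empty string has nothing to expand
    if not s:
        return []
    # Split on a run of two or three dots, e.g. "10..13" or "2...3"
    nums = re.split(r'\.{2,3}', s)
    if len(nums) == 1:
        # No dots: return the single value as-is (still a string)
        return nums
    # range() excludes its stop, so add 1 to include the end number
    return list(range(int(nums[0]), int(nums[1]) + 1))

df['range'] = df['range'].astype(str).map(str_to_list)
df.explode('range')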
Type range
0 Mike 10
0 Mike 11
0 Mike 12
0 Mike 13
1 Ni 3
1 Ni 4
2 NANA 2
2 NANA 3
3 Gi 2
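For a quick check of the parsing step on its own, here is a minimal sketch that needs no DataFrame. The name `expand_range` is hypothetical, and unlike `str_to_list` above it also converts a lone value to `int`, so every element ends up with the same type. That conversion is an assumption about what is wanted, not part of the answer above.

```python
import re

def expand_range(s):
    """Expand '10..13' into [10, 11, 12, 13], '2...3' into [2, 3], '2' into [2]."""
    s = str(s).strip()
    if not s:
        return []
    # Split on any run of two or more dots.
    parts = re.split(r"\.{2,}", s)
    start, end = int(parts[0]), int(parts[-1])
    # range() excludes its stop value, so add 1 to include the end number.
    return list(range(start, end + 1))

for value in ["10..13", "3..4", "2...3", "2"]:
    print(value, "->", expand_range(value))
```

A value with no dots yields a single part, so `start` and `end` coincide and the result is a one-element list.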
CodePudding user response:
An approach using numpy.arange with pandas.DataFrame.explode:
import numpy as np

out = (
    df
    .assign(range=
        df["range"]
        .str.replace(r"\.+", "-", regex=True)  # turn any run of dots into "-"
        .str.split("-")
        # expand [start, end] into an inclusive integer range; leave single values alone
        .apply(lambda x: np.arange(int(x[0]), int(x[-1]) + 1) if len(x) > 1 else x))
    .explode("range", ignore_index=True)
)
# Output :
print(out)
Type range
0 Mike 10
1 Mike 11
2 Mike 12
3 Mike 13
4 Ni 3
5 Ni 4
6 NANA 2
7 NANA 3
8 Gi 2
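To try either answer end to end, the sample frame from the question can be rebuilt as in the sketch below. It rests on two assumptions: the `range` column holds strings, and the helper name `to_ints` is made up for illustration. It also casts the result to integers, since `explode` leaves an object column.

```python
import re
import pandas as pd

# Sample data from the question; the range column is assumed to hold strings.
df = pd.DataFrame({
    "Type": ["Mike", "Ni", "NANA", "Gi"],
    "range": ["10..13", "3..4", "2...3", "2"],
})

def to_ints(s):
    # Hypothetical helper: split on runs of dots, then build an inclusive range.
    parts = re.split(r"\.+", s)
    return list(range(int(parts[0]), int(parts[-1]) + 1))

out = (
    df.assign(range=df["range"].map(to_ints))
      .explode("range", ignore_index=True)
)
# explode leaves an object column; cast so that range holds real integers.
out["range"] = out["range"].astype(int)
print(out)
```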