Check whether a column contains a separator (/, -, _, * or ~) and split it into other columns

I have a column of numbers separated by one of these characters: -, /, *, ~, _. I need to check whether each value contains any of these characters and, if so, split the value into separate columns. Is there a better solution than the one shown below? In the end, the columns subnumber1, subnumber2, ... subnumber5 will be merged into one column, and the column "number5" will contain no separator characters. I need those two columns for further processing. I'm new to Python, so any advice is welcome.

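# regex=False makes str.contains treat the separator as a literal character
if gdf['column_name'].str.contains('~', regex=False).any():
    gdf[['number1', 'subnumber1']] = gdf['column_name'].str.split('~', expand=True)
gdf
# without regex=False, '^' is a regex anchor and would match every value
if gdf['column_name'].str.contains('^', regex=False).any():
    gdf[['number2', 'subnumber2']] = gdf['column_name'].str.split('^', expand=True)
gdf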
Input column:
column_name  
152/6*3
163/1-6
145/1
163/6^3

Expected output:
 number5 |subnumber1 |subnumber2
152      | 6         |  3
163      | 1         |  6
145      | 1         |
163      | 6         |  3

CodePudding user response：

Use Series.str.split with a regex pattern built from a list of the possible separators, and create a new DataFrame from the result:

import re

L = ['-','/','*','~','_','^', '.']

# characters such as `^`, `*` and `.` are regex metacharacters, so escape them
pat = '|'.join(re.escape(x) for x in L)
df = df['column_name'].str.split(pat, expand=True).add_prefix('num')
print(df)
  num0 num1  num2
0  152    6     3
1  163    1     6
2  145    1  None
3  163    6     3
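
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data is copied from the question. The names `separators`, `parts` and `result`, and the renaming to `number5` / `subnumberN`, are assumptions made for illustration rather than part of the answer above.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'column_name': ['152/6*3', '163/1-6', '145/1', '163/6^3']})

# Separators mentioned in the question, plus '^' from the sample input.
separators = ['-', '/', '*', '~', '_', '^']

# re.escape protects regex metacharacters such as '*' and '^'.
pat = '|'.join(re.escape(s) for s in separators)

# A pattern longer than one character is treated as a regex by default.
parts = df['column_name'].str.split(pat, expand=True)

# Assumed naming: the first piece is the main number, the rest are sub-numbers.
parts.columns = ['number5'] + [f'subnumber{i}' for i in range(1, parts.shape[1])]

# Keep the original column next to the split pieces.
result = df.join(parts)
print(result)
```

Rows with fewer pieces than the widest row get a missing value in the trailing columns, as in the output above.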

CodePudding user response：

Use str.split with a regex character class:

# the hyphen comes first so it is a literal, not a range operator
df['column_name'].str.split(r'[-/*~^_]', expand=True)

output:

     0  1     2
0  152  6     3
1  163  1     6
2  145  1  None
3  163  6     3
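
One thing to watch with character classes: a hyphen between two characters defines a range, so its position changes what the class matches. The sketch below uses the standard `re` module and a made-up test string to show the difference. The variable names are only for illustration.

```python
import re

# Here ',-/' is a range covering ',', '-', '.' and '/'.
range_class = re.compile(r'[*,-/^_]')

# With the hyphen first it is a literal, so '.' is no longer matched.
literal_class = re.compile(r'[-/*~^_]')

text = '1.5/2'
range_parts = range_class.split(text)      # splits on '.' and '/'
literal_parts = literal_class.split(text)  # splits on '/' only

print(range_parts)
print(literal_parts)
```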

Or, if you know in advance that each value has at most 3 numbers, use str.extract with named capturing groups:

regex = r'(?P<number5>\d+)\D*(?P<subnumber1>\d*)\D*(?P<subnumber2>\d*)'
df['column_name'].str.extract(regex)

output:

  number5 subnumber1 subnumber2
0     152          6          3
1     163          1          6
2     145          1           
3     163          6          3
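
The question also mentions merging the sub-number columns into one column afterwards. Here is a rough sketch of one way to do that on top of str.extract. It assumes "merged" means joining the non-empty sub-numbers with '/', and the `subnumber` column name is hypothetical, so adjust both to the real requirement.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'column_name': ['152/6*3', '163/1-6', '145/1', '163/6^3']})

regex = r'(?P<number5>\d+)\D*(?P<subnumber1>\d*)\D*(?P<subnumber2>\d*)'
out = df['column_name'].str.extract(regex)

# Assumption: "merged" means the non-empty sub-numbers joined with '/'.
sub_cols = ['subnumber1', 'subnumber2']
out['subnumber'] = out[sub_cols].apply(
    lambda row: '/'.join(v for v in row if isinstance(v, str) and v),
    axis=1,
)

# number5 holds digits only, so it can be converted to integers.
out['number5'] = out['number5'].astype(int)

print(out[['number5', 'subnumber']])
```

Empty groups come back from str.extract as empty strings here, which is why the join filters them out before combining.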