Split only on the first separator and keep all

I have a df with column entries that look like this:

import pandas as pd

data = [['5-820.0g:2021-05-18T07:25, 5-986.x:2021-05-18T07:25', '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25'],
        ['5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25', '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25']]

df = pd.DataFrame(data, columns=['col_1', 'col_2'])

I need to split each entry on the first ':' of each item, on the 'T', and on the ',', and expand the result into separate columns.

If I use the usual approach

df.column_name.str.split('[:,T]', expand=True)

it also splits on the second ':' (the one inside the time). How can I avoid that and get the desired output:

data_2 = [['5-820.0g', '2021-05-18', '07:25', '5-986.x', '2021-05-18', '07:25'], ['5-820.00', '2021-05-18', '07:25', '5-986.x', '2021-05-18', '07:25']]

df = pd.DataFrame(data_2, columns=['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6'])
df

CodePudding user response:

What you want to achieve is not fully clear, but you could restrict the split to a ':' that is preceded by a 'd' or an 'f' (the last character of the codes in the example output below):

df.column_name.str.split('(?:(?<=[df]):|[,T])', expand=True)

Or not preceded by a digit:

df.column_name.str.split(r'(?:(?<!\d):|[,T])', expand=True)

Output:

          0           1      2          3           4      5
0  5-784.0d  2021-03-29  10:15   5-784.0f  2021-03-29  10:15
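
Both variants above rely on the code ending in a non-digit. As a minimal sketch of that limitation, here is the "not preceded by a digit" pattern applied with Python's `re` module to two single strings. The variable names are illustrative; the first string is an assumption reconstructed from the output above, and the second is taken from the question's data.

```python
import re

# Split on ':' only when it is not preceded by a digit, plus every ',' and 'T'.
pattern = re.compile(r'(?<!\d):|[,T]')

# Assumed input, reconstructed from the output shown above (codes end in a letter).
letter_code = '5-784.0d:2021-03-29T10:15, 5-784.0f:2021-03-29T10:15'
# From the question's data (code ends in a digit).
digit_code = '5-820.00:2021-05-18T07:25'

# Splitting on ',' leaves a leading space on the next piece, so strip each one.
letter_parts = [p.strip() for p in pattern.split(letter_code)]
digit_parts = [p.strip() for p in pattern.split(digit_code)]

print(letter_parts)
print(digit_parts)
```

The first string splits into six fields, but in the second one the ':' after '5-820.00' follows a digit, so the code and the date stay glued together as '5-820.00:2021-05-18'.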

Updated example:

Split on ':' only if it is followed by a four-digit year and a '-':

df.stack().str.split(r':(?=\d{4}-)|[,T]', expand=True)

Output:

                0           1      2          3           4      5
0 col_1  5-784.0d  2021-03-29  10:15   5-784.0f  2021-03-29   None
  col_2  5-784.0d  2021-03-29  10:15   5-784.0f  2021-03-29   None
1 col_1  5-820.00  2021-05-18  07:25    5-986.x  2021-05-18  07:25
  col_2  5-820.00  2021-05-18  07:25    5-986.x  2021-05-18  07:25
2 col_1  5-820.00  2021-05-18  07:25    5-986.x  2021-05-18  07:25
  col_2  5-820.00  2021-05-18  07:25    5-986.x  2021-05-18  07:25
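
For a single column, the same lookahead idea can be sketched end to end on the question's data. This is only one possible variant, under two assumptions that are not in the original answer: the pattern uses `,\s*` so the space after each comma is consumed by the split, and the `col_N` output names are generated just to mirror the desired output. It also assumes the codes never contain an uppercase 'T'.

```python
import pandas as pd

data = [
    ['5-820.0g:2021-05-18T07:25, 5-986.x:2021-05-18T07:25',
     '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25'],
    ['5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25',
     '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25'],
]
df = pd.DataFrame(data, columns=['col_1', 'col_2'])

# ':' only when followed by a four-digit year and '-', a ',' with any
# trailing whitespace, or the 'T' between date and time.
pattern = r':(?=\d{4}-)|,\s*|T'

# A multi-character pattern is treated as a regular expression by str.split.
out = df['col_1'].str.split(pattern, expand=True)

# Hypothetical column names, chosen to match the desired output in the question.
out.columns = [f'col_{i + 1}' for i in range(out.shape[1])]
print(out)
```

Each row of `col_1` yields six fields (code, date, time, code, date, time), and the ':' inside the time is left alone because it is not followed by a four-digit year.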