I have a df with column entries that look like this:
import pandas as pd
data = [['5-820.0g:2021-05-18T07:25, 5-986.x:2021-05-18T07:25', '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25'], ['5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25', '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25']]
df = pd.DataFrame(data, columns=['col_1', 'col_2'])
I need to split them on the first ':', the 'T' and the ',', and expand the result into columns.
If I use the usual
df.column_name.str.split(r'[:,T]', expand=True)
it also splits on the second ':' (the one inside the time). How can I avoid that and get the desired output:
data_2 = [['5-820.0g', '2021-05-18', '07:25', '5-986.x', '2021-05-18', '07:25'], ['5-820.00', '2021-05-18', '07:25', '5-986.x', '2021-05-18', '07:25']]
df = pd.DataFrame(data_2, columns=['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6'])
df
CodePudding user response:
What you want to achieve is not fully clear, but you could restrict the split to a ':' that is preceded by a 'd' or an 'f':
df.column_name.str.split(r'(?:(?<=[df]):|[,T])', expand=True)
Or to a ':' that is not preceded by a digit:
df.column_name.str.split(r'(?:(?<!\d):|[,T])', expand=True)
Output:
          0           1      2         3           4      5
0  5-784.0d  2021-03-29  10:15  5-784.0f  2021-03-29  10:15
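The two lookbehind patterns can be checked outside pandas with the standard `re` module. This is only a sketch: the `entry` string is an assumption, reconstructed from the output shown above, since the original input row is not given.

```python
import re

# Assumed input, rebuilt from the output above (the original row is not shown).
entry = '5-784.0d:2021-03-29T10:15, 5-784.0f:2021-03-29T10:15'

# Split on ':' only when it directly follows 'd' or 'f', or on ',' / 'T'.
after_letter = re.split(r'(?:(?<=[df]):|[,T])', entry)

# Split on ':' only when it does NOT directly follow a digit, or on ',' / 'T'.
not_after_digit = re.split(r'(?:(?<!\d):|[,T])', entry)

print(after_letter)
print(not_after_digit)
```

In both cases the ':' inside the time stays intact because it follows a digit. The space after the comma is not part of the separator, so the fourth piece keeps a leading space.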
Updated example:
Split on a ':' only if it is followed by a four-digit year and a '-':
df.stack().str.split(r':(?=\d{4}-)|[,T]', expand=True)
Output:
                0           1      2         3           4      5
0 col_1  5-784.0d  2021-03-29  10:15  5-784.0f  2021-03-29   None
  col_2  5-784.0d  2021-03-29  10:15  5-784.0f  2021-03-29   None
1 col_1  5-820.00  2021-05-18  07:25   5-986.x  2021-05-18  07:25
  col_2  5-820.00  2021-05-18  07:25   5-986.x  2021-05-18  07:25
2 col_1  5-820.00  2021-05-18  07:25   5-986.x  2021-05-18  07:25
  col_2  5-820.00  2021-05-18  07:25   5-986.x  2021-05-18  07:25
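For the sample data in the question, the lookahead approach could be applied to a single column as sketched below. Two things here are assumptions rather than part of the answer: the separator `,\s*` (used instead of a bare `,` so that the space after the comma is swallowed too) and the `col_1` ... `col_6` naming, which simply mirrors the desired output in the question.

```python
import pandas as pd

data = [['5-820.0g:2021-05-18T07:25, 5-986.x:2021-05-18T07:25',
         '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25'],
        ['5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25',
         '5-820.00:2021-05-18T07:25, 5-986.x:2021-05-18T07:25']]
df = pd.DataFrame(data, columns=['col_1', 'col_2'])

# ':' only when followed by a four-digit year and '-', or ',' plus any
# trailing whitespace (assumed variant), or 'T'.
# A multi-character pattern is treated as a regular expression by default.
pattern = r':(?=\d{4}-)|,\s*|T'

parts = df['col_1'].str.split(pattern, expand=True)

# Hypothetical column names, chosen to match the desired output.
parts.columns = [f'col_{i}' for i in range(1, parts.shape[1] + 1)]
print(parts)
```

This relies on 'T' never appearing inside the codes themselves, which holds for the sample rows but may not hold for other data.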