I'm sorry if this is a simple question, but I have a CSV file with times in the following format: hh:mm:ss
An extract of the file looks like this:
column_name
00:00:00
01:00:00
02:00:00
03:00:00
...
23:00:00
00:00:00
I have the following regular expression to match all of those times:
[0-9]{2}[:][0-9]{2}[:][0-9]{2}
My question is: how do I get rid of the colon and the seconds (:ss), essentially changing the format from hh:mm:ss to hh:mm, in a Python script?
I managed to change all the - characters to / by using this line of code:
df['column_name'] = df['column_name'].str.replace('-', '/')
I tried using this line:
df['column_name'] = [re.sub(r'[0-9]{2}[:][0-9]{2}[:][0-9]{2}', r'[0-9]{2}[:][0-9]{2}', str(x)) for x in df['column_name']]
But this changed every time to the literal string [0-9]{2}[:][0-9]{2}
I also tried using slicing such as [:-3], but I could not get it to work:
df['column_name'] = [re.sub(r'[0-9]{2}[:][0-9]{2}[:][0-9]{2}', [:-3], str(x)) for x in df['column_name']]
Any help would be much appreciated. Thank you.
CodePudding user response:
You can slice each string with the .str accessor:
df['column_name'] = df['column_name'].str[:-3]
Or:
df['column_name'] = df['column_name'].str.rsplit(':', n=1).str[0]
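To see what those two approaches do to each value, here is a minimal plain-Python sketch (no pandas required). The list `times` is made-up sample data; the `.str` accessor applies the same string operation to every element of the column.

```python
# Hypothetical sample values in hh:mm:ss format (not from the real file).
times = ["00:00:00", "01:00:00", "23:00:00"]

# Per-element equivalent of the slicing approach:
# drop the last three characters, i.e. ":ss".
sliced = [t[:-3] for t in times]

# Per-element equivalent of the rsplit approach:
# split once from the right and keep the part before the last colon.
split = [t.rsplit(":", 1)[0] for t in times]

print(sliced)  # ['00:00', '01:00', '23:00']
print(split)   # ['00:00', '01:00', '23:00']
```

Slicing assumes every value ends in exactly three characters to drop, while splitting on the last colon does not depend on the width of the seconds field.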
CodePudding user response:
If you want to use a regex to tackle this, use capture groups:
df.column_name.str.replace(r'(\d{2}):(\d{2}):(\d{2})', r'\1:\2', regex=True)
To define a capture group, enclose it in parentheses. You can then refer to the captured text by its group index (starting at one).
Also note that because the groups are referenced as \<group-number>, the replacement should be written as a raw string so that the backslash is not treated as an escape sequence.
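The difference between the attempt in the question and the capture-group version can be checked with the standard `re` module alone. This is a small sketch on one made-up value (`value` is hypothetical): the replacement argument is inserted as text, so regex syntax in it is not interpreted, whereas backreferences such as `\1` are.

```python
import re

# Two capture groups (hours, minutes); the seconds are matched but not captured.
pattern = r"(\d{2}):(\d{2}):\d{2}"

value = "13:45:59"  # hypothetical sample value

# The replacement is inserted literally, so a pattern-like string
# ends up in the output as-is.
literal = re.sub(pattern, r"[0-9]{2}[:][0-9]{2}", value)

# Backreferences \1 and \2 insert the text captured by each group.
fixed = re.sub(pattern, r"\1:\2", value)

print(literal)  # [0-9]{2}[:][0-9]{2}
print(fixed)    # 13:45
```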