I am using pandas, and I'm trying to convert the following string column "df['Date/Time']" to the datetime format %H:%M.
0 0630 --> should be 06:30 etc.
1 1300
2 2400
3 0800
4 1030
5 1300
6 0001
7 0900
8 0900
9 0800
Name: Date/Time, dtype: object
I also removed any whitespace using:
df['Date/Time'] = df['Date/Time'].apply(lambda x: x.strip())
However, when I try to convert the string cells to the desired format, I get the error that not all data could be converted.
df['Time_reformatted'] = df['Date/Time'].apply(lambda x: datetime.strptime(x,'%H%M').strftime('%H:%M'))
--> ValueError: unconverted data remains: 0
I don't really understand where the 0 could be that causes the trouble. It is the strptime argument that raises the error...any ideas?
Also, is there a more elegant way for using that many lambdas? ;)
CodePudding user response:
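The error is caused by the invalid value 2400: the hour cannot be 24, the maximum is 23. When strptime sees "2400", %H matches only the "2" (because "24" is not a valid hour), %M then matches "40", and the final "0" is left over, which is the 0 in the error message. Look at the Python datetime format docs for more info. Since the datetime format specifier doesn't understand 24 as an hour, you'll have to implement your own logic for the conversion: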
import datetime

def parse_times(val):
    val = val.strip()
    h, m = int(val[:2]), int(val[2:])
    # carry any minute overflow (60 or more) into the hours
    if (hrs := m // 60) > 0:
        h += hrs
        m -= hrs * 60
    # wrap hour 24 around to 0
    h = h % 24
    return datetime.time(hour=h, minute=m).strftime('%H:%M')

df['Date/Time'].apply(parse_times)
#output
0 06:30
1 13:00
2 00:00
3 08:00
4 10:30
5 13:00
6 00:01
7 09:00
8 09:00
9 08:00
Name: Date/Time, dtype: object
Or, with slightly different logic: compute the total number of minutes first, then convert back to proper hours and minutes:
import datetime

def parse_times_ii(val):
    val = val.strip()
    h, m = int(val[:2]), int(val[2:])
    # total minutes since midnight
    minutes = h * 60 + m
    hr = minutes // 60
    minutes = minutes - hr * 60
    # wrap hour 24 around to 0
    hr = hr % 24
    return datetime.time(hour=hr, minute=minutes).strftime('%H:%M')

df['Date/Time'].apply(parse_times_ii)
#output
0 06:30
1 13:00
2 00:00
3 08:00
4 10:30
5 13:00
6 00:01
7 09:00
8 09:00
9 08:00
Name: Date/Time, dtype: object
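To see where the stray 0 in the error message comes from, here is a minimal sketch that reproduces it outside pandas. The helper name try_parse is made up for this illustration; it only wraps the same strptime call used in the question.

```python
from datetime import datetime

def try_parse(value):
    """Return the 'HH:MM' string, or the ValueError message if parsing fails."""
    try:
        return datetime.strptime(value, "%H%M").strftime("%H:%M")
    except ValueError as exc:
        return str(exc)

# a valid value parses as expected
print(try_parse("0630"))
# "2400" fails: %H takes "2", %M takes "40", and "0" is left unconverted
print(try_parse("2400"))
```

So the data itself is clean; it is only the hour 24 that strptime cannot represent.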
CodePudding user response:
A simple string operation does the trick if you just need to clean up the "2400" values. Example:
import pandas as pd
# clean 2400, make sure no single "0" values remain
df["Date/Time"] = df["Date/Time"].str.replace("2400", "0000", regex=False).str.zfill(4)
# insert colon
df["Date/Time"] = df["Date/Time"].str[:2] + ":" + df["Date/Time"].str[2:]
df["Date/Time"]
0 06:30
1 13:00
2 00:00
3 08:00
4 10:30
5 13:00
6 00:01
7 09:00
8 09:00
9 08:00
Name: Date/Time, dtype: object
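On the question about avoiding the lambdas: one possible lambda-free variant is to do the arithmetic on the whole column at once. This is only a sketch, assuming every value is a string of digits in HHMM form; the sample Series and the variable names are made up for the illustration, and minutes above 59 are not checked.

```python
import pandas as pd

# sample data mirroring the question's column (assumed: digit strings in HHMM form)
s = pd.Series(["0630", "1300", "2400", "0001"], name="Date/Time")

# split into hours and minutes arithmetically; hour 24 wraps around to 0
n = s.str.strip().astype(int)
hours = (n // 100) % 24
minutes = n % 100

# build zero-padded "HH:MM" strings without apply or lambda
formatted = hours.astype(str).str.zfill(2) + ":" + minutes.astype(str).str.zfill(2)
formatted = formatted.rename("Time_reformatted")
print(formatted.tolist())
```

The result is still a column of strings, same as in the answers above. If actual time objects are needed, the cleaned strings could be passed on to pd.to_datetime with format="%H:%M".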