I have a script that reads a CSV file, and it seems to have slowed down recently (I am sure this very code used to run faster). I have narrowed the issue down to this line:
data['datetime'] = pd.to_datetime(data['datetime'])
The CSV is quite basic:
2021-11-03 09:30:00-04:00,150.39,150.8,150.3,150.47,9583
Yet parsing just 2000 rows takes ~0.2 seconds, which seems much slower than I would have expected.
I have tried updating Python and pandas in case that was the cause, but the issue is still there.
Is this amount of time normal, and is there anything else I can check or do to improve the speed?
EDIT2 - I recreated the CSV, and I thought that had cured it. Unfortunately it has not, and this line still takes ~0.2s to run.
CodePudding user response:
Try letting read_csv parse the dates for you:
df = pd.read_csv(file, parse_dates=['datetime'])
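The slow part is most likely format inference: when pd.to_datetime has to guess the format, timestamps that carry a UTC offset like -04:00 can fall back to slow element-wise parsing. A minimal sketch of the direct fix, assuming the column really is named 'datetime' and every row carries an offset in the same style as your sample:

import pandas as pd

# An explicit format skips per-row inference; %z matches the -04:00 offset
data['datetime'] = pd.to_datetime(data['datetime'],
                                  format='%Y-%m-%d %H:%M:%S%z')

If the file mixes different offsets (e.g. around a DST change), adding utc=True keeps the result a single tz-aware datetime64 column instead of a column of objects.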
EDIT
If that doesn't work because of the date format, try this:
from datetime import datetime

dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z')  # %z matches the -04:00 offset
df = pd.read_csv(file, parse_dates=['datetime'], date_parser=dateparse)
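Note that a per-row strptime callable is itself slow, and date_parser is deprecated as of pandas 2.0. On pandas 2.0 or newer, a sketch of the equivalent using the date_format argument, which stays on the fast vectorized path:

import pandas as pd

# pandas >= 2.0: pass the format string directly instead of a parser callable
df = pd.read_csv(file, parse_dates=['datetime'],
                 date_format='%Y-%m-%d %H:%M:%S%z')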