Home > Back-end >  Issues with converting date time to proper format- Columns must be same length as key
Issues with converting date time to proper format- Columns must be same length as key

Time:12-04

I'm doing some data analysis on a dataset (https://www.kaggle.com/sudalairajkumar/covid19-in-usa) and Im trying to convert the date and time column (lastModified) to the proper datetime format. When I tried it first it returned an error

ValueError: hour must be in 0..23

so I tried doing this -

data_df[['date','time']] = 
data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), 
format='%d/%m/%Y')   
              pd.to_timedelta(data_df.pop('time')   ':00'))

This gives an error - Columns must be same length as key

I understand this means that both columns I'm splitting arent the same size. How do I resolve this issue? I'm relatively new to python. Please explain in a easy to understand manner. thanks very much

This is my whole code-

import pandas as pd
dataset_url = 'https://www.kaggle.com/sudalairajkumar/covid19-in-                
usa'
import opendatasets as od
od.download(dataset_url)
data_dir = './covid19-in-usa'
import os
os.listdir(data_dir)


data_df = pd.read_csv('./covid19-in-usa/us_covid19_daily.csv')
data_df
data_df[['date','time']] = 
data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), 
format='%d/%m/%Y')   
          pd.to_timedelta(data_df.pop('time')   ':00'))

CodePudding user response:

Looks like lastModified is in ISO format. I have used something like below to convert iso date string:

from dateutil import parser
from datetime import datetime
...
timestamp = parser.isoparse(lastModified).timestamp()
dt = datetime.fromtimestamp(timestamp)
...

CodePudding user response:

On this line:

data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)

In order to do this assignment, the number of columns on both sides of the = must be the same. split can output multiple columns, but it will only do this if it finds the character it's looking for to split on. By default, it splits by whitespace. There is no whitespace in the date column, and therefore it will not split. You can read the documentation for this here.

For that reason, this line should be like this, so it splits on the T:

data_df[['date','time']] = data_df['lastModified'].str.split('T', expand=True)

But the solution posted by @southiejoe is likely to be more reliable. These timestamps are in a standard format; parsing them is a previously-solved problem.

CodePudding user response:

You need these libraries

#import 
from dateutil import parser
from datetime import datetime

Then try writing something similar for convert the date and time column. This way the columns should be the same length as the key

#convert the time column to the correct datetime format 
clock = parser.isoparse(lastModified).timestamp()

#convert the date column to the correct datetime format 
data = datetime.fromtimestamp(timestamp)
  • Related