Bug when indexing date column in Pandas

Time:02-28

I'm trying to make pandas recognise the first column as a date.

import pandas as pd
import plotly.express as px

# Read the CSV, parse the 'Date' column as datetimes and use it as the index
cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'])
cl.info()

Then to visualise the price:

fig = px.line(cl, y="Adj Close", title='Crude Oil Price', labels={'Adj Close': 'Crude Oil Price (in USD)'})
fig.show()

But it gives back a ruined chart:

[Image: date-indexed chart]

If I remove the parse_dates=['Date'], index_col=['Date'] arguments and leave just cl = pd.read_csv('CL.csv'), the chart looks fine.

[Image: chart without date index]

What am I doing wrong here?

CodePudding user response:

I think this problem may be caused by the date format used in the 'Date' column. The pandas documentation for read_csv says:

"For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more."

So you could replace

cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'])

with

cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'], date_parser=lambda col: pd.to_datetime(col, utc=True))

Note that date_parser is deprecated as of pandas 2.0. On newer versions, read the column without parsing and apply pd.to_datetime afterwards, as the quoted documentation suggests.
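As a minimal sketch of the "use pd.to_datetime after pd.read_csv" route: the sample rows below are made up to stand in for CL.csv (the real file's contents are not shown in the question), and bad_rows is just an illustrative name. Passing errors="coerce" turns any value that cannot be parsed into NaT, which makes it easy to check whether the date format really is the problem.

```python
import io
import pandas as pd

# Hypothetical sample standing in for CL.csv; the real data is not shown.
csv_text = """Date,Adj Close
2021-01-05,49.93
2021-01-04,47.62
2021-01-06,50.63
"""

# Read without date parsing, then convert the column explicitly.
cl = pd.read_csv(io.StringIO(csv_text))
cl["Date"] = pd.to_datetime(cl["Date"], errors="coerce")

# Rows whose date could not be parsed end up as NaT; inspect them if any exist.
bad_rows = cl[cl["Date"].isna()]

# Use the parsed dates as the index, as in the question.
cl = cl.set_index("Date")
```

If bad_rows is empty and cl.info() reports a DatetimeIndex, parsing worked and the cause of the broken chart lies elsewhere.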

CodePudding user response:

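If you print cl and the dates look fine, the chart is probably garbled because cl isn't sorted by Date. A line chart connects the points in row order, so unsorted dates make the line jump back and forth. Sort the data before plotting:

cl = cl.sort_values('Date')

Because Date is the index here, cl.sort_index() does the same thing.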
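To check whether ordering is the issue before re-plotting, something like the following could be used. It is only a sketch: the sample rows are invented and deliberately out of order, and was_sorted is an illustrative name. It applies the same read_csv call as the question to the cl DataFrame and tests the index with is_monotonic_increasing.

```python
import io
import pandas as pd

# Hypothetical, deliberately unsorted sample standing in for CL.csv.
csv_text = """Date,Adj Close
2021-01-05,49.93
2021-01-04,47.62
2021-01-06,50.63
"""

cl = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# False here means the rows are not in chronological order.
was_sorted = cl.index.is_monotonic_increasing

# Sort by the Date index so a line chart is drawn left to right.
cl = cl.sort_index()
```

After sorting, the same px.line call from the question can be run on cl unchanged.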