I'm trying to make pandas recognise the first column as a date.
import csv
import pandas as pd
import plotly.express as px
cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'])
cl.info()
Then to visualise the price:
fig = px.line(cl, y="Adj Close", title='Crude Oil Price', labels = {'Adj Close':'Crude Oil Price(in USD)'})
But it gives back a ruined chart:
If I remove parse_dates=['Date'] and index_col=['Date'] and leave just cl = pd.read_csv('CL.csv'), the chart looks fine.
What am I doing wrong here?
CodePudding user response:
I think this is caused by the date format in the 'Date' column. Quoting the pandas documentation: "For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more."
So you could replace
cl = pd.read_csv('CL.csv', parse_dates=['Date'], index_col=['Date'])
with
cl = pd.read_csv('CL.csv', parse_dates=['Date'], date_parser=lambda col: pd.to_datetime(col, utc=True))
Note that date_parser is deprecated since pandas 2.0, so on recent versions it is safer to call pd.to_datetime after pd.read_csv, as the first sentence of the quote suggests.
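The "use pd.to_datetime after pd.read_csv" route from the quote could look roughly like the sketch below. The sample data is a hypothetical stand-in for CL.csv (the real file's date format is not shown in the question), so treat the offsets and prices as assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for CL.csv; the real file's date format is unknown.
sample_csv = io.StringIO(
    "Date,Adj Close\n"
    "2021-01-05 00:00:00-05:00,49.93\n"
    "2021-01-04 00:00:00-05:00,47.62\n"
)

# Read the column as plain strings first, without parse_dates.
cl = pd.read_csv(sample_csv)

# Parse afterwards; utc=True normalises any UTC offsets to a single timezone.
cl["Date"] = pd.to_datetime(cl["Date"], utc=True)

# Use the parsed dates as the index, as in the question.
cl = cl.set_index("Date")
cl.info()
```

If cl.info() then reports a DatetimeIndex, the parsing itself worked and the chart problem probably lies elsewhere.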
CodePudding user response:
If you print cl and the dates look fine, the likely reason for the broken chart is that cl isn't sorted by Date. Sort it before visualizing:
cl = cl.sort_values('Date')
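One way to check whether ordering is the culprit, sketched here on made-up rows rather than the real CL.csv: once the dates are parsed into the index, is_monotonic_increasing tells you whether they are in ascending order, and sort_index() fixes it. This assumes Date is the index, as in the question's read_csv call.

```python
import io

import pandas as pd

# Hypothetical, deliberately unsorted stand-in for CL.csv.
sample_csv = io.StringIO(
    "Date,Adj Close\n"
    "2021-01-06,50.63\n"
    "2021-01-04,47.62\n"
    "2021-01-05,49.93\n"
)

cl = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")

# Report whether the dates are already in ascending order.
print(cl.index.is_monotonic_increasing)

# Sort by the Date index so a line chart connects points left to right.
cl = cl.sort_index()
print(cl.index.is_monotonic_increasing)
```

A line chart joins points in row order, so unsorted dates make the line jump back and forth. That would also explain why the chart looks fine without parse_dates: the dates are then plain strings plotted as categories in file order.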