I have this simple csv:
date,count
2020-07-09,144.0
2020-07-10,143.5
2020-07-12,145.5
2020-07-13,144.5
2020-07-14,146.0
2020-07-20,145.5
2020-07-21,146.0
2020-07-24,145.5
2020-07-28,143.0
2020-08-05,146.0
2020-08-10,147.0
2020-08-11,147.5
2020-08-14,146.5
2020-09-01,143.5
2020-09-02,143.0
2020-09-09,144.5
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
2021-09-25,131.0
2021-09-26,130.8
2021-09-27,130.6
2021-09-28,128.4
2021-09-30,126.8
2021-10-02,126.2
If I copy it into Excel and make a scatter plot of it, it looks like this:
This is correct; there should be a big gap in the middle (look carefully at the data, it jumps from 2020 to 2021)
However if I do this in python:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv')
data.plot.scatter('date', 'count')
plt.show()
It looks like this:
It spaces the points evenly and the gap is gone. How do I stop that behavior? I tried to do
plt.xticks = data.date
But that didn't do anything different.
CodePudding user response:
I don't know the exact column types in your data, but this is probably because the 'date' column is being read as strings. Strings can't be compared as dates, so matplotlib treats each one as a separate category and spaces them evenly. Convert the column to a datetime type before plotting:
data['date'] = pd.to_datetime(data['date'])
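For what it's worth, here is a minimal sketch of that conversion end to end. It uses a four-row subset of the CSV from the question, inlined through io.StringIO so it runs on its own. The variable names (csv_text, dtype_before, gaps) and the output file name are just illustrative, and it plots with matplotlib's ax.scatter rather than data.plot.scatter, which is my own substitution:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A small subset of the question's data, inlined for the example.
csv_text = """date,count
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
"""

data = pd.read_csv(io.StringIO(csv_text))
dtype_before = data["date"].dtype  # a string/object dtype, not datetime

# Convert the strings to real timestamps.
data["date"] = pd.to_datetime(data["date"])
dtype_after = data["date"].dtype  # now a datetime64 dtype

# Days between consecutive rows; the 2020 -> 2021 jump shows up as a large value.
gaps = data["date"].diff().dt.days
print(dtype_before, "->", dtype_after)
print(gaps.tolist())

# With datetime values on x, the axis is a real time scale and the gap is visible.
fig, ax = plt.subplots()
ax.scatter(data["date"], data["count"])
ax.set_xlabel("date")
ax.set_ylabel("count")
fig.autofmt_xdate()  # rotate the date labels so they don't overlap
fig.savefig("gap_scatter.png")  # hypothetical output name
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() instead of saving to a file.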
CodePudding user response:
I've tested:
import io
import pandas as pd
txt = """
date,count
2020-07-09,144.0
2020-07-10,143.5
2020-07-12,145.5
2020-07-13,144.5
2020-07-14,146.0
2020-07-20,145.5
2020-07-21,146.0
2020-07-24,145.5
2020-07-28,143.0
2020-08-05,146.0
2020-08-10,147.0
2020-08-11,147.5
2020-08-14,146.5
2020-09-01,143.5
2020-09-02,143.0
2020-09-09,144.5
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
2021-09-25,131.0
2021-09-26,130.8
2021-09-27,130.6
2021-09-28,128.4
2021-09-30,126.8
2021-10-02,126.2"""
data = pd.read_csv(io.StringIO(txt), sep=r",", parse_dates=["date"])
data.plot.scatter('date', 'count')
and the result is:
Two observations:
- date must be of a datetime type, which is ensured by the parse_dates=["date"] option
- importing matplotlib.pyplot is not necessary to create the plot, because you used the pandas.DataFrame.plot.scatter method (in a plain script you would still need it for plt.show()).
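A side note on the plt.xticks = data.date attempt from the question: plt.xticks is a function, so assigning to it only rebinds the name on the pyplot module and never touches the axes. To set tick positions it has to be called. The sketch below uses made-up numeric points purely to show the call form; it is not a fix for the spacing problem, which comes from the string dates:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([0, 1, 10], [5, 6, 7])  # made-up points, illustration only

# Calling plt.xticks(...) sets the tick locations on the current axes.
# Writing plt.xticks = [...] would just overwrite the function instead.
plt.xticks([0, 5, 10])

tick_positions = [float(t) for t in ax.get_xticks()]
print(tick_positions)
```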