How to stop matplotlib from skipping gaps in data?

Time:10-03

I have this simple csv:

date,count
2020-07-09,144.0
2020-07-10,143.5
2020-07-12,145.5
2020-07-13,144.5
2020-07-14,146.0
2020-07-20,145.5
2020-07-21,146.0
2020-07-24,145.5
2020-07-28,143.0
2020-08-05,146.0
2020-08-10,147.0
2020-08-11,147.5
2020-08-14,146.5
2020-09-01,143.5
2020-09-02,143.0
2020-09-09,144.5
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
2021-09-25,131.0
2021-09-26,130.8
2021-09-27,130.6
2021-09-28,128.4
2021-09-30,126.8
2021-10-02,126.2

If I copy it into Excel and make a scatter plot, it looks like this:

[image: Excel scatter plot]

This is correct; there should be a big gap in the middle (look carefully at the data, it jumps from 2020 to 2021)

However, if I do this in Python:

import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv')
data.plot.scatter('date', 'count')
plt.show()

It looks like this:

[image: matplotlib scatter plot]

It spaces the points evenly and the gap is gone. How do I stop that behavior? I tried to do

 plt.xticks = data.date

But that didn't do anything different.

CodePudding user response:

I don't know the exact types of the columns in data, but this is probably because the 'date' column is of string type, so Python cannot treat the values as comparable dates. Before plotting, try converting its type:

data['date'] = pd.to_datetime(data['date'])
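
A minimal sketch of that conversion in context, assuming a shortened copy of the question's data held in a string. The names csv_text and scatter_dates.png are made up for illustration, and the Agg backend is only selected so the sketch runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copy of the question's data, including the jump from 2020 to 2021.
csv_text = """date,count
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
"""

data = pd.read_csv(io.StringIO(csv_text))
# Without parse_dates, the 'date' column holds plain strings, not datetimes.
was_datetime_before = pd.api.types.is_datetime64_any_dtype(data["date"])

# Convert the strings to real datetimes so the x-axis becomes a time axis.
data["date"] = pd.to_datetime(data["date"])
is_datetime_after = pd.api.types.is_datetime64_any_dtype(data["date"])

# The gap is now measurable: 361 days between 2020-09-25 and 2021-09-21.
gap_days = (data["date"].iloc[2] - data["date"].iloc[1]).days

fig, ax = plt.subplots()
ax.scatter(data["date"], data["count"])  # points are placed by actual date
fig.autofmt_xdate()                      # rotate the date labels for readability
fig.savefig("scatter_dates.png")         # hypothetical output file name
```

With real datetimes on the x-axis, the points should be positioned by date rather than one slot per row, so the year-long gap shows up in the plot.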

CodePudding user response:

I've tested:

import io
import pandas as pd

txt = """
date,count
2020-07-09,144.0
2020-07-10,143.5
2020-07-12,145.5
2020-07-13,144.5
2020-07-14,146.0
2020-07-20,145.5
2020-07-21,146.0
2020-07-24,145.5
2020-07-28,143.0
2020-08-05,146.0
2020-08-10,147.0
2020-08-11,147.5
2020-08-14,146.5
2020-09-01,143.5
2020-09-02,143.0
2020-09-09,144.5
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
2021-09-25,131.0
2021-09-26,130.8
2021-09-27,130.6
2021-09-28,128.4
2021-09-30,126.8
2021-10-02,126.2"""

data = pd.read_csv(io.StringIO(txt), sep=r",", parse_dates=["date"])

data.plot.scatter('date', 'count')

and the result is:

[image: resulting scatter plot]

Two observations:

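  • date must be of a datetime type, which the parse_dates=["date"] option ensures
  • importing matplotlib.pyplot is not necessary to create the plot, because you used the pandas.DataFrame.plot.scatter method (in a non-interactive script you may still need plt.show() or savefig to see the result)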
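
As a rough check of both observations, here is a sketch that parses the dates at read time and plots through pandas alone, without importing matplotlib.pyplot. It assumes a recent pandas version in which plot.scatter accepts a datetime x column. The names csv_text and scatter_parse_dates.png are invented for the example, and the matplotlib.use("Agg") line is only there so it runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this in a notebook or GUI session
import pandas as pd

# Shortened copy of the question's data.
csv_text = """date,count
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
"""

# parse_dates converts the column while reading, so no separate to_datetime call is needed.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(parsed.dtypes)  # inspect the column types before plotting

# pandas returns the matplotlib Axes, which gives access to the figure for saving.
ax = parsed.plot.scatter("date", "count")
ax.get_figure().savefig("scatter_parse_dates.png")  # hypothetical output file name
```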