Attempting to make a simple 2D matplotlib plot, but it's re-ordering the y-axis labels to keep the graph linear. How can I avoid this?
Code:
data = np.array([
['Jun 1', 1.2, 0.2],
['Jun 2', 1.3, 1.2],
['Jun 3', 1.4, 0.9],
['Jun 4', 1.1, 0.5],
['Jun 5', 1.6, 1.2],
['Jun 6', 2.2, 0.2],
['Jun 7', 3.4, 1.6]
])
df = pd.DataFrame(data, columns=['date', 'income', 'spent'])
plt.plot(df['income'])
I'd just like a normal graph:
X axis: index
Y axis: df.income (where the visual plot range is auto-calculated)
The same issue occurs with a scatter plot.
CodePudding user response:
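It's because of your types. A NumPy array holds a single dtype, so when np.array sees a string mixed in with floats, it converts every element to a string. Your income and spent data is therefore text (it shows up as object if you dig into the dtypes), and matplotlib treats text as categories, placing them on the axis in order of appearance rather than by value. Convert them back to float before plotting: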
data = np.array([
['Jun 1', 1.2, 0.2],
['Jun 2', 1.3, 1.2],
['Jun 3', 1.4, 0.9],
['Jun 4', 1.1, 0.5],
['Jun 5', 1.6, 1.2],
['Jun 6', 2.2, 0.2],
['Jun 7', 3.4, 1.6]
])
df = pd.DataFrame(data, columns=['date', 'income', 'spent'])
for c in ['income', 'spent']:
df[c] = df[c].astype(float)
plt.plot(df['income'])
Or (better practice), build the DataFrame from a plain Python list so that pandas infers a dtype for each column:
data = [
['Jun 1', 1.2, 0.2],
['Jun 2', 1.3, 1.2],
['Jun 3', 1.4, 0.9],
['Jun 4', 1.1, 0.5],
['Jun 5', 1.6, 1.2],
['Jun 6', 2.2, 0.2],
['Jun 7', 3.4, 1.6]
]
df = pd.DataFrame(data, columns=['date', 'income', 'spent'])
plt.plot(df['income'])
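To see the difference between the two approaches above, here is a minimal sketch that inspects the dtypes directly. It uses a shortened copy of the question's data, and the names rows, arr, df_from_array and df_from_list are made up for illustration.

```python
import numpy as np
import pandas as pd

# Shortened copy of the data from the question.
rows = [
    ['Jun 1', 1.2, 0.2],
    ['Jun 2', 1.3, 1.2],
    ['Jun 3', 1.4, 0.9],
]
columns = ['date', 'income', 'spent']

# A NumPy array has a single dtype, so mixing str and float
# turns every element into a string.
arr = np.array(rows)
print(arr.dtype.kind)   # 'U' stands for fixed-width Unicode strings
print(arr[0, 1])        # the number 1.2 is now stored as text

# Built from the string array: the numeric columns are not float.
df_from_array = pd.DataFrame(arr, columns=columns)
print(df_from_array.dtypes)

# Built from the plain list: pandas infers a dtype per column.
df_from_list = pd.DataFrame(rows, columns=columns)
print(df_from_list.dtypes)
```

Only the list-based frame has float columns, which is why plotting it gives a numeric y axis without any conversion step.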
CodePudding user response:
This is happening because your columns have the object data type. You can see the data types of your dataframe by using dataframe.dtypes:
df.dtypes
> date object
> income object
> spent object
> dtype: object
What you need is for the 'income' column to be float64, so you can set its data type manually:
df['income'] = df['income'].astype('float64')
df.dtypes
> date object
> income float64
> spent object
> dtype: object
Now, if you plot the df, you get a numeric y axis with an automatically calculated range.
You can also put the date on the x axis by setting your date column as the index of the dataframe:
df = df.set_index('date')
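Putting the steps of this answer together, a minimal sketch might look like the following. It swaps in pd.to_numeric instead of astype (both convert the column; this is just one option), converts 'spent' as well, and uses a shortened copy of the data. The list name numeric_cols is made up for illustration.

```python
import numpy as np
import pandas as pd

# Shortened copy of the data from the question; np.array stores it all as strings.
data = np.array([
    ['Jun 1', 1.2, 0.2],
    ['Jun 2', 1.3, 1.2],
    ['Jun 7', 3.4, 1.6],
])
df = pd.DataFrame(data, columns=['date', 'income', 'spent'])

# Convert each numeric column from text back to float64.
numeric_cols = ['income', 'spent']
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col])

# Use the date labels as the index so they end up on the x axis when plotting.
df = df.set_index('date')
print(df.dtypes)
print(df)
```

After this, plt.plot(df['income']) or df['income'].plot() should draw the dates along x and a numerically scaled y axis.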
CodePudding user response:
I don't see an issue with the x-axis. For the y-axis, the issue is that when you created the np.array you didn't specify the dtype of the array, so NumPy converted everything to strings.
If you add dtype=object to the np.array call, as below, it should work just fine.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = np.array([
['Jun 1', 1.2, 0.2],
['Jun 2', 1.3, 1.2],
['Jun 3', 1.4, 0.9],
['Jun 4', 1.1, 0.5],
['Jun 5', 1.6, 1.2],
['Jun 6', 2.2, 0.2],
['Jun 7', 3.4, 1.6]
], dtype=object)
df = pd.DataFrame(data, columns=['date', 'income', 'spent'])
df.plot(x='date', y='income', kind='line')
plt.show()
Sometimes pandas has trouble interpreting the data; the problem here was that it was treating the income column as strings.
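One caveat, sketched below on a shortened copy of the data: with dtype=object the values stay Python floats inside the array, but the DataFrame columns are typically still reported as object. Calling infer_objects() makes the conversion to float64 explicit instead of leaving it to the plotting code. The name converted is made up for illustration.

```python
import numpy as np
import pandas as pd

# dtype=object keeps each element as the Python object it was written as.
data = np.array([
    ['Jun 1', 1.2, 0.2],
    ['Jun 2', 1.3, 1.2],
    ['Jun 3', 1.4, 0.9],
], dtype=object)
print(type(data[0, 1]))   # still a float, not a string

df = pd.DataFrame(data, columns=['date', 'income', 'spent'])
print(df.dtypes)          # inspect what pandas reports for each column

# Ask pandas to pick a better dtype for each object column.
converted = df.infer_objects()
print(converted.dtypes)   # income and spent become float64
```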