Plotting data using matplotlib from csv but the numbers on the y-axis are not in order

Time:11-27

I'm new to Python and have been trying to plot a graph with matplotlib in PyCharm from a CSV file. The x-axis is months and the y-axis is sales, but the numbers on the y-axis are not in the right order. I have read that I need to convert the values to float, but when I try I get "ValueError: could not convert string to float: 'sales'". I think this is because the header of the column with the sales data is 'sales', so the word 'sales' itself cannot be converted to float. How do I make it ignore the header and convert the rest of the values to float? Or, if that's not what's wrong, can someone please help me fix it? :)

This is the code I have (without my attempt to convert to float):

import csv
import matplotlib.pyplot as plt

x = []
y = []

with open('sales.csv','r') as sales_csv:
    plots = csv.reader(sales_csv, delimiter=',')
    for row in plots:
        x.append(row[1])
        y.append(row[2])

plt.plot(x, y, color='r', label='Monthly Sales 2018', marker='o')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales 2018')
plt.legend()

plt.show()

Please find attached a screenshot of how the graph looks.

Also, just for reference, this is the CSV file (I only need to plot month and sales):

year, month,sales,expenditure
2018,jan,6226,3808
2018,feb,1521,3373
2018,mar,1842,3965
2018,apr,2051,1098
2018,may,1728,3046
2018,jun,2138,2258
2018,jul,7479,2084
2018,aug,4434,2799
2018,sep,3615,1649
2018,oct,5472,1116
2018,nov,7224,1431
2018,dec,1812,3532

Any help would be appreciated!

CodePudding user response:

Since the CSV has a header, you can use csv.DictReader(sales_csv) to read the file. By default, it treats the first line as the column names instead of returning it as a regular row. Then, when you iterate over the rows, you can use row["month"] and row["sales"] to access the appropriate columns, and convert the sales value with float().

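with open('sales.csv', 'r') as sales_csv:
    # skipinitialspace=True drops the space after a comma, so a header
    # written as ' month' is still read as 'month'
    plots = csv.DictReader(sales_csv, delimiter=',', skipinitialspace=True)
    for row in plots:
        x.append(row["month"])
        y.append(float(row["sales"]))  # numeric values, not strings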
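
For comparison, the header can also be skipped explicitly while keeping plain csv.reader, which is closer to what the question literally asks. The y-axis looks unordered because csv returns every field as a string, and matplotlib treats strings as categories and places them in order of appearance instead of by value. The following is only a minimal sketch: read_month_sales is a hypothetical helper name, and io.StringIO with a few rows copied from the question stands in for open('sales.csv').

```python
import csv
import io

# A few rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = """year, month,sales,expenditure
2018,jan,6226,3808
2018,feb,1521,3373
2018,mar,1842,3965
"""


def read_month_sales(fileobj):
    """Return (months, sales) lists, skipping the header row (hypothetical helper)."""
    reader = csv.reader(fileobj, skipinitialspace=True)
    next(reader, None)  # discard the header so 'sales' is never passed to float()
    months, sales = [], []
    for row in reader:
        months.append(row[1])
        sales.append(float(row[2]))  # convert the string field to a number
    return months, sales


months, sales = read_month_sales(io.StringIO(SAMPLE))
print(months)
print(sales)
```

The two lists can then be passed to plt.plot(months, sales, ...) exactly as in the question; since the sales values are now floats, the y-axis is laid out numerically.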

CodePudding user response:

Paste your data into a file, save it as test.csv, and run the code below. Note that your second column name is ' month' and not 'month', because the data as pasted has a space after the comma that follows the first column. Either keep the space and run this code as is, or remove it and replace ' month' with 'month' in the code.

import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates

# Paste your data into a file and save it as test.csv.
# read_csv treats row 0 as the header by default,
# so no extra argument is needed here.
data = pd.read_csv('test.csv')

data[' month'] = data[' month'].str.title()
data['Date'] = data[' month']
# converting type from str to pandas datetime stamps
data['Date'] = pd.to_datetime(data['Date'], format='%b')
# changing the year from 1900 (default) to 2018(desired)
data['Date'] = data['Date'].mask(data['Date'].dt.year == 1900,
                                 data['Date'] + pd.offsets.DateOffset(year=2018))

plt.plot(data['Date'], data['sales'], color='r', label='Monthly Sales 2018', marker='o')

# x-axis date representation formatting
myFmt = mdates.DateFormatter('%b')
plt.gca().xaxis.set_major_formatter(myFmt)

plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales 2018')
plt.legend()
plt.show()

Reference: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
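
If you would rather not carry the leading space around in the column name, one option is to strip whitespace from the header names right after loading. This is a small sketch under assumptions, not part of the answer above: io.StringIO with two rows copied from the question stands in for test.csv, and SAMPLE is just an illustrative name.

```python
import io

import pandas as pd

# Two rows copied from the question; io.StringIO stands in for test.csv.
SAMPLE = """year, month,sales,expenditure
2018,jan,6226,3808
2018,feb,1521,3373
"""

data = pd.read_csv(io.StringIO(SAMPLE))
print(list(data.columns))  # the second name still carries its leading space

# Strip surrounding whitespace from every column name.
data.columns = data.columns.str.strip()
print(list(data.columns))  # now 'month' can be used directly

# read_csv has already parsed the sales column as numbers, so no float() is needed.
print(data["sales"].dtype)
```

After this step the rest of the code can refer to data['month'] instead of data[' month'].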
