I want to read an Excel file, sum the values for the years 2021, 2020 and 2019 for the locations from the same region (region B) and then create a graph with two lines (for region A and B) which will show how the values for both regions have changed during the years.
I tried with this code:
import numpy
import pandas as pd
import matplotlib.pyplot as plt
excel_file_path = "Testfile.xlsx"
df = pd.read_excel(excel_file_path)
df_region = df.groupby(['Region']).sum()
x = ["2021", "2020", "2019"]
y1 = df_region["Values2021"]
y2 = df_region["Values2020"]
y3 = df_region["Values2019"]
fig = plt.figure(figsize=(20,5))
plt.plot(x, y1, color = 'red', label = "A")
plt.plot(x, y2, color = 'blue', label = "B")
plt.legend(loc='best')
plt.show()
But it isn't working for me - I get the following error:
"Exception has occurred: ValueError x and y must have same first dimension, but have shapes (3,) and (2,)"
And this is my Excel file:
Location Region Values2021 Values2020 Values2019
Location1 A 720,00 680,00 554,00
Location2 B 340,00 360,00 389,00
Location3 B 320,00 230,00 287,00
What do I need to do to get the result that I want? Any help would be greatly appreciated.
CodePudding user response:
I think you meant to define your y1
and y2
a little differently.
x = ["2021", "2020", "2019"]
fig = plt.figure(figsize=(20,5))
colors = ['red', 'blue']
for i, region in enumerate(df_region.index):
y = df_region.loc[region, :]
plt.plot(x, y, color = colors[i], label = region)
plt.legend(loc='best')
plt.show()
which plots AND reads the region names from the DataFrame.