I'm working with a dataframe that looks like this:
REGION YEAR WEEK ILITOTAL TOTAL_PATIENTS
0 Alabama 2010 40 249 11664
52 Alabama 2010 41 239 11602
104 Alabama 2010 42 232 11945
156 Alabama 2010 43 274 12036
208 Alabama 2010 44 342 12069
... ... ... ... ... ...
22688 Alabama 2018 48 1263 41155
22742 Alabama 2018 49 1152 38716
22796 Alabama 2018 50 1420 38703
22850 Alabama 2018 51 1585 38533
22904 Alabama 2018 52 1977 38097
The lack of dates in a YYYY/MM/DD format is giving me some trouble, mostly when plotting. For example, in this case, I want to end up with a plot more or less like this:
To do so, I kept only the YEAR, WEEK, and TOTAL_PATIENTS columns and tried to combine the WEEK and YEAR columns:
Sdf_copy["WEEK_YEAR"] = Sdf_copy.WEEK.astype(str).str.cat(Sdf_copy.YEAR.astype(str), sep="-")
Sdf_AL = Sdf_copy[['WEEK_YEAR', 'TOTAL_PATIENTS']].copy()
print(Sdf_AL)
ending up with this:
WEEK_YEAR TOTAL_PATIENTS
0 40-2010 11664
52 41-2010 11602
104 42-2010 11945
156 43-2010 12036
208 44-2010 12069
... ... ...
22688 48-2018 41155
22742 49-2018 38716
22796 50-2018 38703
22850 51-2018 38533
22904 52-2018 38097
I have tried plotting this in different ways, a simple plot
plt.plot(Sdf_AL)
plt.show()
gives this error: TypeError: unhashable type: 'numpy.ndarray'. With something like this
plt.plot(Sdf_AL.TOTAL_PATIENTS)
plt.show()
or
plt.plot(Sdf_AL.WEEK_YEAR, Sdf_AL.TOTAL_PATIENTS)
plt.show()
it always turns out similar to this:
Any help will be greatly appreciated, thanks!
CodePudding user response:
There are two issues here: (1) your dates have no specific day, and (2) the plot does not render correctly.
Addressing (1): The most convenient way is to hard-code the start of the week as the day, and convert the column to datetime objects for ease of plotting:
Sdf_AL['WEEK_YEAR'] = pd.to_datetime(Sdf_AL['WEEK_YEAR'] + '-1', format='%W-%Y-%w')
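The '-1' here appends a hard-coded weekday number to the WEEK_YEAR column, so that each value looks like e.g. 40-2010-1, in the format Week-Year-Weekday, before it is converted to datetime. With %w, 1 means Monday, and %W counts weeks that start on Monday, so every row is mapped to the Monday of its week.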
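To make the format string concrete, here is a minimal sketch using only the standard library, which applies the same %W-%Y-%w directives to single values. The helper name week_start is hypothetical and only for illustration. It assumes the WEEK column follows the %W convention (weeks start on Monday, and week 1 begins at the first Monday of the year); if the data uses a different week-numbering scheme, the resulting dates may be off by up to a week.

```python
from datetime import datetime


def week_start(week_year: str) -> datetime:
    """Parse a 'WEEK-YEAR' string such as '40-2010' as the Monday of that week.

    Hypothetical helper for illustration; assumes %W week numbering.
    """
    # Append '-1' (Monday) so strptime has a concrete weekday to resolve.
    return datetime.strptime(week_year + "-1", "%W-%Y-%w")


first = week_start("40-2010")
last = week_start("52-2018")
print(first.date())  # 2010-10-04
print(last.date())   # 2018-12-24
```

Both results fall on a Monday, which is what the hard-coded '-1' asks for.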
Addressing (2): In plt.plot(Sdf_AL.TOTAL_PATIENTS) you did not explicitly specify the x-axis of the plot, so it is rendered with the index as the x-axis. In your case, the index of Sdf_AL is something like [0, 52, 104, 156, 208, ...], hence the odd-looking plot.
Once WEEK_YEAR has been converted to datetime as in (1), specify both axes explicitly. Try
plt.plot(Sdf_AL.WEEK_YEAR, Sdf_AL.TOTAL_PATIENTS)
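Putting both parts together, a rough end-to-end sketch might look like the following. It rebuilds only the first five rows shown in the question (keeping their non-contiguous index), so it is not the full dataset. The Agg backend line and the axis labels are assumptions added so the sketch runs without a display; drop the backend line for interactive use and call plt.show() as usual.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First five rows from the question, with the original non-contiguous index.
Sdf_AL = pd.DataFrame(
    {
        "WEEK_YEAR": ["40-2010", "41-2010", "42-2010", "43-2010", "44-2010"],
        "TOTAL_PATIENTS": [11664, 11602, 11945, 12036, 12069],
    },
    index=[0, 52, 104, 156, 208],
)

# Append a weekday ('-1' = Monday) and parse as Week-Year-Weekday.
Sdf_AL["WEEK_YEAR"] = pd.to_datetime(Sdf_AL["WEEK_YEAR"] + "-1", format="%W-%Y-%w")

# Pass x and y explicitly so the dates, not the index, drive the x-axis.
fig, ax = plt.subplots()
ax.plot(Sdf_AL["WEEK_YEAR"], Sdf_AL["TOTAL_PATIENTS"])
ax.set_xlabel("Week starting")
ax.set_ylabel("TOTAL_PATIENTS")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Because the x values are real datetimes, matplotlib spaces the points by time and picks date ticks on its own, instead of treating each WEEK_YEAR string as a separate category.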