Problems working with the dates of this dataframe

Time:04-25

I'm working with a dataframe that looks like this:

        REGION  YEAR  WEEK ILITOTAL TOTAL_PATIENTS
0      Alabama  2010    40      249          11664
52     Alabama  2010    41      239          11602
104    Alabama  2010    42      232          11945
156    Alabama  2010    43      274          12036
208    Alabama  2010    44      342          12069
...        ...   ...   ...      ...            ...
22688  Alabama  2018    48     1263          41155
22742  Alabama  2018    49     1152          38716
22796  Alabama  2018    50     1420          38703
22850  Alabama  2018    51     1585          38533
22904  Alabama  2018    52     1977          38097

The fact that it does not have dates in a YYYY/MM/DD format is giving me some trouble, mostly when plotting. For example, in this case, I want to end up with a plot more or less like this:

plot

To do so, I kept only the YEAR, WEEK, and TOTAL_PATIENTS columns and tried to combine the WEEK and YEAR columns,

Sdf_copy["WEEK_YEAR"] = Sdf_copy.WEEK.astype(str).str.cat(Sdf_copy.YEAR.astype(str), sep="-")

Sdf_AL = Sdf_copy[['WEEK_YEAR', 'TOTAL_PATIENTS']].copy()
print(Sdf_AL)

ending up with this:

      WEEK_YEAR TOTAL_PATIENTS
0       40-2010          11664
52      41-2010          11602
104     42-2010          11945
156     43-2010          12036
208     44-2010          12069
...         ...            ...
22688   48-2018          41155
22742   49-2018          38716
22796   50-2018          38703
22850   51-2018          38533
22904   52-2018          38097

I have tried plotting this in different ways, a simple plot

plt.plot(Sdf_AL)
plt.show()

gives this error: TypeError: unhashable type: 'numpy.ndarray', and with something like this

plt.plot(Sdf_AL.TOTAL_PATIENTS)
plt.show()

or

plt.plot(Sdf_AL.WEEK_YEAR, Sdf_AL.TOTAL_PATIENTS)
plt.show()

it always turns out similar to this:

plot

Any help will be greatly appreciated, thanks!

CodePudding user response:

There are two issues here: (1) your dates have no specific day, and (2) the plot does not render correctly.

Addressing (1): The most convenient way is to hard-code the start of the week as the day, and convert the result into a datetime object for ease of plotting:

Sdf_AL['WEEK_YEAR'] = pd.to_datetime(Sdf_AL['WEEK_YEAR'] + '-1', format='%W-%Y-%w')

The '-1' appends a hard-coded weekday number (1 = Monday) to each value in the WEEK_YEAR column, so that before conversion a value looks like 40-2010-1, i.e. Week-Year-Weekday.

Addressing (2): In plt.plot(Sdf_AL.TOTAL_PATIENTS) you did not explicitly specify the x-axis, so the plot is rendered with the index as the x-axis. In your case, the index of Sdf_AL is something like [0, 52, 104, 156, 208, ...], hence the odd-looking plot.
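As a minimal sketch of the conversion described above, here is the same idea on a three-row frame built from the first rows of the question's data. The frame name df and the new column name WEEK_START are made up for illustration. Note that %W counts weeks starting on Monday, with the days before the first Monday of the year falling in week 0, so the resulting dates may not line up exactly with however the dataset itself defines its weeks.

```python
import pandas as pd

# Hypothetical small frame mirroring the first rows of the question's data.
df = pd.DataFrame(
    {
        "YEAR": [2010, 2010, 2010],
        "WEEK": [40, 41, 42],
        "TOTAL_PATIENTS": [11664, 11602, 11945],
    },
    index=[0, 52, 104],
)

# Build "WEEK-YEAR-1" strings; the trailing 1 means Monday for the %w directive.
week_year_day = df["WEEK"].astype(str) + "-" + df["YEAR"].astype(str) + "-1"

# %W = week of the year (Monday first), %Y = four-digit year, %w = weekday number.
df["WEEK_START"] = pd.to_datetime(week_year_day, format="%W-%Y-%w")

print(df[["WEEK_START", "TOTAL_PATIENTS"]])
```

With this format, week 40 of 2010 maps to Monday 2010-10-04, and each later week is seven days further on.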

You should explicitly specify the axes. Try

plt.plot(Sdf_AL.WEEK_YEAR, Sdf_AL.TOTAL_PATIENTS)
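
Putting both parts together, a self-contained sketch might look like the following. It assumes a small stand-in frame (the name df and the axis labels are illustrative, not from the question) and uses the non-interactive Agg backend only so that it runs without a display; in an interactive session you would drop that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for Sdf_AL, keeping the gappy index from the question.
df = pd.DataFrame(
    {
        "WEEK_YEAR": ["40-2010", "41-2010", "42-2010"],
        "TOTAL_PATIENTS": [11664, 11602, 11945],
    },
    index=[0, 52, 104],
)

# Convert "WEEK-YEAR" strings to the Monday of that week.
df["WEEK_YEAR"] = pd.to_datetime(df["WEEK_YEAR"] + "-1", format="%W-%Y-%w")

# Pass x and y explicitly so the dates, not the index, go on the x-axis.
fig, ax = plt.subplots()
(line,) = ax.plot(df["WEEK_YEAR"], df["TOTAL_PATIENTS"])
ax.set_xlabel("Week starting")
ax.set_ylabel("Total patients")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```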