I have the following question. I have this dataframe:

import pandas as pd
import matplotlib.pyplot as plt

data = {'date': ["2021-05-03", "2021-05-04", "2021-05-11", "2021-05-19", "2021-05-19"], 'id_customer': ["5", "7", "8", "5", "3"], 'value_dummy': [1, 0, 1, 1,0]}
df = pd.DataFrame.from_dict(data)

I know how to plot a histogram of all values in the value_dummy column:

df['value_dummy'].value_counts().plot(kind='bar')
plt.show()

But I would like one graph with three histograms, one per week. In this case the horizontal axis would show "week 1" (observations from "2021-05-03" and "2021-05-04"), "week 2" (the observation from "2021-05-11") and "week 3" (all observations from "2021-05-19"), with the corresponding histogram for each week. How can I do that, please?

Desired output is this:

CodePudding user response：

To group by week, allowing for non-contiguous dates, and plot one histogram per week as subplots:

import pandas as pd
import matplotlib.pyplot as plt

week1 = pd.date_range(start="2021-05-03", end="2021-05-10")
week2 = pd.date_range(start="2021-05-11", end="2021-05-18")
week3 = pd.date_range(start="2021-05-19", end="2021-05-26")

weeks = [week1, week2, week3]

data = {
    'date':
    ["2021-05-03", "2021-05-04", "2021-05-11", "2021-05-19", "2021-05-19"],
    'id_customer': ["5", "7", "8", "5", "3"],
    'value_dummy': [1, 0, 1, 1, 0]
}
df = pd.DataFrame.from_dict(data)
df['date'] = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
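fig, axes = plt.subplots(ncols=len(weeks), figsize=(15, 3))

# Pair each week with its axis left to right, so "week 1" is the leftmost plot
for i, (w, ax) in enumerate(zip(weeks, axes), start=1):
    # Keep only the rows whose date falls inside this week
    df.loc[df.index.isin(w), 'value_dummy'].plot(kind="hist", ax=ax,
                                                 title=f"week {i}")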

plt.tight_layout()
plt.show()

Of course, histograms are not very informative with so little data. I assume you have a larger dataset!
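As an alternative to listing every date of each week, the week label can be attached to each row with pd.cut. This is only a minimal sketch: the bin edges and the "week 1" to "week 3" labels are assumptions chosen to match the date ranges used above, and the column name week is made up for illustration.

```python
import pandas as pd

data = {
    'date': ["2021-05-03", "2021-05-04", "2021-05-11", "2021-05-19", "2021-05-19"],
    'value_dummy': [1, 0, 1, 1, 0],
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])

# Assumed week boundaries, closed on the left:
# [05-03, 05-11), [05-11, 05-19), [05-19, 05-27)
edges = pd.to_datetime(["2021-05-03", "2021-05-11", "2021-05-19", "2021-05-27"])

# right=False makes each bin include its start date and exclude its end date
df['week'] = pd.cut(df['date'], bins=edges, right=False,
                    labels=["week 1", "week 2", "week 3"])

week_labels = df['week'].astype(str).tolist()
print(week_labels)
```

The resulting week column can then be passed to groupby or used to filter rows for each subplot.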

To group by custom weeks and plot a single bar graph:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

week1 = pd.date_range(start="2021-05-03", end="2021-05-10")
week2 = pd.date_range(start="2021-05-11", end="2021-05-18")
week3 = pd.date_range(start="2021-05-19", end="2021-05-26")

weeks = week1.union(week2).union(week3)

df = pd.DataFrame(
    np.random.randint(100, size=72),
    columns=["value"],
    index=pd.MultiIndex.from_product([weeks, ["week1", "week2", "week3"]]),
)

# One group of bars per week label; the values are random placeholder data
df.unstack().droplevel(0, axis=1).T.plot(
    kind="bar", legend=False, rot=5, figsize=(10, 8), color='b'
)
plt.show()

CodePudding user response：

Here is how you can do it:

import pandas as pd
import matplotlib.pyplot as plt

data = {'date': ["2021-05-03", "2021-05-04", "2021-05-11", "2021-05-19", "2021-05-19"], 'id_customer': ["5", "7", "8", "5", "3"], 'value_dummy': [1, 0, 1, 1,0]}
df = pd.DataFrame.from_dict(data)

# Compute the week number once, not once per column
df['formatted_date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['formatted_date'].dt.isocalendar().week

# One bar per week, showing how many observations fall in that week
fig, ax = plt.subplots(figsize=(10, 5))
df['week_of_year'].value_counts().sort_index().plot.bar(ax=ax, title='observations per week')

plt.tight_layout()
plt.show()
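If the goal is to see how many 0s and 1s of value_dummy fall in each week on a single graph, one possible approach is to count per (week, value) pair and unstack the result, so that each week gets its own group of bars. This is a sketch, assuming that "week" means the ISO calendar week; the column name week and the variable counts are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {
    'date': ["2021-05-03", "2021-05-04", "2021-05-11", "2021-05-19", "2021-05-19"],
    'id_customer': ["5", "7", "8", "5", "3"],
    'value_dummy': [1, 0, 1, 1, 0],
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])

# ISO calendar week number (weeks start on Monday); an assumption about
# what "week" should mean for this data
df['week'] = df['date'].dt.isocalendar().week

# Rows: weeks, columns: value_dummy values, cells: number of observations
counts = df.groupby(['week', 'value_dummy']).size().unstack(fill_value=0)
print(counts)

# Each week becomes one group of bars on the horizontal axis
counts.plot.bar(rot=0, figsize=(10, 5))
plt.xlabel("ISO week")
plt.ylabel("count")
plt.tight_layout()
plt.show()
```

For the sample data this yields weeks 18, 19 and 20 of 2021, which correspond to "week 1", "week 2" and "week 3" in the question.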