I have data which looks like this:
file | timestamps
1 | 02/01/1970
1 | 03/01/1970
1 | 04/01/1970
1 | 05/01/1970
2 | 06/01/1970
2 | 07/01/1970
3 | 08/01/1970
3 | 09/01/1970
3 | 10/01/1970
On the x-axis I would like to have the number of rows per file, and on the y-axis the timestamps. It should look similar to this plot, but I do not know how to create it. Is this a waterfall plot?
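A plot like this can be treated as one block per file: the block's width is the file's row count and its height is the time span of its timestamps. Below is a minimal sketch of computing those per-file values with pandas. It assumes the dates are day-first (dd/mm/yyyy). The column names `n_rows`, `start`, `end` and `x_offset` are illustrative and do not come from the question.

```python
import pandas as pd

# Example data from the question; dates assumed to be day-first (dd/mm/yyyy)
df = pd.DataFrame({
    'file': [1, 1, 1, 1, 2, 2, 3, 3, 3],
    'timestamps': ['02/01/1970', '03/01/1970', '04/01/1970', '05/01/1970',
                   '06/01/1970', '07/01/1970', '08/01/1970', '09/01/1970',
                   '10/01/1970'],
})
df['timestamps'] = pd.to_datetime(df['timestamps'], format='%d/%m/%Y')

# One row per file: number of (non-null) timestamps and the range they cover
summary = df.groupby('file').agg(
    n_rows=('timestamps', 'count'),
    start=('timestamps', 'min'),
    end=('timestamps', 'max'),
)
# Rows of all previous files = left edge of this file's block on the x-axis
summary['x_offset'] = summary['n_rows'].cumsum() - summary['n_rows']
print(summary)
```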
Answer:
There is not much data, but here is how it works with your example:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import pandas as pd

data = [[1, '02/01/1970'],
        [1, '03/01/1970'],
        [1, '04/01/1970'],
        [1, '05/01/1970'],
        [2, '06/01/1970'],
        [2, '07/01/1970'],
        [3, '08/01/1970'],
        [3, '09/01/1970'],
        [3, '10/01/1970']]
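df = pd.DataFrame(data, columns=['file', 'timestamps'])
# The dates are day-first (dd/mm/yyyy)
df['timestamps'] = pd.to_datetime(df['timestamps'], format='%d/%m/%Y')

tot_delta_d = 0  # running offset on the time (y) axis, in days
tot_file = 0     # running offset on the row-count (x) axis

fig, ax = plt.subplots()
for f, group in df.groupby('file'):
    # Time span covered by this file and the number of rows it contains
    delta_d = group['timestamps'].max() - group['timestamps'].min()
    n_rows = group.shape[0]
    # Width = rows (x-axis), height = days (y-axis); each block starts
    # where the previous one ended, which gives the waterfall layout
    rect = patches.Rectangle((tot_file, tot_delta_d),
                             n_rows,
                             delta_d.days,
                             color='indigo')
    ax.add_patch(rect)
    # Accumulate (+=) so the next file is drawn after this one
    tot_delta_d += delta_d.days
    tot_file += n_rows

ax.set_xlim([0, tot_file])
ax.set_ylim([0, tot_delta_d])
ax.set_xlabel('Parquets')
ax.set_ylabel('Timestamps')
ax.invert_yaxis()  # time increases downwards
plt.show()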
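Here is a shorter variant of the same waterfall layout. It is a sketch, not a drop-in replacement. Matplotlib's `Axes.bar` accepts `width`, `bottom` and `align='edge'`, so each file can be drawn as one bar: its left edge is the cumulative row count and its bottom is the cumulative time span in days. The names `n_rows`, `span_days`, `x_left` and `y_bottom` are illustrative. Like the answer above, it measures each file's span as the max minus the min timestamp, so gaps between files are not shown.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[1, '02/01/1970'], [1, '03/01/1970'], [1, '04/01/1970'],
        [1, '05/01/1970'], [2, '06/01/1970'], [2, '07/01/1970'],
        [3, '08/01/1970'], [3, '09/01/1970'], [3, '10/01/1970']]
df = pd.DataFrame(data, columns=['file', 'timestamps'])
df['timestamps'] = pd.to_datetime(df['timestamps'], format='%d/%m/%Y')

# Per-file row count and time span in whole days
grouped = df.groupby('file')['timestamps']
n_rows = grouped.size()
span_days = (grouped.max() - grouped.min()).dt.days

# Each bar starts where the previous one ended, on both axes
x_left = n_rows.cumsum() - n_rows
y_bottom = span_days.cumsum() - span_days

fig, ax = plt.subplots()
# align='edge' makes x the left edge of each bar instead of its centre
bars = ax.bar(x_left, span_days, width=n_rows, bottom=y_bottom,
              align='edge', color='indigo')
ax.set_xlabel('Parquets')
ax.set_ylabel('Timestamps')
ax.invert_yaxis()  # time increases downwards
plt.show()
```

The axis limits are autoscaled from the bars, so no explicit `xlim`/`ylim` calls are needed.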