I have a dataframe that looks like this:
timestamp Value Color
--------------------------------------------------
0 2018-03-04 07:11:08 34 Red
1 2018-03-04 07:11:09 34 Red
2 2018-03-04 07:11:10 35 Red
3 2018-03-04 07:11:12 36 Red
4 2018-03-04 07:11:14 24 Red
5 2018-03-04 07:11:15 34 Red
...
55 2018-03-04 07:12:17 34 Blue
56 2018-03-04 07:12:18 35 Blue
57 2018-03-04 07:12:19 36 Blue
58 2018-03-04 07:12:20 37 Blue
59 2018-03-04 07:12:21 35 Blue
60 2018-03-04 07:12:22 32 Blue
So over the course of 60 seconds there is one value recorded per timestamp, but the values are split between two colors, Red and Blue. The dataframe therefore contains two time series that occur one after the other and do not overlap. I want to plot them, but I want to ignore the timestamps: assume both curves start at the same time, and treat each color as an ordered array of values, ignoring time skips and assuming equally spaced intervals. I simply want to plot the Red curve and the Blue curve on the same chart. How can this be done in Python? So far I am simply trying
plt.plot(Blue, Red)
However, I am not sure how to handle the x-axis, which I simply want to be in seconds.
Answer:
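import numpy
import pandas
from matplotlib import pyplot

# 10 red rows followed by 10 blue rows that reuse the same timestamps
df = pandas.DataFrame({
    'times': list(pandas.date_range('2020-01-01', periods=10, freq='15min'))
             + list(pandas.date_range('2020-01-01', periods=10, freq='15min')),
    'colors': ['red'] * 10 + ['blue'] * 10,
    'value': numpy.random.randint(0, 255, 20),
})
This gives us something like your dataframe: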
times colors value
0 2020-01-01 00:00:00 red 224
1 2020-01-01 00:15:00 red 47
2 2020-01-01 00:30:00 red 25
3 2020-01-01 00:45:00 red 211
4 2020-01-01 01:00:00 red 18
5 2020-01-01 01:15:00 red 119
6 2020-01-01 01:30:00 red 52
7 2020-01-01 01:45:00 red 246
8 2020-01-01 02:00:00 red 54
9 2020-01-01 02:15:00 red 156
10 2020-01-01 00:00:00 blue 42
11 2020-01-01 00:15:00 blue 55
12 2020-01-01 00:30:00 blue 151
13 2020-01-01 00:45:00 blue 236
14 2020-01-01 01:00:00 blue 207
15 2020-01-01 01:15:00 blue 165
16 2020-01-01 01:30:00 blue 131
17 2020-01-01 01:45:00 blue 199
18 2020-01-01 02:00:00 blue 247
19 2020-01-01 02:15:00 blue 61
We can pivot this using
df2 = df.pivot(index='times',columns=['colors'],values=['value'])
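which gives us something like this (the numbers differ from the table above because randint draws new random values on each run):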
value
colors blue red
times
2020-01-01 00:00:00 70 225
2020-01-01 00:15:00 162 78
2020-01-01 00:30:00 188 37
2020-01-01 00:45:00 134 234
2020-01-01 01:00:00 46 73
2020-01-01 01:15:00 76 60
2020-01-01 01:30:00 143 61
2020-01-01 01:45:00 150 198
2020-01-01 02:00:00 82 159
2020-01-01 02:15:00 127 94
Now we can simply plot it:
df2.plot()
pyplot.show()
You can drop the "value" level of the column names with
df2 = df2.droplevel(0,axis=1)
df2.plot()
pyplot.show()
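Alternatively, passing plain strings instead of lists to columns and values in pivot should give single-level column names straight away, so the droplevel step is not needed. A minimal sketch, rebuilding the example data with freq='15min' (the spelling newer pandas versions prefer over the deprecated '15T'):

```python
import numpy
import pandas
from matplotlib import pyplot

# Rebuild the example data: 10 red rows, then 10 blue rows sharing the same timestamps
times = pandas.date_range('2020-01-01', periods=10, freq='15min')
df = pandas.DataFrame({
    'times': list(times) + list(times),
    'colors': ['red'] * 10 + ['blue'] * 10,
    'value': numpy.random.randint(0, 255, 20),
})

# Scalar (not list) arguments give a single-level column index: blue, red
df2 = df.pivot(index='times', columns='colors', values='value')
df2.plot()
pyplot.show()
```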
The other option is to plot each color individually:
BLUE = df[df['colors'] == 'blue']
RED = df[df['colors'] == 'red']
pyplot.plot(BLUE['times'],BLUE['value'])
pyplot.plot(RED['times'],RED['value'])
pyplot.show()
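When plotting each color separately, giving every line a label and adding a legend makes it clear which curve is which. A small sketch on the same example data; passing color= so the line colors match the group names is just a choice for readability:

```python
import numpy
import pandas
from matplotlib import pyplot

# Example data: red and blue samples sharing the same 15-minute timestamps
times = pandas.date_range('2020-01-01', periods=10, freq='15min')
df = pandas.DataFrame({
    'times': list(times) + list(times),
    'colors': ['red'] * 10 + ['blue'] * 10,
    'value': numpy.random.randint(0, 255, 20),
})

BLUE = df[df['colors'] == 'blue']
RED = df[df['colors'] == 'red']

fig, ax = pyplot.subplots()
# Each label shows up as one legend entry
ax.plot(BLUE['times'], BLUE['value'], color='blue', label='blue')
ax.plot(RED['times'], RED['value'], color='red', label='red')
ax.legend()
pyplot.show()
```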
You could also use pandas groupby (though you probably shouldn't use this one :P ):
def plot_it(values):
    # apply passes each color's sub-dataframe as the only argument
    pyplot.plot(values['times'], values['value'])
df.groupby(['colors']).apply(plot_it)
pyplot.show()
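If you do want groupby, iterating over the groups is usually simpler than apply, since apply is meant for computing results rather than for side effects such as drawing. A sketch on the same example data:

```python
import numpy
import pandas
from matplotlib import pyplot

times = pandas.date_range('2020-01-01', periods=10, freq='15min')
df = pandas.DataFrame({
    'times': list(times) + list(times),
    'colors': ['red'] * 10 + ['blue'] * 10,
    'value': numpy.random.randint(0, 255, 20),
})

fig, ax = pyplot.subplots()
# Iterating a groupby yields (group key, sub-dataframe) pairs, one per color
for color, group in df.groupby('colors'):
    ax.plot(group['times'], group['value'], color=color, label=color)
ax.legend()
pyplot.show()
```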
But really, the "right" way to handle it is probably the first option (pivoting the data into the shape you want).
---- Edit (based on comments) ----
If you don't care about the timestamps and just want to treat each color as a list of y values, use range as your x:
BLUE = df[df['colors'] == 'blue']
RED = df[df['colors'] == 'red']
pyplot.plot(range(len(BLUE)),BLUE['value'])
pyplot.plot(range(len(RED)),RED['value'])
pyplot.show()
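The same idea can be combined with the pivot approach. groupby(...).cumcount() numbers the rows within each color as 0, 1, 2, ..., so pivoting on that counter instead of the timestamp lines both curves up from a common start (pivoting on the raw timestamps would not align anything here, because Red and Blue never share a timestamp). A sketch using a few rows and the column names from the question; it assumes one sample per second, as the question does, so the counter can be read as elapsed seconds:

```python
import pandas
from matplotlib import pyplot

# A few rows from the question's data (note the skipped second at 07:11:11)
df = pandas.DataFrame({
    'timestamp': pandas.to_datetime([
        '2018-03-04 07:11:08', '2018-03-04 07:11:09', '2018-03-04 07:11:10', '2018-03-04 07:11:12',
        '2018-03-04 07:12:17', '2018-03-04 07:12:18', '2018-03-04 07:12:19', '2018-03-04 07:12:20',
    ]),
    'Value': [34, 34, 35, 36, 34, 35, 36, 37],
    'Color': ['Red'] * 4 + ['Blue'] * 4,
})

# Number the samples within each color: 0, 1, 2, ... (timestamp gaps are ignored)
df['seconds'] = df.groupby('Color').cumcount()

# One column per color, indexed by sample number instead of timestamp
wide = df.pivot(index='seconds', columns='Color', values='Value')

ax = wide.plot()
ax.set_xlabel('seconds')
pyplot.show()
```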