Home > Software engineering >  How to plot multiple time series from a CSV while the data points are in different columns
How to plot multiple time series from a CSV while the data points are in different columns

Time:04-18

I have a data frame (loading from CSV) file that looks like below one

     Data      Mean        sd   time__1   time__2   time__3   time__4   time__5
0  Data_1  0.947667  0.025263  0.501517  0.874750  0.929426  0.953847  0.958375
1  Data_2  0.031960  0.017314  0.377588  0.069185  0.037523  0.024028  0.021532

Now, I wanted to plot 2 time series plots for (data_1, data_2) with (time__1, time__2, etc) as a timepoint. The x axis is (time__1, time__2, etc) and the y axis is their associated values.

The code I am trying

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv("file.csv", delimiter=',', header=0) 
data = data.drop(["Unnamed: 0"], axis=1)

# Set the date column as the index
data = data.set_index(["time__1", "time__2", "time__3", "time__4", "time__5"])
ax = data.plot(linewidth=2, fontsize=12)
ax.set_xlabel('Data')
ax.legend(fontsize=12)
plt.savefig("series.png")
plt.show()

The figure I am getting is not as expected. enter image description here

I think I am doing some wrong with set_index() as my time points are in different columns.

How can I plot time-series when time points are in different columns?

Reproducible data as dictionary formate

{'Data': {(0.501517236232758, 0.874750375747681, 0.929425954818726, 0.953846752643585, 0.958374977111816): 'Data_1', (0.377588421106338, 0.069185301661491, 0.037522859871388, 0.0240284409374, 0.021532088518143): 'Data_2'}, 'Mean': {(0.501517236232758, 0.874750375747681, 0.929425954818726, 0.953846752643585, 0.958374977111816): 0.947667360305786, (0.377588421106338, 0.069185301661491, 0.037522859871388, 0.0240284409374, 0.021532088518143): 0.031959813088179}, 'sd': {(0.501517236232758, 0.874750375747681, 0.929425954818726, 0.953846752643585, 0.958374977111816): 0.025263005867601, (0.377588421106338, 0.069185301661491, 0.037522859871388, 0.0240284409374, 0.021532088518143): 0.017313838005066}}

CodePudding user response:

IIUC you are getting the index wrong: If time__1, time__2 etc. is supposed to be your x-axis, that's what you want your index to be. The plot data series names are the columns. Therefore, you need to transpose your DataFrame. Using the csv data in your first table:

print(df)

# out:
     Data      Mean        sd   time__1   time__2   time__3   time__4  \
0  Data_1  0.947667  0.025263  0.501517  0.874750  0.929426  0.953847   
1  Data_2  0.031960  0.017314  0.377588  0.069185  0.037523  0.024028   

    time__5  
0  0.958375  
1  0.021532  

Changing column names and transposing:

df.drop(["Mean", "sd"], axis=1).set_index("Data").T

yields an appropriately formatted dataframe:

Data       Data_1    Data_2
time__1  0.501517  0.377588
time__2  0.874750  0.069185
time__3  0.929426  0.037523
time__4  0.953847  0.024028
time__5  0.958375  0.021532

which can simply be plotted:

df.plot()

enter image description here

  • Related