Plot many time-series columns in one graph

I have a large DataFrame with roughly 100 columns and am trying to plot all of the time series in one graph. Is there an easy way to do this without specifying every y-series manually?

Here is a simple example with three time series, 02K W, 03K W, and 04K W:

import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'index':[0, 1, 2, 3, 4], 
    '02K W':[3.5, 0.1, 3, 'nan', 0.2], 
    '03K W':[4.2, 5.2, 2.5, 3.0, 0.6], 
    '04K W':[1.5, 2.6, 8.2, 4.2, 5.3]}) 

df1['Date'] = pd.to_datetime(df1['Date'])
df1 = df1.set_index('index')

So far, I specify each y-series manually to plot the individual time series:

plt.plot(df1['Date'], df1['02K W'])
plt.plot(df1['Date'], df1['03K W'])
plt.plot(df1['Date'], df1['04K W'])

Is there a more elegant way to specify the relevant columns for the plot? Thank you very much for your suggestions :)

CodePudding user response：

import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'index':[0, 1, 2, 3, 4], 
    '02K W':[3.5, 0.1, 3, 'nan', 0.2], 
    '03K W':[4.2, 5.2, 2.5, 3.0, 0.6], 
    '04K W':[1.5, 2.6, 8.2, 4.2, 5.3]}) 

df1['Date'] = pd.to_datetime(df1['Date'])
df1 = df1.set_index('index')

# columns[0] is 'Date'; plot every remaining column against it
for col in df1.columns[1:]:
    plt.plot(df1['Date'], df1[col])
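
With many columns, the lines are hard to tell apart without a legend. A minimal sketch of the same loop with labels, assuming every column other than Date is a series to plot and that the 'nan' string has already been replaced by a real NaN:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as above, but with a real NaN instead of the string 'nan'
df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    '02K W': [3.5, 0.1, 3, float('nan'), 0.2],
    '03K W': [4.2, 5.2, 2.5, 3.0, 0.6],
    '04K W': [1.5, 2.6, 8.2, 4.2, 5.3]})
df1['Date'] = pd.to_datetime(df1['Date'])

fig, ax = plt.subplots()
# Dropping 'Date' by name avoids relying on its position in the frame
for col in df1.columns.drop('Date'):
    ax.plot(df1['Date'], df1[col], label=col)
ax.legend()
```

Selecting by name rather than by position keeps the loop correct if the column order changes.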

CodePudding user response：

You can melt your columns and use seaborn.lineplot:

import seaborn as sns

sns.lineplot(data=df1.replace('nan', float('nan')).melt(id_vars=['Date']),
             x='Date', y='value', hue='variable'
            )
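
To see what melt does here, this small sketch reshapes a two-column subset of the sample data into long format. The column names variable and value are the pandas defaults, which is why the lineplot call refers to them:

```python
import pandas as pd

# Reduced version of the sample data, for illustration only
df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-02']),
    '02K W': [3.5, 0.1],
    '03K W': [4.2, 5.2]})

# One row per (Date, original column) pair:
# 'variable' holds the original column name, 'value' holds the measurement
long = df1.melt(id_vars=['Date'])
print(long)
```

Each original column becomes a group of rows, so hue='variable' draws one line per column.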


CodePudding user response：

Is there a more elegant way to specify the relevant columns for the plot?

Use DataFrame.plot with Date as the index and filter by the desired columns:

columns = ['02K W', '03K W', '04K W']
df1.set_index('Date')[columns].plot()

Note that your sample data contains the string 'nan'. If your real data does too, convert it to a real np.nan, e.g., with pd.to_numeric or DataFrame.replace.
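
With around 100 columns, typing the list by hand is impractical. The sketch below selects columns by name pattern and does the numeric conversion in one pass. The regex K W$ is an assumption based on the sample column names, so adjust it to your real naming scheme:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'index': [0, 1, 2, 3, 4],
    '02K W': [3.5, 0.1, 3, 'nan', 0.2],
    '03K W': [4.2, 5.2, 2.5, 3.0, 0.6],
    '04K W': [1.5, 2.6, 8.2, 4.2, 5.3]})
df1['Date'] = pd.to_datetime(df1['Date'])

# Keep only the columns whose name ends in "K W" (assumed naming pattern)
series = df1.set_index('Date').filter(regex=r'K W$')

# Convert every column to numeric; unparseable values become NaN
series = series.apply(pd.to_numeric, errors='coerce')

# One line per column, with Date on the x-axis
ax = series.plot()
```

If every column other than Date is a series, df1.set_index('Date').drop(columns='index') would work without a pattern.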