Plot multiple variables on same plot, and panel plot by station ID

I have 2 timeseries dataframes which were sourced from two 2-dimensional arrays. The structure of these dataframes is:

Generate example dataframes

import pandas as pd
import numpy as np
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.arange(8).reshape((4,2))
y = np.arange(8).reshape((4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range, columns = ['station1','station2'])
print(x)
            station1  station2
1981-01-01         0         1
1981-01-02         2         3
1981-01-03         4         5
1981-01-04         6         7

The Aim

I want to generate a multi-panel plot where the values of 'x' and 'y' are plotted as lines on the same axes, distinguished by colour, with one 'row' of the figure per station. Using the example code above, each panel would plot one of the station columns.

What I've tried

I tried the seaborn route: first concatenating the two dataframes together - each df represents a variable, so I added them as keys to name these variables after concatenation. Then I used melt to be able to multi-plot them:

df = pd.concat([x , y], keys = ['Var1', 'Var2'])
meltdf = df.melt(var_name = 'Station', value_name = 'Value', ignore_index = False)
print(meltdf)
                  Station  Value
Var1 1981-01-01  station1      0
     1981-01-02  station1      2
     1981-01-03  station1      4
     1981-01-04  station1      6
Var2 1981-01-01  station1      0
     1981-01-02  station1      2
     1981-01-03  station1      4
     1981-01-04  station1      6
Var1 1981-01-01  station2      1
     1981-01-02  station2      3
     1981-01-03  station2      5
     1981-01-04  station2      7
Var2 1981-01-01  station2      1
     1981-01-02  station2      3
     1981-01-03  station2      5
     1981-01-04  station2      7

I want to plot the values of Var1 & Var2 as lines on the same graph for station1, same for station2 and so on. I want to keep the date as index as these should be timeseries plots, with 'date' along the x axis. I tried this non-working code (for example):

import seaborn as sns
sns.relplot(data=df, x = 'Var1', y = 'Var2', kind = 'line', hue = 'keys', row = 'Station')

Should I 'double melt' the dfs instead to have variable type as its own col? The concat keys step seems incorrect.

Answer:

You were on the right track with pd.concat and pd.melt, followed by seaborn relplot. I would approach it like this:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

#data generation
import numpy as np
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])

#keep information where each data point comes from
x["key"], y["key"] = "x", "y"

#combining dataframes and reshaping 
df = pd.concat([x, y]).melt(["key"], var_name="station", value_name="station_value",  ignore_index = False)

#plotting - the datetime conversion might not be necessary 
#depending on the datetime format of your original dataframes
#best approach is conversion to datetime index when creating the dataframes
fg = sns.relplot(data=df, x = pd.to_datetime(df.index.to_timestamp()), y = "station_value", kind = "line", hue = "key", row = "station")

#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))

plt.show()

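Sample output (image not preserved): one panel per station, with the x and y series drawn as separate coloured lines.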

If your pandas version cannot handle the duplicate index entries that pd.concat produces here, the code can be rewritten to move the index into a column:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

#data generation
import numpy as np
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])

#keep information where each data point comes from
x["key"], y["key"] = "x", "y"
#moving index into a column 
x = x.reset_index()
y = y.reset_index()
#and changing it to datetime values that seaborn can understand
#only necessary because your example contains pd.Period data
x["index"] = pd.to_datetime(x["index"].astype(str))
y["index"] = pd.to_datetime(y["index"].astype(str))

#combining dataframes and reshaping 
df = pd.concat([x, y]).melt(["index", "key"], var_name="station", value_name="station_value")

#plotting
fg = sns.relplot(data=df, x = "index", y = "station_value", kind = "line", hue = "key", row = "station")

#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))

plt.show()
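
On the question of whether the concat keys step is incorrect: it does not have to be. The keys can be kept and turned into an ordinary column before melting, which gives the "variable type as its own column" layout asked about. Below is a minimal sketch of that route, not the only way to do it. The column names variable, date and value are my own choice for illustration, and y uses different toy values from the question so that the two lines would not coincide.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's example; y gets different values
# (an assumption made here) so Var1 and Var2 are distinguishable.
date_range = pd.period_range("1981-01-01", "1981-01-04", freq="D")
x = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])
y = pd.DataFrame(np.arange(8, 16).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])

# The concat keys become the outer index level. Naming both levels
# ("variable" and "date" are illustrative names) lets reset_index()
# turn them into regular columns.
wide = pd.concat([x, y], keys=["Var1", "Var2"],
                 names=["variable", "date"]).reset_index()

# Convert the Period values to timestamps so a plotting library
# can treat them as ordinary datetimes.
wide["date"] = wide["date"].dt.to_timestamp()

# One row per (variable, date, station) combination.
long_df = wide.melt(id_vars=["variable", "date"],
                    var_name="station", value_name="value")

print(long_df.head())
```

The resulting frame could then be passed to seaborn in the same way as above, for example sns.relplot(data=long_df, x="date", y="value", kind="line", hue="variable", row="station"). Because the dates live in a column rather than the index, duplicate index entries are not an issue here.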