I have a dataframe of the form:
df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1],
                            [0.1, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'X', 'Y'])
and I want to draw every possible scatter plot between two subsets of columns: set1 = ['a', 'b'] and set2 = ['X', 'Y']. I'd like to place all these subplots in a "matrix" like:
size_set1 = len(set1)
size_set2 = len(set2)
df.plot(subplots=True, layout=(size_set1,size_set2), figsize=(30,30));
This is as far as I got, but this code does not produce scatter plots, and it just seems to plot each column instead of columns against each other.
The desired output should be (for this example) 4 scatter plots, (X,a), (X,b), (Y,a), (Y,b), arranged 2 above and 2 below.
CodePudding user response:
Maybe something like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# added for layout; delete these two lines if you want "blank" plots
import seaborn as sns
sns.set_theme()
df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1],
                            [0.1, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'X', 'Y'])
set1 = ['a', 'b']
set2 = ['X','Y']
fig, ax = plt.subplots(nrows=len(set1), ncols=len(set2), figsize=[12,12])
xmin = 0
xmax = 0.8
ymin = 0
ymax = 0.8
# rows follow set1 (y-axis), columns follow set2 (x-axis)
for i in range(len(set1)):
    for j in range(len(set2)):
        ax[i,j].scatter(df[set2[j]], df[set1[i]])
        ax[i,j].set_xlabel(set2[j])
        ax[i,j].set_ylabel(set1[i])
        ax[i,j].title.set_text(f'plot ({set2[j]},{set1[i]})')
        ax[i,j].set_xlim(xmin, xmax)
        ax[i,j].set_ylim(ymin, ymax)
fig.suptitle("Scatter plots", fontsize=16)
plt.tight_layout()
plt.show()
Result: [image: 2 x 2 grid of scatter plots, one per (set2, set1) pair]
If you want to extend this with a column c, we can do:
df = pd.DataFrame(np.array([[0.2, 0.5, 0.1, 0.3, 0.1],
                            [0.1, 0.2, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'c', 'X', 'Y'])
# also adjust the set, of course:
set1 = ['a', 'b', 'c']
Result: [image: 3 x 2 grid of scatter plots, with an added row for c]
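One caveat with indexing ax[i,j]: by default plt.subplots squeezes its return value, so if either set has only one column you get a 1-D array (or a single Axes) and ax[i,j] fails. Below is a minimal sketch of a reusable version that passes squeeze=False so the axes array is always 2-D. The helper name scatter_grid and its parameters are made up for illustration; they are not part of pandas or matplotlib.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_grid(df, row_cols, col_cols, figsize=(8, 8)):
    """Scatter each column in col_cols (x-axis) against each column in row_cols (y-axis).

    Hypothetical helper for illustration only.
    """
    # squeeze=False always returns a 2-D array of Axes, even for one row or one column
    fig, axes = plt.subplots(nrows=len(row_cols), ncols=len(col_cols),
                             figsize=figsize, squeeze=False)
    for i, y_name in enumerate(row_cols):
        for j, x_name in enumerate(col_cols):
            ax = axes[i, j]
            ax.scatter(df[x_name], df[y_name])
            ax.set_xlabel(x_name)
            ax.set_ylabel(y_name)
    fig.tight_layout()
    return fig, axes


df = pd.DataFrame(np.array([[0.2, 0.5, 0.1, 0.3, 0.1],
                            [0.1, 0.2, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'c', 'X', 'Y'])

fig, axes = scatter_grid(df, ['a', 'b', 'c'], ['X', 'Y'])
```

Call plt.show() afterwards (or fig.savefig(...)) to see the figure. The same call works unchanged for something like scatter_grid(df, ['a'], ['X', 'Y']).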
CodePudding user response:
You can try to do it like this:
import itertools
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1],
                            [0.1, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'X', 'Y'])
set1 = ['a', 'b']
set2 = ['X', 'Y']
size_set1 = len(set1)
size_set2 = len(set2)
pairs = list(itertools.product(set1,set2))
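# one subplot per (set1, set2) pair; rows follow set1, columns follow set2
for i, (col1, col2) in enumerate(pairs):
    plt.subplot(size_set1, size_set2, i + 1)  # subplot indices start at 1
    plt.scatter(df[col1], df[col2])
plt.show()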
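Since the question started from df.plot, the same loop can also be written with pandas' own DataFrame.plot.scatter, which accepts an ax argument and labels both axes with the column names. This is only a sketch; it assumes the pairs in the question, such as (X,a), mean "X on the x-axis, a on the y-axis", so the set2 columns go on x here.

```python
import itertools

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1],
                            [0.1, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'X', 'Y'])
set1 = ['a', 'b']
set2 = ['X', 'Y']

# one row per set1 column, one column per set2 column
fig, axes = plt.subplots(len(set1), len(set2), figsize=(8, 8), squeeze=False)

# axes.flat walks the grid row by row, which matches the order of itertools.product
for ax, (y_name, x_name) in zip(axes.flat, itertools.product(set1, set2)):
    # assumption: set2 on the x-axis, set1 on the y-axis
    df.plot.scatter(x=x_name, y=y_name, ax=ax)

fig.tight_layout()
```

Add plt.show() at the end when running this as a script.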
See also this question and its answers: How to plot in multiple subplots