I have two DataFrames with 3 columns and 7 rows each:
import numpy as np
import pandas as pd
nan = np.nan
a = {'col1': [20, 3, 45.6, 2, 500, 3e45, nan], 'col2': [nan, 90, 1e3, nan, 78, 6, nan], 'col3': [25, 7e56, nan, nan, 23, 0.4, 78.04]}
b = {'a': [1, 24, 30, nan, nan, 56, 2.5], 'b': [100, nan, 10, 78.09, 1e29, 0.84, nan], 'c': [nan, 4.6, nan, nan, 9e45, 0.2, nan]}
df_A = pd.DataFrame(data=a)
df_B = pd.DataFrame(data=b)
I’d like to iterate over the two DataFrames and draw a scatter plot of the first column of df_A (col1) against the first column of df_B (a), then the second column of df_A against the second column of df_B, and so on. Each plot should use the column names as its axis labels.
I tried the following, but it raises an error saying that x and y are not the same size:
for col in df_A:
    plt.scatter(df_A[col], df_B[col], label=col)
CodePudding user response:
You are using col to index both df_A and df_B, but in your example the two DataFrames have different column names, so the lookup df_B[col] fails.
I reproduced your example using matplotlib subplots, and this code works:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the DataFrames
nan = np.nan
a = {'col1': [20, 3, 45.6, 2, 500, 3e45, nan], 'col2': [nan, 90, 1e3, nan, 78, 6, nan], 'col3': [25, 7e56, nan, nan, 23, 0.4, 78.04]}
b = {'a': [1, 24, 30, nan, nan, 56, 2.5], 'b': [100, nan, 10, 78.09, 1e29, 0.84, nan], 'c': [nan, 4.6, nan, nan, 9e45, 0.2, nan]}
df_A = pd.DataFrame(data=a)
df_B = pd.DataFrame(data=b)
# Plot
# To work around the differing column names, use enumerate and pick df_B's column by position.
for i, col in enumerate(df_A):
    plt.subplot(2, 2, i + 1)  # subplot indices start at 1
    plt.title(col)
    plt.scatter(df_A[col].values, df_B.iloc[:, i].values)
plt.show()
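Since the question also asks for the column names as axis labels, here is a minimal sketch of one way to do that, pairing the columns by position with zip. The helper name scatter_by_position is hypothetical (it is not part of pandas or matplotlib), and the side-by-side layout is just an assumption about how the plots should be arranged.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

nan = np.nan
df_A = pd.DataFrame({'col1': [20, 3, 45.6, 2, 500, 3e45, nan],
                     'col2': [nan, 90, 1e3, nan, 78, 6, nan],
                     'col3': [25, 7e56, nan, nan, 23, 0.4, 78.04]})
df_B = pd.DataFrame({'a': [1, 24, 30, nan, nan, 56, 2.5],
                     'b': [100, nan, 10, 78.09, 1e29, 0.84, nan],
                     'c': [nan, 4.6, nan, nan, 9e45, 0.2, nan]})

def scatter_by_position(df_x, df_y):
    """Hypothetical helper: one scatter plot per positional column pair."""
    # Pair the first column of df_x with the first of df_y, and so on.
    pairs = list(zip(df_x.columns, df_y.columns))
    fig, axes = plt.subplots(1, len(pairs), figsize=(4 * len(pairs), 4),
                             squeeze=False)
    for ax, (x_name, y_name) in zip(axes[0], pairs):
        ax.scatter(df_x[x_name], df_y[y_name])
        ax.set_xlabel(x_name)  # column name from df_x on the x axis
        ax.set_ylabel(y_name)  # column name from df_y on the y axis
    fig.tight_layout()
    return fig, axes[0]

fig, axes = scatter_by_position(df_A, df_B)
```

Call plt.show() afterwards to display the figure. Because zip stops at the shorter sequence, any extra columns in one DataFrame are simply left unplotted.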
CodePudding user response:
As already mentioned in the comments, either make sure both DataFrames have the same column names, or select the columns by position, as in this approach:
for col in range(len(df_A.columns)):
    plt.figure()
    plt.scatter(df_A[df_A.columns[col]], df_B[df_B.columns[col]], label=df_A.columns[col])
    plt.legend()
If you instead get an error because the two DataFrames have different numbers of rows, you can either trim the longer one or pad the shorter one with dummy data.
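The trim and pad options could be sketched roughly as follows. The helper align_lengths and the sample Series are hypothetical, made up for illustration. Padding here uses NaN as the dummy value, on the assumption that missing points should simply not be drawn.

```python
import pandas as pd

def align_lengths(s_x, s_y, how="trim"):
    """Hypothetical helper: make two Series the same length by position."""
    # Reset to a 0..n-1 index so rows are matched by position, not by label.
    x = s_x.reset_index(drop=True)
    y = s_y.reset_index(drop=True)
    if how == "trim":
        n = min(len(x), len(y))
        return x.iloc[:n], y.iloc[:n]  # cut the longer one down
    n = max(len(x), len(y))
    # Reindexing past the end fills the new positions with NaN.
    return x.reindex(range(n)), y.reindex(range(n))

# Made-up example data of unequal length.
longer = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
shorter = pd.Series([10.0, 20.0, 30.0])

x_trim, y_trim = align_lengths(longer, shorter, how="trim")
x_pad, y_pad = align_lengths(longer, shorter, how="pad")
```

Either pair can then be passed to plt.scatter. For a scatter plot the two options should look the same, since points with a NaN coordinate are generally not drawn, so trimming is usually the simpler choice.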