I have two dataframes, one with daily PM2.5/PM10 ratios for 7 years with about 30 sites. The other dataframe has the mean, standard deviation, and n value for each site. Sites of both dataframes are in the same order. My goal is to make a histogram for each site (each figure in a separate window) and then paste text onto the figure from the other dataframe that includes the corresponding site's mean, std dev, and n. Here are two csv files with my data.
pmf = https://drive.google.com/file/d/1heF1W1x0qS_5SjjPg23j5alt2WiY3HsM/view?usp=sharing
stats_all = https://drive.google.com/file/d/16ik-OY83j0p21SCy8PwXWQnz2nYvc-HV/view?usp=sharing
Here's my code so far:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
stats_all = pd.read_csv('*****/R2PMratio_meanstdn.csv')
pmf = pd.read_csv('*****/PM25PM10ratio_dailyavg_IVAN.csv')
stats_all = stats_all.set_index('stats')
pmf = pmf.set_index('Date')
pmf.index = pd.to_datetime(pmf.index)
##this code makes the histograms for each site
for i, col in enumerate(pmf.columns):
    plt.figure(i)
    sns.histplot(pmf[col], binwidth=0.05, color='green')
    plt.xlim(0, 1)
    plt.xlabel('PM$_{2.5}$/PM$_{10}$', fontsize=15)
    plt.ylabel('Frequency', fontsize=15)
    plt.title(pmf.columns[i], fontsize=20)
    #plt.savefig(pmf.columns[i] + '_pmratio.png', dpi=400)
I'm not sure how to add the mean, standard deviation, and n value to the figure so that it loops through and puts the correct mean, std dev, and n on the corresponding site figure. I assume I would add something like this to the loop:
plt.text(0.5, 0.85, "mean = " __?__, horizontalalignment='center',
verticalalignment='center',
transform=ax.transAxes)
plt.text(0.5, 0.8, "stdv = " __?__, horizontalalignment='center',
verticalalignment='center',
transform=ax.transAxes)
plt.text(0.5, 0.75, "n = " __?__, horizontalalignment='center',
verticalalignment='center',
transform=ax.transAxes)
but I don't know what I would put in place of the question marks. I haven't had much luck finding anything on Stack Overflow.
Answer:
In your stats_all dataframe, the data is already arranged nicely: the statistical measures are the rows and the sites are the columns. Since you're looping through the sites (the columns of pmf) and creating a plot for each, you just have to use that same loop variable to reference the correct column in stats_all.
Here's one solution that uses pandas loc to do a row-column lookup and Python f-strings to construct the label. I made a basic example, with dummy versions of pmf and stats_all, that you could adapt to your own code:
>>> pmf = pd.DataFrame({'site1': [1, 2, 3], 'site2': [1, 2, 3]})
>>> pmf
   site1  site2
0      1      1
1      2      2
2      3      3
>>> stats_all = pd.DataFrame({'site1': [7, 8, 9], 'site2': [10, 11, 12]}, index=['mean', 'n', 'std'])
>>> stats_all
      site1  site2
mean      7     10
n         8     11
std       9     12
You'd use stats_all.loc['row name', 'column name'] to access the value for a given statistical measure and site. Then you use f-strings to create the plot annotation from that value.
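To see the lookup on its own before it goes into the plotting loop, here is a minimal sketch using the dummy stats_all from above. The helper name make_label and the .2f rounding are my own additions for illustration, and the row labels 'mean', 'std', and 'n' are assumed to match whatever is in your real 'stats' index.

```python
import pandas as pd

# Dummy statistics table from the example: rows are measures, columns are sites.
stats_all = pd.DataFrame(
    {'site1': [7, 8, 9], 'site2': [10, 11, 12]},
    index=['mean', 'n', 'std'],
)


def make_label(stats, site):
    """Build a three-line annotation for one site (hypothetical helper)."""
    # .loc[row label, column label] returns a single scalar value
    mean = float(stats.loc['mean', site])
    std = float(stats.loc['std', site])
    n = int(stats.loc['n', site])
    # .2f rounds the floats to two decimals; n is shown as a whole number
    return f'mean = {mean:.2f}\nstd = {std:.2f}\nn = {n}'


label_site1 = make_label(stats_all, 'site1')
label_site2 = make_label(stats_all, 'site2')
print(label_site1)
```

Building the whole label as one string means a single plt.text call per figure is enough, instead of one call per statistic.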
# loop through columns of pmf (the sites)
for site in pmf.columns:
    # create figure and plot histogram from pmf
    plt.figure()
    sns.histplot(pmf[site])
    # use loc and square brackets to get the right
    # statistical measure for the current site
    label_mean = stats_all.loc['mean', site]
    label_n = stats_all.loc['n', site]
    label_std = stats_all.loc['std', site]
    # use f-strings to construct the text labels, which
    # reference the variables we created above
    plt.text(2, 0.2, f'mean = {label_mean}')
    plt.text(2, 0.4, f'n = {label_n}')
    plt.text(2, 0.6, f'std = {label_std}')
Which results in the following two images:
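EDIT: by default, plt.text places the text in data coordinates, so its position shifts with each site's axis limits. To put the text at a fixed point on every figure, give the position in axes coordinates instead (for example, by passing transform=plt.gca().transAxes, where (0, 0) is the bottom-left and (1, 1) the top-right of the axes). This is explained in the post "setting a fixed position for matplotlib text".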
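As a rough sketch of the fixed-position approach, the loop below combines the loc lookup with transform=ax.transAxes. It makes several assumptions: the data are made-up numbers rather than your CSV files, plain ax.hist stands in for sns.histplot to keep the example dependency-free, and the Agg backend is selected only so the code runs without a display (drop that line if you want each figure in its own window).

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive windows
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-ins for the real dataframes (values are illustrative only).
pmf = pd.DataFrame({'site1': [0.2, 0.4, 0.6], 'site2': [0.3, 0.5, 0.7]})
stats_all = pd.DataFrame(
    {'site1': [0.4, 3, 0.2], 'site2': [0.5, 3, 0.2]},
    index=['mean', 'n', 'std'],
)

figures = {}
for site in pmf.columns:
    # one new figure and axes per site
    fig, ax = plt.subplots()
    ax.hist(pmf[site], bins=20, range=(0, 1), color='green')
    ax.set_xlim(0, 1)
    ax.set_title(site)

    # look up this site's statistics by row label and column label
    label = (
        f"mean = {stats_all.loc['mean', site]}\n"
        f"stdv = {stats_all.loc['std', site]}\n"
        f"n = {int(stats_all.loc['n', site])}"
    )
    # axes coordinates: (0.5, 0.85) is horizontally centred, near the top,
    # regardless of the data range on either axis
    ax.text(0.5, 0.85, label, ha='center', va='top', transform=ax.transAxes)

    figures[site] = fig
```

Keeping the figures in a dict keyed by site name makes it easy to save them afterwards, for example with figures['site1'].savefig('site1_pmratio.png', dpi=400).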