Print column values onto figures created in a loop

I have two dataframes. One has daily PM2.5/PM10 ratios for 7 years at about 30 sites. The other has the mean, standard deviation, and n value for each site. The sites are in the same order in both dataframes. My goal is to make a histogram for each site (each figure in a separate window) and then put text on the figure, taken from the other dataframe, that shows the corresponding site's mean, std dev, and n. Here are the two CSV files with my data.

pmf = https://drive.google.com/file/d/1heF1W1x0qS_5SjjPg23j5alt2WiY3HsM/view?usp=sharing

stats_all = https://drive.google.com/file/d/16ik-OY83j0p21SCy8PwXWQnz2nYvc-HV/view?usp=sharing

Here's my code so far:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

stats_all = pd.read_csv('*****/R2PMratio_meanstdn.csv')
pmf = pd.read_csv('*****/PM25PM10ratio_dailyavg_IVAN.csv')

stats_all = stats_all.set_index('stats')
pmf = pmf.set_index('Date')
pmf.index = pd.to_datetime(pmf.index)

##this code makes the histograms for each site
for i,col in enumerate(pmf.columns):
    plt.figure(i)
    sns.histplot(pmf[col], binwidth=0.05, color='green')
    plt.xlim(0,1)
    plt.xlabel('PM$_{2.5}$/PM$_{10}$', fontsize=15)
    plt.ylabel('Frequency', fontsize=15)
    plt.title(pmf.columns[i], fontsize=20)
    #plt.savefig(pmf.columns[i] + '_pmratio.png', dpi=400)

I'm not sure how to add the mean, standard deviation, and n value to each figure so that the loop puts the correct values on the corresponding site's figure. I assume I would add something like this to the loop:

plt.text(0.5, 0.85, "mean = " __?__, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)
plt.text(0.5, 0.8, "stdv = " __?__, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)
plt.text(0.5, 0.75, "n = " __?__, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)

but I don't know what I would put in place of the question marks. I haven't had much luck finding anything on Stack Overflow.

Answer:

Your stats_all dataframe is already arranged nicely: the statistical measures are the rows and the sites are the columns. Since you're looping through the sites (the columns of pmf) and creating a plot for each, you can use that same loop variable to look up the matching column in stats_all.
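Because the lookup goes by site name rather than by position, it does not actually depend on the two dataframes listing their sites in the same order. The sketch below uses small hypothetical dataframes (the site names site1/site2 and the values are illustrative only), where stats_all deliberately lists the sites in the opposite order, and checks that every site in pmf has statistics before looping.

```python
import pandas as pd

# Hypothetical stand-ins: pmf has one column per site; stats_all has the
# statistics as its index (as after set_index('stats')) and one column per site.
pmf = pd.DataFrame({"site1": [0.4, 0.5, 0.6], "site2": [0.3, 0.35, 0.7]})
stats_all = pd.DataFrame(
    {"site2": [0.45, 3, 0.22], "site1": [0.5, 3, 0.1]},  # sites in reversed order
    index=["mean", "n", "std"],
)

# Fail early if a site in pmf has no matching column in stats_all.
missing = [site for site in pmf.columns if site not in stats_all.columns]
if missing:
    raise KeyError(f"no statistics for sites: {missing}")

# The loop variable is a column label that is valid in both dataframes,
# so each site gets its own statistics even though the column order differs.
lookups = {site: stats_all.loc["mean", site] for site in pmf.columns}
print(lookups)
```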

Here's one solution that uses pandas .loc for a row/column lookup and Python f-strings to build the label. I made a basic example with dummy versions of pmf and stats_all that you can adapt to your own code:

>>> pmf = pd.DataFrame({'site1': [1, 2, 3], 'site2': [1, 2, 3]})
>>> pmf

   site1  site2
0      1      1
1      2      2
2      3      3

>>> stats_all = pd.DataFrame({'site1': [7, 8, 9], 'site2': [10, 11, 12]}, index=['mean', 'n', 'std'])
>>> stats_all

      site1  site2
mean      7     10
n         8     11
std       9     12

Use stats_all.loc['row name', 'column name'] to get the value of a given statistical measure for a given site, then use an f-string to build the plot annotation from that value.
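If the raw values print with many decimals, or n shows up as a float such as 1250.0 (which can happen because each site column in stats_all holds the mean, std and n together, so pandas stores them with a single float dtype), a format spec inside the f-string tidies the label. A small sketch; the helper name stats_label and the numbers are hypothetical:

```python
import pandas as pd

# Hypothetical stats table: one float column per site, so n is stored as a float.
stats_all = pd.DataFrame(
    {"site1": [0.512345, 1250, 0.118], "site2": [0.47, 980, 0.2049]},
    index=["mean", "n", "std"],
)


def stats_label(stats: pd.DataFrame, site: str) -> str:
    """Build a three-line annotation for one site (hypothetical helper)."""
    mean = stats.loc["mean", site]
    std = stats.loc["std", site]
    n = stats.loc["n", site]
    # :.3f rounds mean and std to 3 decimals; int() drops the trailing .0 from n.
    return f"mean = {mean:.3f}\nstd = {std:.3f}\nn = {int(n)}"


print(stats_label(stats_all, "site1"))
```

The resulting string can be passed as the text argument of a single plt.text call, since "\n" starts a new line within one text object.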

import matplotlib.pyplot as plt
import seaborn as sns

# loop through columns of pmf (the sites)
for site in pmf.columns:

    # create figure and plot histogram from pmf
    plt.figure()
    sns.histplot(pmf[site])

    # use loc and square brackets to get the right
    # statistical measure for the current site
    label_mean = stats_all.loc['mean', site]
    label_n = stats_all.loc['n', site]
    label_std = stats_all.loc['std', site]

    # use f-strings to construct the text labels, which
    # reference the variables we created above
    plt.text(2, 0.2, f'mean = {label_mean}')
    plt.text(2, 0.4, f'n = {label_n}')
    plt.text(2, 0.6, f'std = {label_std}')

This results in the following two images:

[Figure: annotated histogram of site1]

[Figure: annotated histogram of site2]
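Applied to the question's setup, the loop might look roughly like the sketch below. It rests on several assumptions: random dummy data stands in for the CSV files, stats_all is computed here with pmf.agg (with the "count" row renamed to "n") rather than read from R2PMratio_meanstdn.csv, so the row labels of the real file (for example "std" versus "stdv") must be matched to whatever is in its stats column, and Matplotlib's ax.hist replaces sns.histplot so the sketch only needs pandas, NumPy and Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dummy data shaped like the question's: daily ratios per site,
# plus a stats table with rows mean/std/n and one column per site.
rng = np.random.default_rng(0)
pmf = pd.DataFrame(
    rng.uniform(0.2, 0.8, size=(100, 2)),
    columns=["site1", "site2"],
    index=pd.date_range("2015-01-01", periods=100, freq="D"),
)
stats_all = pmf.agg(["mean", "std", "count"]).rename(index={"count": "n"})

figures = []
for site in pmf.columns:
    fig, ax = plt.subplots()  # one new figure per site
    # With seaborn this line could instead be:
    # sns.histplot(pmf[site], binwidth=0.05, color="green", ax=ax)
    ax.hist(pmf[site].dropna(), bins=np.linspace(0, 1, 21), color="green")
    ax.set_xlim(0, 1)
    ax.set_xlabel("PM$_{2.5}$/PM$_{10}$", fontsize=15)
    ax.set_ylabel("Frequency", fontsize=15)
    ax.set_title(site, fontsize=20)

    # Look up this site's statistics by row label and column (site) label.
    label = (
        f"mean = {stats_all.loc['mean', site]:.3f}\n"
        f"std = {stats_all.loc['std', site]:.3f}\n"
        f"n = {int(stats_all.loc['n', site])}"
    )
    # transform=ax.transAxes: (0, 0) is the lower-left and (1, 1) the
    # upper-right corner of the axes, independent of the data range.
    ax.text(0.05, 0.95, label, transform=ax.transAxes,
            ha="left", va="top", fontsize=12)
    # fig.savefig(site + "_pmratio.png", dpi=400)
    figures.append(fig)
```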

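EDIT: to put the text at a fixed position on the figure, independent of the data values, pass transform=ax.transAxes to plt.text (with ax = plt.gca(), or the Axes returned by sns.histplot). The x and y values are then fractions of the axes, where (0, 0) is the lower-left corner and (1, 1) the upper-right, and Matplotlib converts them to pixel coordinates for you. This is explained in this post: setting a fixed position for matplotlib text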
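A minimal sketch of the difference, using plain Matplotlib with the Agg backend and made-up data: text placed with transform=ax.transAxes keeps its pixel position when the axis limits change, while text in the default data coordinates moves. The .transform() calls convert each kind of coordinate to display (pixel) coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot([0, 10], [0, 500])
ax.set_xlim(0, 10)
ax.set_ylim(0, 500)

# Axes-fraction coordinates: (0, 0) lower-left, (1, 1) upper-right of the axes.
fixed = ax.text(0.05, 0.9, "fixed: axes fraction", transform=ax.transAxes)
# Default data coordinates: tied to the x/y values shown on the axes.
moving = ax.text(2, 100, "moves: data coordinates")

# Convert each anchor point to display (pixel) coordinates
# before and after changing the y-limits.
fixed_before = ax.transAxes.transform((0.05, 0.9))
moving_before = ax.transData.transform((2, 100))

ax.set_ylim(0, 2000)

fixed_after = ax.transAxes.transform((0.05, 0.9))
moving_after = ax.transData.transform((2, 100))

print("axes-fraction text, pixels:", fixed_before, "->", fixed_after)
print("data-coordinate text, pixels:", moving_before, "->", moving_after)
```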