How to plot distributions for several bivariate groups of variables using Python

Time:01-04

I am analysing data which is organised as follows:

  • There is a separate pandas DataFrame for each group (A, B and C).
  • Each DataFrame representing a group has 4 subgroups (columns), and its rows hold their corresponding observations.

For example, a single group of data looks like:

subgroup-1  subgroup-2  subgroup-3  subgroup-4
12          4           NaN         9
15          3           4           NaN
16          8           3           11
17          12          8           13
11          17          12          14
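
For anyone who wants to reproduce this, the sample above can be built roughly as follows. This is only a sketch: the variable name `group_A` is an assumption, and the values are copied from the table.

```python
import numpy as np
import pandas as pd

# Values copied from the sample table; np.nan marks a missing observation.
# The name group_A is an assumption made for illustration.
group_A = pd.DataFrame({
    "subgroup-1": [12, 15, 16, 17, 11],
    "subgroup-2": [4, 3, 8, 12, 17],
    "subgroup-3": [np.nan, 4, 3, 8, 12],
    "subgroup-4": [9, np.nan, 11, 13, 14],
})

# Number of non-missing observations per subgroup.
print(group_A.count())
```

Because of the NaN entries, the subgroups do not all have the same number of usable observations, which matters for any plotting approach.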

I want to visualise the distribution of each subgroup across the different groups. Can anyone tell me which options (chart types) are available in Python for this? Thanks.

I tried histograms and density plots, but all of them seem to work only for 2 variables.

CodePudding user response:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Seed before generating the data so the example is reproducible.
np.random.seed(19680801)

# One pandas DataFrame per group, each with the same four subgroup columns.
columns = ['subgroup-1', 'subgroup-2', 'subgroup-3', 'subgroup-4']
group_A = pd.DataFrame(np.random.rand(50, 4), columns=columns)
group_B = pd.DataFrame(np.random.rand(50, 4), columns=columns)
group_C = pd.DataFrame(np.random.rand(50, 4), columns=columns)


def plot_hist(subgroup):
    """Compare one subgroup across groups A, B and C in several histogram styles."""
    n_bins = 10

    # Array of shape (n_rows, 3): one column per group for the chosen subgroup.
    x = np.column_stack([group_A[subgroup], group_B[subgroup], group_C[subgroup]])

    fig, axes = plt.subplots(nrows=2, ncols=2)
    ax0, ax1, ax2, ax3 = axes.flatten()

    # Side-by-side bars, one colour per group.
    ax0.hist(x, n_bins, density=True, histtype='bar', label=['A', 'B', 'C'])
    ax0.legend(prop={'size': 10})
    ax0.set_title('bars with legend')

    ax1.hist(x, n_bins, density=True, histtype='bar', stacked=True)
    ax1.set_title('stacked bar')

    ax2.hist(x, n_bins, histtype='step', stacked=True, fill=False)
    ax2.set_title('stack step (unfilled)')

    # Make a multiple-histogram of data sets with different lengths.
    # Note: this panel uses separate synthetic data, not the group DataFrames.
    x_multi = [np.random.randn(n) for n in [10000, 5000, 2000]]
    ax3.hist(x_multi, n_bins, histtype='bar')
    ax3.set_title('different sample sizes')

    fig.tight_layout()
    plt.show()


plot_hist('subgroup-1')
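
If you want all four subgroups in one figure, and your real data contains NaN values as in the question, a variation along these lines might help. It is a minimal sketch, not the only way to do it: the helper name `plot_subgroup_grid` and the `groups` dict are hypothetical, the data is randomly generated, and every DataFrame is assumed to share the same column names. Missing values are dropped per group, and `hist()` is given a list of arrays so the groups may end up with different lengths.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_subgroup_grid(groups, n_bins=10):
    """Draw one panel per subgroup, overlaying a step histogram for each group.

    `groups` is a hypothetical dict mapping a group label to its DataFrame;
    all DataFrames are assumed to share the same column names.
    """
    subgroups = list(next(iter(groups.values())).columns)
    fig, axes = plt.subplots(1, len(subgroups), figsize=(4 * len(subgroups), 3),
                             sharey=True, squeeze=False)
    for ax, subgroup in zip(axes[0], subgroups):
        # Drop NaNs per group; the resulting arrays may differ in length,
        # which hist() accepts when it is given a list of arrays.
        data = [df[subgroup].dropna().to_numpy() for df in groups.values()]
        ax.hist(data, bins=n_bins, density=True, histtype="step",
                label=list(groups.keys()))
        ax.set_title(subgroup)
    axes[0][0].legend()
    fig.tight_layout()
    return fig, axes


# Made-up example data: three groups, four subgroups, roughly 10% missing values.
rng = np.random.default_rng(0)
cols = [f"subgroup-{i}" for i in range(1, 5)]
groups = {}
for name in "ABC":
    values = rng.normal(size=(50, 4))
    values[rng.random(values.shape) < 0.1] = np.nan
    groups[name] = pd.DataFrame(values, columns=cols)

fig, axes = plot_subgroup_grid(groups)
```

Call `fig.savefig(...)` or `plt.show()` afterwards to see the result. Step outlines are used here because overlapping filled bars from three groups tend to hide each other.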
