How to plot multiple lines from a dataframe

Time:10-25

I have the following data:

import pandas as pd

# using the data dict at the bottom of the question
df_uplift_percentile = pd.DataFrame.from_dict(data, 'index')
df_uplift_percentile.index.name = 'percentile'

# display(df_uplift_percentile)
            n_treatment  n_control  response_rate_treatment  response_rate_control    uplift  std_treatment  std_control  std_uplift
percentile                                                                                                                          
0-10                217        983                 0.041475               0.004069  0.037405       0.013535     0.002030    0.013687
10-20               145       1055                 0.013793               0.000948  0.012845       0.009686     0.000947    0.009732
20-30               149       1051                 0.000000               0.000000  0.000000       0.000000     0.000000    0.000000
30-40               383        817                 0.010444               0.009792  0.000652       0.005195     0.003445    0.006233
40-50               354        846                 0.005650               0.005910 -0.000260       0.003984     0.002635    0.004776
50-60               423        777                 0.033097               0.029601  0.003496       0.008698     0.006080    0.010612
60-70               588        611                 0.132653               0.155483 -0.022830       0.013988     0.014660    0.020263
70-80               673        526                 0.178306               0.161597  0.016709       0.014755     0.016049    0.021801
80-90               881        318                 0.155505               0.261006 -0.105501       0.012209     0.024628    0.027488
90-100              938        261                 0.152452               0.333333 -0.180881       0.011737     0.029179    0.031451

I want to plot response_rate_treatment, response_rate_control, and uplift by percentile (x-axis) as a line chart, with each line in a different color.

I am trying the code below. What mistake am I making that causes it to plot a lot of lines instead of just 3?

import matplotlib.pyplot as plt

plt.figure(figsize=(20, 15))

percentile = df_uplift_percentile.values
response_rate_treatment = df_uplift_percentile["response_rate_treatment"].values
response_rate_control = df_uplift_percentile["response_rate_control"].values
uplift = df_uplift_percentile["uplift"].values

plt.plot(percentile, response_rate_treatment, label="Treatment Response Rate", color='green')
plt.plot(percentile, response_rate_control, label="Control Response Rate", color='yellow')
plt.plot(percentile, uplift, label="Uplift", color='red')

plt.legend()
plt.ylabel("Uplift = Treatment Response Rate - Control Response Rate")

Current Plot Result: [image not available]

Reproducible Data

  • Data dict
data =\
{'0-10': {'n_treatment': 217,
  'n_control': 983,
  'response_rate_treatment': 0.041475,
  'response_rate_control': 0.004069,
  'uplift': 0.037405,
  'std_treatment': 0.013535,
  'std_control': 0.00203,
  'std_uplift': 0.013687},
 '10-20': {'n_treatment': 145,
  'n_control': 1055,
  'response_rate_treatment': 0.013793,
  'response_rate_control': 0.000948,
  'uplift': 0.012845,
  'std_treatment': 0.009686,
  'std_control': 0.000947,
  'std_uplift': 0.009732},
 '20-30': {'n_treatment': 149,
  'n_control': 1051,
  'response_rate_treatment': 0.0,
  'response_rate_control': 0.0,
  'uplift': 0.0,
  'std_treatment': 0.0,
  'std_control': 0.0,
  'std_uplift': 0.0},
 '30-40': {'n_treatment': 383,
  'n_control': 817,
  'response_rate_treatment': 0.010444,
  'response_rate_control': 0.009792,
  'uplift': 0.000652,
  'std_treatment': 0.005195,
  'std_control': 0.003445,
  'std_uplift': 0.006233},
 '40-50': {'n_treatment': 354,
  'n_control': 846,
  'response_rate_treatment': 0.00565,
  'response_rate_control': 0.00591,
  'uplift': -0.00026,
  'std_treatment': 0.003984,
  'std_control': 0.002635,
  'std_uplift': 0.004776},
 '50-60': {'n_treatment': 423,
  'n_control': 777,
  'response_rate_treatment': 0.033097,
  'response_rate_control': 0.029601,
  'uplift': 0.003496,
  'std_treatment': 0.008698,
  'std_control': 0.00608,
  'std_uplift': 0.010612},
 '60-70': {'n_treatment': 588,
  'n_control': 611,
  'response_rate_treatment': 0.132653,
  'response_rate_control': 0.155483,
  'uplift': -0.02283,
  'std_treatment': 0.013988,
  'std_control': 0.01466,
  'std_uplift': 0.020263},
 '70-80': {'n_treatment': 673,
  'n_control': 526,
  'response_rate_treatment': 0.178306,
  'response_rate_control': 0.161597,
  'uplift': 0.016709,
  'std_treatment': 0.014755,
  'std_control': 0.016049,
  'std_uplift': 0.021801},
 '80-90': {'n_treatment': 881,
  'n_control': 318,
  'response_rate_treatment': 0.155505,
  'response_rate_control': 0.261006,
  'uplift': -0.105501,
  'std_treatment': 0.012209,
  'std_control': 0.024628,
  'std_uplift': 0.027488},
 '90-100': {'n_treatment': 938,
  'n_control': 261,
  'response_rate_treatment': 0.152452,
  'response_rate_control': 0.333333,
  'uplift': -0.180881,
  'std_treatment': 0.011737,
  'std_control': 0.029179,
  'std_uplift': 0.031451}}

CodePudding user response:

  • The correct way to plot many columns as lines is to use [image not available].
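
A common way to plot several DataFrame columns as lines on one axes is pandas' own `DataFrame.plot` method. The sketch below is an assumption about the kind of approach meant here, not the answerer's original code. It uses only the first three rows of the question's data, trimmed to the three columns being plotted, and the `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# First three rows of the question's data dict, limited to the plotted columns.
data = {
    "0-10": {"response_rate_treatment": 0.041475,
             "response_rate_control": 0.004069,
             "uplift": 0.037405},
    "10-20": {"response_rate_treatment": 0.013793,
              "response_rate_control": 0.000948,
              "uplift": 0.012845},
    "20-30": {"response_rate_treatment": 0.0,
              "response_rate_control": 0.0,
              "uplift": 0.0},
}
df_uplift_percentile = pd.DataFrame.from_dict(data, orient="index")
df_uplift_percentile.index.name = "percentile"

cols = ["response_rate_treatment", "response_rate_control", "uplift"]

# Selecting the columns first means exactly one line is drawn per column,
# with the index (the percentile labels) on the x-axis.
ax = df_uplift_percentile[cols].plot(
    figsize=(10, 6),
    color=["green", "yellow", "red"],
)
ax.set_ylabel("Uplift = Treatment Response Rate - Control Response Rate")
```

With the full data dict from the question, the same call should draw three lines across all ten percentile buckets.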

    CodePudding user response:

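    Use:

    percentile = df_uplift_percentile.index

    instead of

    percentile = df_uplift_percentile.values

    df_uplift_percentile.values is the whole 10 x 8 array of column values, not the percentile labels. Passing it as x makes each plt.plot call draw one line per column (8 lines per call) instead of a single line.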
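
To make the difference concrete, here is a minimal sketch of the question's matplotlib code with that one change applied. It uses only the first three rows of the question's data, trimmed to the plotted columns, and the `Agg` backend so it runs without a display. The `fig`/`ax` names and the smaller figure size are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the question's data dict, limited to the plotted columns.
data = {
    "0-10": {"response_rate_treatment": 0.041475,
             "response_rate_control": 0.004069,
             "uplift": 0.037405},
    "10-20": {"response_rate_treatment": 0.013793,
              "response_rate_control": 0.000948,
              "uplift": 0.012845},
    "20-30": {"response_rate_treatment": 0.0,
              "response_rate_control": 0.0,
              "uplift": 0.0},
}
df_uplift_percentile = pd.DataFrame.from_dict(data, orient="index")
df_uplift_percentile.index.name = "percentile"

# .index holds the 1-D percentile labels; .values is the full 2-D data array.
percentile = df_uplift_percentile.index

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(percentile, df_uplift_percentile["response_rate_treatment"],
        label="Treatment Response Rate", color="green")
ax.plot(percentile, df_uplift_percentile["response_rate_control"],
        label="Control Response Rate", color="yellow")
ax.plot(percentile, df_uplift_percentile["uplift"],
        label="Uplift", color="red")

ax.legend()
ax.set_xlabel("percentile")
ax.set_ylabel("Uplift = Treatment Response Rate - Control Response Rate")
```

Each `ax.plot` call now receives a 1-D x and a 1-D y of the same length, so it draws exactly one line.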
    