Multi-index for looping over subplot data in Python

I am trying to compare the yearly P/E of select stocks with their industry mean (P/E_y).

I feel like the best way to analyze this is to visualize the data with line subplots or individual line graphs.

Original DataFrame:

Input:
stock_merge = pd.read_csv("industry mean.csv")
stock_merge

Output:
    Ticker  Year    Industry            P/E_x       P/E_y
0   NVDA    2019    Semiconductors      20.616292   15.79
1   NVDA    2020    Semiconductors      53.349938   15.79
2   NVDA    2021    Semiconductors      76.028282   15.79
3   NVDA    2022    Semiconductors      62.528408   15.79
4   AVGO    2018    Semiconductors      6.287096    15.79
5   AVGO    2019    Semiconductors      40.731857   15.79
6   AVGO    2020    Semiconductors      45.212246   15.79
7   AVGO    2021    Semiconductors      30.819690   15.79
...                ...              ...         ...
400 EFX     2018    Consulting Services 35.487911   35.56
401 EFX     2019    Consulting Services -43.694808  35.56
402 EFX     2020    Consulting Services 44.853370   35.56
403 EFX     2021    Consulting Services 47.910847   35.56

I tried using .groupby() to loop over each industry in the DataFrame and then plot it.

all = stock_merge.groupby(['Industry', 'Ticker', 'Year']).mean()
all

Output

                                            P/E_x       P/E_y
Industry                 Ticker   Year      
Aerospace & Defense      BA       2018      17.606935   26.44
                                  2019     -299.239806  26.44
                                  2020     -10.595709   26.44
                                  2021     -28.156965   26.44
                         HII      2018      8.511068    26.44
...                     ...                 ...         ...

Travel Services          RCL      2021     -3.724618    62.84
Trucking                 ODFL     2018     15.484180    12.30
                                  2019     23.518331    12.30
                                  2020     33.315306    12.30
                                  2021     39.847872    12.30

This is what I tried:

all_industries = all['Industry'].unique()
feature = enumerate(all_industries)

plt.figure(figsize = (30,20))

for i in enumerate(feature):

  plt.subplot(6, 3, i[0] + 1)
  sns.lineplot(x='P/E_y', y=i[1], hue = 'Ticker', data=all)

I received nothing but errors and empty subplots.

This is what I'm trying to achieve:

There should be a new plot for each industry (preferably as subplots).
Each plot should contain every ticker, the years, and the P/E for that industry.

For example, if there are 5 stocks in the Semiconductors industry, that line graph should show 6 lines: one line per stock showing its P/E_x between 2018 and 2022, plus 1 line for P/E_y.

How do I plot this data?

CodePudding user response:

I think you are on the right track by using .groupby().

You may find it easier to iterate through the grouped object using the .groups property, which returns a dictionary mapping each group name to the index labels of its rows, such as:

{'Consulting Services': [8, 9], 'Semiconductors': [0, 1, 2, 3, 4, 5, 6, 7]}

By iterating over the keys of this dictionary, I think you will be able to achieve what you want. In each loop you can further divide the group into your individual lines. Hopefully the following snippet will help to put you on the right track:

import pandas as pd

stock_merge = pd.read_csv("data.csv")
industries = stock_merge.groupby("Industry")

for k in industries.groups:
    # All rows belonging to the current industry
    industry = industries.get_group(k)
    # P/E_y is the same on every row of an industry, so take the first value
    p_ey = industry["P/E_y"].iat[0]
    print(k, p_ey)
    # Create subplot, add P/E_y line
    tickers = industry.groupby("Ticker")
    for t in tickers.groups:
        ticker_data = tickers.get_group(t)
        # Use ticker_data to generate an individual line on the plot
        print(ticker_data)  # in a notebook, display(ticker_data) also works
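
Building on the loop above, here is a minimal sketch of how the subplots themselves could be drawn with plain matplotlib. It makes several assumptions that are not in the original post: the sample rows are copied from the table in the question as a stand-in for the CSV, the three-column grid and the output file name industry_pe.png are arbitrary choices, and P/E_y is treated as constant within an industry, so it is drawn as a horizontal line.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question's table (assumption: stand-in for
# the full CSV, which would normally be loaded with pd.read_csv).
stock_merge = pd.DataFrame({
    "Ticker": ["NVDA"] * 4 + ["AVGO"] * 4 + ["EFX"] * 4,
    "Year": [2019, 2020, 2021, 2022,
             2018, 2019, 2020, 2021,
             2018, 2019, 2020, 2021],
    "Industry": ["Semiconductors"] * 8 + ["Consulting Services"] * 4,
    "P/E_x": [20.616292, 53.349938, 76.028282, 62.528408,
              6.287096, 40.731857, 45.212246, 30.819690,
              35.487911, -43.694808, 44.853370, 47.910847],
    "P/E_y": [15.79] * 8 + [35.56] * 4,
})

industries = stock_merge.groupby("Industry")

# Lay the industries out on a grid, three subplots per row (arbitrary choice).
n_cols = 3
n_rows = math.ceil(industries.ngroups / n_cols)
fig, axes = plt.subplots(n_rows, n_cols,
                         figsize=(6 * n_cols, 4 * n_rows), squeeze=False)
flat_axes = axes.ravel()

for ax, (name, industry) in zip(flat_axes, industries):
    # One line per ticker: yearly P/E_x
    for ticker, ticker_data in industry.groupby("Ticker"):
        ticker_data = ticker_data.sort_values("Year")
        ax.plot(ticker_data["Year"], ticker_data["P/E_x"],
                marker="o", label=ticker)
    # One extra horizontal line for the industry mean (P/E_y)
    ax.axhline(industry["P/E_y"].iat[0], color="black",
               linestyle="--", label="Industry mean (P/E_y)")
    ax.set_title(name)
    ax.set_xlabel("Year")
    ax.set_ylabel("P/E")
    ax.set_xticks(sorted(industry["Year"].unique()))
    ax.legend()

# Hide any grid cells left over after the last industry.
for ax in flat_axes[industries.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
fig.savefig("industry_pe.png")  # hypothetical output file name
```

Iterating over the groupby object directly yields (name, sub-DataFrame) pairs, which is equivalent to looping over .groups and calling .get_group() on each key. If you prefer seaborn, passing ax=ax to sns.lineplot with x='Year', y='P/E_x', hue='Ticker' and data=industry should draw the ticker lines on the same axes.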