How to plot curve for each row in dataframe without plotting NaN values? [python]

Time:07-02

I have the following dataframe:

    0   5  15   20   25   30   35  40  45  50
----------------------------------------------
0  85  75  65   52   39   21   12   5   2   0 
1  80  69  52   48   21   12    5   2   0   0  
2  81  68  61   49   32   25   14   4   1   0
3  82  64  43   32   19    5    0   0   0   0
4  79  64  49   41   22    6    2   0   0   0

For context, each column label is a distance from a site in feet, and each row records how a measured value decreases as that distance increases. Each row is a separate curve, so I can plot each one on its own with matplotlib, giving 5 plots (one per row). That part is easy.

However, every row contains 0s, and all but one row contains several. All of those 0s end up in the plots, so most curves have a tail of 0s, with the x-axis running all the way out to 50 feet. That gives a differently shaped curve than if each plot had just one 0. In my data/experiment, the value never increases after it hits 0, so the extra 0s are unnecessary and only distort the shape of the curve. What I am trying to do is get rid of those extra zeros in each row, so that once the curve hits 0, the plot (the x-axis) ends.

The ultimate task here, for context, is to fit different curve equations to these curves. I want to use the curves with only the first 0 included, not the tail of multiple 0s. So far I have tried setting the extra 0s to NaN, but when I try to plot this, I just get "ValueError: array must not contain infs or NaNs". How can I properly get rid of the extra 0s after the first 0 in each row, so that they are not represented in the curve plots?

CodePudding user response:

If you want to keep only the first 0 and not the trail of 0s, you can use NumPy's nonzero function to find the indices of the nonzero values and trim the array to those values plus the first 0. This works for both plotting and data manipulation.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame({'0': [85, 80, 81, 82, 79],
                   '5': [75, 69, 68, 64, 64],
                   '15': [65, 52, 61, 43, 49],
                   '20': [52, 48, 49, 32, 41],
                   '25': [39, 21, 32, 19, 22],
                   '30': [21, 12, 25, 5, 6],
                   '35': [12, 5, 14, 0, 2],
                   '40': [5, 2, 4, 0, 0],
                   '45': [2, 0, 1, 0, 0],
                   '50': [0, 0, 0, 0, 0]})

# Initiate figure
fig, axs = plt.subplots(nrows=5, ncols=1, sharex='col')

for row in df.index:

    # Get array for plotting
    values = np.array(df.loc[row])
    make_x = np.array([0, 5, 15, 20, 25, 30, 35, 40, 45, 50])

    # Count the zeros in this row
    number_zeros = len(values) - np.count_nonzero(values)

    # More than one 0 means there is a tail of 0s in this array
    if number_zeros > 1:
        # Slice end: index of the last nonzero value in the array, plus 2
        end_idx = np.nonzero(values)[0][-1] + 2
        # Note: the +2 is because you want the first 0 after the last nonzero value (+1), and a Python slice excludes its end index, so you need one more (+1) for that first 0 to be included.

        # Trim the array to only include one zero
        values = values[0:end_idx]
        make_x = make_x[0:end_idx]

    axs[row].plot(make_x, values)

axs[0].set_ylim([0, 90])
plt.show()
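
If the trimmed arrays are also needed for the curve fitting, the same idea can be wrapped in a small function. This is only a sketch: trim_after_first_zero is a made-up name, not a NumPy or pandas function, and it assumes (as stated in the question) that the values never rise again after the first 0. It looks for the first 0 directly instead of the last nonzero value, which gives the same result under that assumption.

```python
import numpy as np


def trim_after_first_zero(x, y):
    """Cut x and y off just after the first 0 in y (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    zero_positions = np.flatnonzero(y == 0)
    if zero_positions.size == 0:
        # No 0 in this row: nothing to trim
        return x, y
    # +1 because a slice excludes its end index and the first 0 should be kept
    end = zero_positions[0] + 1
    return x[:end], y[:end]


# Column labels and row 3 from the example dataframe
distances = np.array([0, 5, 15, 20, 25, 30, 35, 40, 45, 50])
row_3 = np.array([82, 64, 43, 32, 19, 5, 0, 0, 0, 0])

x_trim, y_trim = trim_after_first_zero(distances, row_3)
print(x_trim)
print(y_trim)
```

For row 3 this keeps the distances 0 to 35 and the values 82, 64, 43, 32, 19, 5, 0. A row whose only 0 is in the last column comes back unchanged.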

The plotting loop above produces five stacked subplots, one per row, with only one 0 in each curve.
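
On the NaN attempt from the question: that ValueError most likely comes from a finiteness check in the fitting routine rather than from matplotlib (scipy.optimize.curve_fit, for example, checks its inputs for NaN and inf by default), although the question does not show which call raised it. If you would rather mark the extra 0s as NaN, one possible approach is to filter them out with a boolean mask just before fitting. A minimal sketch with NumPy only, using row 3 of the example data; the variable names are illustrative.

```python
import numpy as np

# Column labels and row 3 from the example dataframe (floats, so NaN can be stored)
distances = np.array([0, 5, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
values = np.array([82, 64, 43, 32, 19, 5, 0, 0, 0, 0], dtype=float)

# Mark every 0 after the first one as NaN
is_zero = values == 0
extra_zero = is_zero & (np.cumsum(is_zero) > 1)
values[extra_zero] = np.nan

# Keep only the finite points; these are the arrays to hand to a fitting routine
keep = np.isfinite(values)
x_fit = distances[keep]
y_fit = values[keep]

print(x_fit)
print(y_fit)
```

Here x_fit and y_fit contain no NaN and end at the first 0, so they can be passed to a fitting function that rejects non-finite input.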

Hope this helps, cheers.
