How do I discretize a continuous function avoiding noise generation (see picture)-CodePudding

I have a continuous input function which I would like to discretize into lets say 5-10 discrete bins between 1 and 0. Right now I am using np.digitize and rescale the output bins to 0-1. Now the problem is that sometime datasets (blue line) yield results like this:

I tried pushing up the number of discretization bins but I ended up keeping the same noise and getting just more increments. As an example where the algorithm worked with the same settings but another dataset:

this is the code I used there NumOfDisc = number of bins

intervals = np.linspace(0,1,NumOfDisc)
discretized_Array = np.digitize(Continuous_Array, intervals)

The red ilne in the graph is not important. The continuous blue line is the on I try to discretize and the green line is the discretized result.The Graphs are created with matplotlyib.pyplot using the following code:

def CheckPlots(discretized_Array, Continuous_Array, Temperature, time, PlotName)
logging.info("Plotting...")

#Setting Axis properties and titles
fig, ax = plt.subplots(1, 1)
ax.set_title(PlotName)
ax.set_ylabel('Temperature [°C]')
ax.set_ylim(40, 110)
ax.set_xlabel('Time [s]')    
ax.grid(b=True, which="both")
ax2=ax.twinx()
ax2.set_ylabel('DC Power [%]')
ax2.set_ylim(-1.5,3.5)

#Plotting stuff
ax.plot(time, Temperature, label= "Input Temperature", color = '#c70e04')
ax2.plot(time, Continuous_Array, label= "Continuous Power", color = '#040ec7')
ax2.plot(time, discretized_Array, label= "Discrete Power", color = '#539600')

fig.legend(loc = "upper left", bbox_to_anchor=(0,1), bbox_transform=ax.transAxes)

logging.info("Done!")
logging.info("---")
return

Any Ideas what I could do to get sensible discretizations like in the second case?

CodePudding user response：

If what I described in the comments is the problem, there are a few options to deal with this:

Do nothing: Depending on the reason you're discretizing, you might want the discrete values to reflect the continuous values accurately
Change the bins: you could shift the bins or change the number of bins, such that relatively 'flat' parts of the blue line stay within one bin, thus giving a flat green line in these parts as well, which would be visually more pleasing like in your second plot.

CodePudding user response：

The following solution gives the exact result you need.

Basically, the algorithm finds an ideal line, and attempts to replicate it as well as it can with less datapoints. It starts with 2 points at the edges (straight line), then adds one in the center, then checks which side has the greatest error, and adds a point in the center of that, and so on, until it reaches the desired bin count. Simple :)

import warnings
warnings.simplefilter('ignore', np.RankWarning)


def line_error(x0, y0, x1, y1, ideal_line, integral_points=100):
    """Assume a straight line between (x0,y0)->(x1,p1). Then sample the perfect line multiple times and compute the distance."""
    straight_line = np.poly1d(np.polyfit([x0, x1], [y0, y1], 1))
    xs = np.linspace(x0, x1, num=integral_points)
    ys = straight_line(xs)

    perfect_ys = ideal_line(xs)
    
    err = np.abs(ys - perfect_ys).sum() / integral_points * (x1 - x0)  # Remove (x1 - x0) to only look at avg errors
    return err


def discretize_bisect(xs, ys, bin_count):
    """Returns xs and ys of discrete points"""
    # For a large number of datapoints, without loss of generality you can treat xs and ys as bin edges
    # If it gives bad results, you can edges in many ways, e.g. with np.polyline or np.histogram_bin_edges
    ideal_line = np.poly1d(np.polyfit(xs, ys, 50))
    
    new_xs = [xs[0], xs[-1]]
    new_ys = [ys[0], ys[-1]]
    
    while len(new_xs) < bin_count:
        
        errors = []
        for i in range(len(new_xs)-1):
            err = line_error(new_xs[i], new_ys[i], new_xs[i 1], new_ys[i 1], ideal_line)
            errors.append(err)

        max_segment_id = np.argmax(errors)
        new_x = (new_xs[max_segment_id]   new_xs[max_segment_id 1]) / 2
        new_y = ideal_line(new_x)
        new_xs.insert(max_segment_id 1, new_x)
        new_ys.insert(max_segment_id 1, new_y)

    return new_xs, new_ys


BIN_COUNT = 25

new_xs, new_ys = discretize_bisect(xs, ys, BIN_COUNT)

plot_graph(xs, ys, new_xs, new_ys, f"Discretized and Continuous comparison, N(cont) = {N_MOCK}, N(disc) = {BIN_COUNT}")
print("Bin count:", len(new_xs))

Moreover, here's my simplified plotting function I tested with.

def plot_graph(cont_time, cont_array, disc_time, disc_array, plot_name):
    """A simplified version of the provided plotting function"""
    
    # Setting Axis properties and titles
    fig, ax = plt.subplots(figsize=(20, 4))
    ax.set_title(plot_name)
    ax.set_xlabel('Time [s]')
    ax.set_ylabel('DC Power [%]')

    # Plotting stuff
    ax.plot(cont_time, cont_array, label="Continuous Power", color='#0000ff')
    ax.plot(disc_time, disc_array, label="Discrete Power",   color='#00ff00')

    fig.legend(loc="upper left", bbox_to_anchor=(0,1), bbox_transform=ax.transAxes)

Lastly, here's the Google Colab