SGD on a Piecewise non-differentiable function


Setup for the problem:

I have a canvas which represents a city; each second I add a new resident to the city. Each resident has a job whose location is randomly sampled from a distribution. Each resident also has a custom cost function that helps them decide where they want to live, which they do by minimizing this cost function with respect to two variables x and y. So the function, for example, looks something like:

cost(x,y) = distance_to_job(x,y) + distance_to_center_of_city(x,y) + population_density(x,y)

where population_density(x,y) is just the population density at point (x,y). Naturally, population_density(x,y) (without any transformations) is a piecewise non-differentiable function, as one has to define a grid of blocks over the city and keep track of how many people there are per grid cell (think of a population density map of the world: each country has a distinct value that isn't necessarily the same as its neighbor's, so if you were to map this on a 3-D plot, the resulting surface would not be smooth).

Let me know if this setup is confusing, I'll try to make it a bit more clear.

The Question:

One could define a transformation that places a steep but smooth transition between the values of adjacent grid cells, but as of now my population density function is not smooth and is not differentiable at the boundaries between grid cells. At first I thought that SGD optimization in TensorFlow would not work, since I don't have a differentiable cost function, but it seems to run fine. I am confused about what exactly is going on here and would love any clarification about how SGD optimization works and whether my code is doing what I want it to.
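For concreteness, here's a minimal 1-D sketch of the kind of transformation I have in mind (smooth_lookup_1d, cell_values, cell_width, and sharpness are illustrative names, not from my actual code): blend the values of adjacent cells with a steep sigmoid centered on each cell boundary, so the lookup stays close to the step function but becomes differentiable:

import tensorflow as tf

def smooth_lookup_1d(x, cell_values, cell_width, sharpness=20.0):
    u = x / cell_width - 0.5  # shift so cell centers sit at integer u
    i = tf.cast(tf.floor(u), tf.int32)
    frac = u - tf.floor(u)    # the only term the gradient flows through
    n = tf.shape(cell_values)[0]
    left = tf.gather(cell_values, tf.clip_by_value(i, 0, n - 1))
    right = tf.gather(cell_values, tf.clip_by_value(i + 1, 0, n - 1))
    # w rises from ~0 at one cell center to ~1 at the next, crossing 0.5
    # exactly at the boundary; a larger sharpness gives a steeper transition.
    w = tf.sigmoid(sharpness * (frac - 0.5))
    return (1.0 - w) * left + w * right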

Relevant Code:

def concentrationLookup(self, x, y):
    # Quantize (x, y) down to the integer indices of the containing grid cell.
    r_index = int(x // (self.city.total_w / self.city.rows))
    c_index = int(y // (self.city.total_h / self.city.cols))
    return self.city.grid[r_index, c_index]

tf_jobCost = lambda x, y: (0.1 / travelCost) * (tf.pow(x - self.jobx, 2) + tf.pow(y - self.joby, 2))
tf_cityCost = lambda x, y: 0.01 * tf.pow(x - self.city.centerX, 2) + 0.01 * tf.pow(y - self.city.centerY, 2)

xVar = tf.Variable(locX)
yVar = tf.Variable(locY)
self.costfn = lambda: tf_jobCost(xVar, yVar) + tf_cityCost(xVar, yVar) + self.concentrationLookup(xVar, yVar)
opt = tf.keras.optimizers.SGD(learning_rate=3.0)
for _ in range(100):
    opt.minimize(self.costfn, var_list=[xVar, yVar])

self.x = xVar.numpy()
self.y = yVar.numpy()

CodePudding user response:

I believe it's treating population_density(x,y) as a constant function of x and y. In other words, it contributes nothing to the gradient and has no influence on the solution; the other two terms are differentiable, which is why the optimization still appears to run fine.

You can verify this by zeroing out the other components of the loss and checking that opt.minimize() then fails with something like ValueError: No gradients provided for any variable....
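Another self-contained way to see it (stand-in grid and a simplified lookup, not the original class): a tf.GradientTape reports no gradient path through the quantized lookup at all:

import numpy as np
import tensorflow as tf

grid = np.random.rand(10, 10)  # stand-in density grid with 1.0 x 1.0 cells

def lookup(x, y):
    # Same quantization as in the question: int() and // kill the gradient.
    return tf.constant(grid[int(x // 1.0), int(y // 1.0)])

xVar, yVar = tf.Variable(3.2), tf.Variable(4.7)
with tf.GradientTape() as tape:
    density = lookup(xVar, yVar)
print(tape.gradient(density, [xVar, yVar]))  # [None, None]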

I think the solution is to stop treating the function as piecewise constant and non-differentiable, and instead treat it as piecewise linear. In that case, concentrationLookup(x, y) can be written to return a bilinearly-weighted sum of the values at the 4 neighboring grid cells. Something like this:

def concentrationLookup(x, y):
  r = x / (total_w / rows) # no quantizing
  c = y / (total_h / cols) # no quantizing
  r1, c1 = int(r), int(c) # lower bounds
  r2, c2 = r1 + 1, c1 + 1 # upper bounds
  w_r2, w_c2 = r - r1, c - c1
  w_r1, w_c1 = 1.0 - w_r2, 1.0 - w_c2
  # Assume constant boundary conditions.
  c2 = tf.clip_by_value(c2, 0, grid.shape[1]-1)
  c1 = tf.clip_by_value(c1, 0, grid.shape[1]-1)
  r2 = tf.clip_by_value(r2, 0, grid.shape[0]-1)
  r1 = tf.clip_by_value(r1, 0, grid.shape[0]-1)
  return w_r1*w_c1*grid[r1, c1] + w_r2*w_c2*grid[r2, c2] + w_r1*w_c2*grid[r1, c2] + w_r2*w_c1*grid[r2, c1]

In this case, the gradient seems to be well defined.
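As a quick sanity check (using a stand-in 10x10 grid and stand-in city dimensions, and assuming the bilinear concentrationLookup above is in scope), the tape now returns real gradients:

import numpy as np
import tensorflow as tf

grid = np.random.rand(10, 10).astype(np.float32)   # stand-in density grid
total_w, total_h, rows, cols = 10.0, 10.0, 10, 10  # stand-in city dimensions

x, y = tf.Variable(3.2), tf.Variable(4.7)
with tf.GradientTape() as tape:
    density = concentrationLookup(x, y)
print(tape.gradient(density, [x, y]))  # two finite gradients, not None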
