How to generate random and non duplicate points that fall within a polygon using Python, Numpy, and-CodePudding

I've been using Python, Shapely, Numpy and Geopandas to generate a dot density map. The dot density map points have different colors according to ethnicity, so it allows for understanding overall city demographics.

I've been using some code similar to the function

Here's the code I run to create the points:

pts_per_person = 5
  epsg = 4326 
  seed = 10

  list_of_point_categories = []
  for field in ['white_pop','black_pop','hispanic_pop', 'asian_pop', 'amerindian_pop',  'other_race_pop',   'two_or_more_races_pop']:
    
    ps = gpd.GeoDataFrame(gen_points_in_gdf_polys(geometry = gdf['geometry'], values=gdf[field],
                                points_per_value = pts_per_person, seed=seed))
    ps['ethnicity'] = field
    ps['year'] = i

    list_of_point_categories.append(ps)

  all_points=gpd.GeoDataFrame(pd.concat(list_of_point_categories))

  all_points=all_points.reset_index(drop=True)

Here are the functions:

def gen_random_points_poly(poly, num_points, seed = None):
    """
    Returns a list of N randomly generated points within a polygon. 
    """
    
    min_x, min_y, max_x, max_y = poly.bounds
    points = []
    i=0
    
    while len(points) < num_points:
        s=RandomState(seed i) if seed else RandomState(seed)
        random_point = Point([s.uniform(min_x, max_x), s.uniform(min_y, max_y)])
        if random_point.within(poly):
            points.append(random_point)
        i =1
    return points


def gen_points_in_gdf_polys(geometry, values, points_per_value = None, seed = None):
    """
    Take a GeoSeries of Polygons along with a Series of values and returns randomly generated points within
    these polygons. Optionally takes a "points_per_value" integer which indicates the number of points that 
    should be generated for each 1 value.
    """
    if points_per_value:
        new_values = (values/points_per_value).astype(int)
    else:
        new_values = values

    
    new_values = new_values[new_values>0]
    #print(new_values.size)
    
    if(new_values.size > 0):
        g = gpd.GeoDataFrame(data = {'vals':new_values}, geometry = geometry)
        
        a = g.apply(lambda row: tuple(gen_random_points_poly(row['geometry'], row['vals'], seed)),1)
        b = gpd.GeoSeries(a.apply(pd.Series).stack(), crs = geometry.crs)
        b.name='geometry'
      
        return b

But what I found is that I ended up with multiple duplicate points for each ethnicity. The latitude and longitude values were exactly the same.

Duplicate points were getting stacked on top of each other. I had to change the s.uniform line to random_point = Point([s.uniform(min_x, max_x) round(random.uniform(.0001, .001),10), s.uniform(min_y, max_y) round(random.uniform(.0001, .001),10)]) in order to make it truly random. This had the desired effect of spreading the points out more randomly, there were no duplicates.

But something about this feels a bit hacky, like I'm not using .uniform right. Is this the right way to create random points within a polygon?

CodePudding user response：

The problem seems to be in the first line of the while loop in the gen_random_points_poly() function. This line reads:

 s=RandomState(seed i) if seed else RandomState(seed)

This initializes the random numbers generator at each iteration of the loop. The initialization depends on the value of seed (which is fixed) and the value of i (which increments at each iteration of the loop). In effect, if you call the function gen_random_points_poly() multiple times, then the sequence of points it will generate will be exactly the same each time. This is why the collections of points generated for different ethnicities are exactly the same.

If you want to obtain reproducible results, you can either create the RandomState object once, outside the gen_random_points_poly() function. Alternatively, you can provide a different seed each time you call this function. In the latter case it is still more efficient to create the RandomState object just once, before the while loop instead of creating it repeatedly at each iteration.