I've been using Python, Shapely, Numpy and Geopandas to generate a dot density map. The dot density map points have different colors according to ethnicity, so it allows for understanding overall city demographics.
I've been using some code similar to the function
Here's the code I run to create the points:
pts_per_person = 5
epsg = 4326
seed = 10
list_of_point_categories = []
for field in ['white_pop','black_pop','hispanic_pop', 'asian_pop', 'amerindian_pop', 'other_race_pop', 'two_or_more_races_pop']:
ps = gpd.GeoDataFrame(gen_points_in_gdf_polys(geometry = gdf['geometry'], values=gdf[field],
points_per_value = pts_per_person, seed=seed))
ps['ethnicity'] = field
ps['year'] = i
list_of_point_categories.append(ps)
all_points=gpd.GeoDataFrame(pd.concat(list_of_point_categories))
all_points=all_points.reset_index(drop=True)
Here are the functions:
def gen_random_points_poly(poly, num_points, seed = None):
"""
Returns a list of N randomly generated points within a polygon.
"""
min_x, min_y, max_x, max_y = poly.bounds
points = []
i=0
while len(points) < num_points:
s=RandomState(seed i) if seed else RandomState(seed)
random_point = Point([s.uniform(min_x, max_x), s.uniform(min_y, max_y)])
if random_point.within(poly):
points.append(random_point)
i =1
return points
def gen_points_in_gdf_polys(geometry, values, points_per_value = None, seed = None):
"""
Take a GeoSeries of Polygons along with a Series of values and returns randomly generated points within
these polygons. Optionally takes a "points_per_value" integer which indicates the number of points that
should be generated for each 1 value.
"""
if points_per_value:
new_values = (values/points_per_value).astype(int)
else:
new_values = values
new_values = new_values[new_values>0]
#print(new_values.size)
if(new_values.size > 0):
g = gpd.GeoDataFrame(data = {'vals':new_values}, geometry = geometry)
a = g.apply(lambda row: tuple(gen_random_points_poly(row['geometry'], row['vals'], seed)),1)
b = gpd.GeoSeries(a.apply(pd.Series).stack(), crs = geometry.crs)
b.name='geometry'
return b
But what I found is that I ended up with multiple duplicate points for each ethnicity. The latitude and longitude values were exactly the same.
Duplicate points were getting stacked on top of each other. I had to change the s.uniform
line to random_point = Point([s.uniform(min_x, max_x) round(random.uniform(.0001, .001),10), s.uniform(min_y, max_y) round(random.uniform(.0001, .001),10)])
in order to make it truly random. This had the desired effect of spreading the points out more randomly, there were no duplicates.
But something about this feels a bit hacky, like I'm not using .uniform
right. Is this the right way to create random points within a polygon?
CodePudding user response:
The problem seems to be in the first line of the while
loop in the gen_random_points_poly()
function. This line reads:
s=RandomState(seed i) if seed else RandomState(seed)
This initializes the random numbers generator at each iteration of the loop. The initialization depends on the value of seed
(which is fixed) and the value of i
(which increments at each iteration of the loop). In effect, if you call the function gen_random_points_poly()
multiple times, then the sequence of points it will generate will be exactly the same each time. This is why the collections of points generated for different ethnicities are exactly the same.
If you want to obtain reproducible results, you can either create the RandomState
object once, outside the gen_random_points_poly()
function. Alternatively, you can provide a different seed each time you call this function. In the latter case it is still more efficient to create the RandomState
object just once, before the while
loop instead of creating it repeatedly at each iteration.