I'm building a data augmentation pipeline using dataframes. I've created a function, h3_int, that takes an int input and appends a column of hex values to the dataframe. Here's the implementation of h3_int:
from h3.unstable import vect

def h3_int(df, level):
    # Append a column named h3_<level> holding the H3 index of each (lat, lng) row
    df['h3_' + str(level)] = vect.geo_to_h3(df.lat.values, df.lng.values, level).tolist()
    return df
df consists of a lat and a lng column:
lat lng
0 43.64617 -79.42451
1 43.64105 -79.37628
2 43.66724 -79.41598
3 43.69602 -79.45468
4 43.66890 -79.32592
... ... ...
9515 36.10644 -115.16711
9516 36.00814 -115.17496
9517 36.10711 -115.16607
9518 36.03119 -115.05352
9519 36.13554 -115.11541
Simple usage of h3_int:
df.pipe(h3_int, 8)
Since the input is dynamic, I'd like to dynamically generate the pipes as well, but I've been having difficulty implementing this.
The code,
(df.pipe(h3_int, i) for i in range(8, 10))
returns:
<generator object <genexpr> at 0x7fd4858557b0>
While,
(df.pipe((h3_int, i) for i in range(8, 10)))
raises an exception:
TypeError: 'generator' object is not callable
What's the correct method for implementing dynamic pipes in pandas? Unfortunately I've found the documentation and Stack Overflow lacking in answers.
Answer:
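Wrapping a comprehension in parentheses creates a generator expression, which is lazy: no pipe runs until something iterates over it, which is why your first attempt only shows a generator object. In your second attempt the generator is passed to df.pipe as the function to apply, and pipe tries to call it, hence the "'generator' object is not callable" error. Instead, you can use square brackets to create a list, which runs each pipe immediately and is indexable: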
>>> [df.pipe(h3_int, i) for i in range(8, 9)][0]
lat lng h3_8
0 43.64617 -79.42451 613256717813153791
1 43.64105 -79.37628 613256717559398399
2 43.66724 -79.41598 613256718316470271
3 43.69602 -79.45468 613256716607291391
4 43.66890 -79.32592 613256718037549055
5 36.10644 -115.16711 613220086766895103
6 36.00814 -115.17496 613220073288499199
7 36.10711 -115.16607 613220086766895103
8 36.03119 -115.05352 613220075656183807
9 36.13554 -115.11541 613220087052107775
Note that df was modified in place because your function h3_int does not copy it before modifying it. Since h3_int also returns that same object, every element of the list is df itself. That's not bad, it's just something to keep in mind.
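If the goal is a single dataframe with one new column per level, one option is to fold the levels into a chain of pipes with functools.reduce. The following is a minimal sketch under stated assumptions, not a definitive implementation: add_level is a hypothetical stand-in for h3_int (it writes a dummy value so the example runs without the h3 package), and it copies its input so the original df stays untouched.

```python
from functools import reduce

import pandas as pd


def add_level(df, level):
    # Hypothetical stand-in for h3_int: same (df, level) signature,
    # but it copies the frame and writes a dummy value instead of an H3 index.
    out = df.copy()
    out["h3_" + str(level)] = level
    return out


df = pd.DataFrame(
    {"lat": [43.64617, 36.10644], "lng": [-79.42451, -115.16711]}
)

# Equivalent to df.pipe(add_level, 8).pipe(add_level, 9),
# with the levels supplied by any iterable.
result = reduce(lambda acc, level: acc.pipe(add_level, level), range(8, 10), df)

print(list(result.columns))  # ['lat', 'lng', 'h3_8', 'h3_9']
print(list(df.columns))      # ['lat', 'lng']
```

Swapping add_level for h3_int should work the same way, except that h3_int as written mutates df in place, so a plain for loop over the levels would give the same result there.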