How to dynamically generate Pipes in Python Pandas?

Time:11-21

I'm building a data augmentation pipeline using dataframes. I've created a function, h3_int, that takes an int input and appends a column of hex values to the dataframe. Here's the implementation of h3_int:

from h3.unstable import vect
def h3_int(df, level):
    # Add a column named 'h3_<level>' computed from the lat/lng columns.
    df['h3_' + str(level)] = vect.geo_to_h3(df.lat.values, df.lng.values, level).tolist()
    return df

df consists of a lat column and a lng column:

           lat         lng
0     43.64617   -79.42451
1     43.64105   -79.37628
2     43.66724   -79.41598
3     43.69602   -79.45468
4     43.66890   -79.32592
...        ...         ...
9515  36.10644  -115.16711
9516  36.00814  -115.17496
9517  36.10711  -115.16607
9518  36.03119  -115.05352
9519  36.13554  -115.11541

Simple usage of h3_int:

df.pipe(h3_int, 8)

Since the input is dynamic, I'd like to dynamically generate the pipes as well, but I've been having difficulty implementing this.

The code,

(df.pipe(h3_int, i) for i in range(8, 10))

returns:

<generator object <genexpr> at 0x7fd4858557b0>

While,

(df.pipe((h3_int, i) for i in range(8, 10)))

raises an exception:

TypeError: 'generator' object is not callable

What's the correct method for implementing dynamic pipes in pandas? Unfortunately I've found the documentation and Stack Overflow lacking in answers.

Answer:

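Wrapping a comprehension in parentheses creates a generator expression, which is lazy: nothing runs until you iterate over it, so your first attempt only returns the generator object. In your second attempt the generator itself is passed to df.pipe as the function to call, and a generator is not callable, hence the TypeError. Instead, you can use square brackets to create a list, which is evaluated immediately and can be indexed: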

>>> [df.pipe(h3_int, i) for i in range(8, 9)][0]
        lat        lng                h3_8
0  43.64617  -79.42451  613256717813153791
1  43.64105  -79.37628  613256717559398399
2  43.66724  -79.41598  613256718316470271
3  43.69602  -79.45468  613256716607291391
4  43.66890  -79.32592  613256718037549055
5  36.10644 -115.16711  613220086766895103
6  36.00814 -115.17496  613220073288499199
7  36.10711 -115.16607  613220086766895103
8  36.03119 -115.05352  613220075656183807
9  36.13554 -115.11541  613220087052107775

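Note that df was modified in place because your function h3_int does not copy it before modifying it, so every element of the list is the same DataFrame object as df. With a wider range, such as range(8, 10), each element would therefore carry all of the new columns. That's not bad, it's just something to keep in mind.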
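
If the goal is a single DataFrame with one column per level, one option is to fold the levels into a chain of pipe calls with functools.reduce. The following is only a minimal sketch under assumptions: add_level is a hypothetical stand-in for h3_int that writes a placeholder value (so it runs without h3 installed) and uses assign to return a copy instead of modifying df, and pipe_levels is an illustrative name, not part of pandas.

```python
from functools import reduce

import pandas as pd


def add_level(df, level):
    # Hypothetical stand-in for h3_int: returns a new frame with one extra
    # column named 'h3_<level>' (placeholder values, not real H3 indexes).
    return df.assign(**{'h3_' + str(level): level})


def pipe_levels(df, levels):
    # Feed the result of each pipe call into the next one.
    return reduce(lambda acc, level: acc.pipe(add_level, level), levels, df)


df = pd.DataFrame({'lat': [43.64617, 36.10644], 'lng': [-79.42451, -115.16711]})
result = pipe_levels(df, range(8, 10))
print(list(result.columns))  # ['lat', 'lng', 'h3_8', 'h3_9']
```

Swapping add_level for the real h3_int should give the same chaining behaviour, except that h3_int as written also modifies the original df.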
