Home > Mobile >  Create nested loop in a dynamic way in python
Create nested loop in a dynamic way in python

Time:07-20

I have a dataframe that looks like this: foo = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [6,7,8]}) and a list of columns list_of_columns = ['a','b'] of foo.

The list_of_columns are dynamically selected by the user, so it can be ['a','b'] but it can also be ['a','c'] or ['c'] or ['a','b','c'] etc

I would like for every column in the list_of_columns to create (nested) for loops and query the dataframe in the following way:

In case list_of_columns = ['a','b'] the the loop would be like this:

for a in foo.a.unique():
    for b in foo.b.unique():
        print(foo.query(f'a=={a} and b=={b}'))

In case list_of_columns = ['a'] the the loop would be like this:

for a in foo.a.unique():
    print(foo.query(f'a=={a}'))

In case list_of_columns = ['a','b','c'] the the loop would be like this:

for a in foo.a.unique():
    for b in foo.b.unique():
        for c in foo.c.unique():
            print(foo.query(f'a=={a} and b=={b} and c=={c}'))

Is there a way to programmatically achieve that in python ?

CodePudding user response:

One approach using itertools.product, to handle the "nested" loops:

import pandas as pd
from itertools import product

foo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [6, 7, 8]})

list_of_columns = ['a', 'b']
for p in product(*(foo[col].unique() for col in list_of_columns)):
    query = " and ".join(f"{c}=={u}" for c, u in zip(list_of_columns, p))
    print(foo.query(query))
    print("--")

Output

   a  b  c
0  1  4  6
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
   a  b  c
1  2  5  7
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
   a  b  c
2  3  6  8
--

CodePudding user response:

Essentially, it looks like you want to loop over the unique combinations?

But as you then query for valid ones, you obtain a lot of empty DataFrames. If you do not need those, a much simpler and more efficient version would be:

for _,g in foo.groupby(list_of_columns):
    print('---')
    print(g)

output:

---
   a  b  c
0  1  4  6
---
   a  b  c
1  2  5  7
---
   a  b  c
2  3  6  8

In comparison, the output of your nested loop:

---
   a  b  c
0  1  4  6
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
   a  b  c
1  2  5  7
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
   a  b  c
2  3  6  8
  • Related