Home > Mobile >  Column combinations using Pyspark or Pandas based on conditions
Column combinations using Pyspark or Pandas based on conditions

Time:10-01

I want to generate combinations for the following 3 lists such that beds >= baths >= cars, and the combinations are then projected into 3 dataframe columns. Could anyone suggest a method to achieve this? Thanks!

beds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
baths = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cars = [0, 1, 2, 3, 4, 5, 6, 7, 8]

Example

    nbed  nbath  ncar
0    1.0    1.0     0
1    1.0    1.0     1
2    2.0    1.0     0
3    2.0    2.0     0
4    2.0    1.0     1
5    2.0    2.0     1
6    2.0    1.0     2
7    2.0    2.0     2
8    3.0    1.0     0
9    3.0    2.0     0
10   3.0    3.0     0
11   3.0    1.0     1
12   3.0    2.0     1
13   3.0    3.0     1
14   3.0    1.0     2
15   3.0    2.0     2
16   3.0    3.0     2
17   3.0    1.0     3
18   3.0    2.0     3
19   3.0    3.0     3

CodePudding user response:

You can do a simple join and then query:

df1,df2,df3 = [pd.DataFrame({name:lst} ) for lst,name in zip([beds,baths,cars], ['bed', 'bath', 'beyond'])]

df_123 = df1.join(df2, how='cross').join(df3, how='cross')

df_123.query("bed >= bath >= beyond").reset_index(drop=True)
#     bed  bath  beyond
#0    1.0   1.0       0
#1    1.0   1.0       1
#2    2.0   1.0       0
#3    2.0   1.0       1
#4    2.0   2.0       0
#..   ...   ...     ...
#151  8.0   8.0       4
#152  8.0   8.0       5
#153  8.0   8.0       6
#154  8.0   8.0       7
#155  8.0   8.0       8


CodePudding user response:

One option is with conditional_join from pyjanitor and it is usually more efficient than a cartesian join for such inequality (non-equi) joins:

# pip install pyjanitor
import janitor
import pandas as pd

# convert them to integer type, 
# to make all data types uniform
beds = pd.DataFrame({'beds': beds}, dtype = int)
cars = pd.Series(cars, name = 'cars', dtype = int)
baths = pd.Series(baths, name = 'baths', dtype = int)

(beds
.conditional_join(
    baths, 
    ("beds", "baths", ">="))
.conditional_join(
    cars, 
    ("baths", "cars", ">="))
)

     beds  baths  cars
0       1      1     0
1       1      1     1
2       2      1     0
3       2      1     1
4       2      2     0
..    ...    ...   ...
151     8      8     4
152     8      8     5
153     8      8     6
154     8      8     7
155     8      8     8

[156 rows x 3 columns]
  • Related