Variable combinations of column designations in pandas

Time:06-04

I can best explain my problem by starting with an example:

import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ['female', 'female', 'male', 'male'],
                   "overweight": ['y', 'n', 'y', 'y']},
                  index=[0, 1, 2, 3])
    

Now I want to build a function that receives a dataframe (= df) and an integer (= m). For example, with m = 2 the function should combine the column names into every possible pair. The output should be a list containing those pairs. For example, for m = 2 and df: [[ID, age], [ID, gender], [ID, overweight], [age, gender], [age, overweight], [gender, overweight]]

Does anyone know how I can achieve that? My problem is that both m and the number of columns are variable...

Thank you in advance

CodePudding user response:

You can use itertools.combinations directly on the dataframe, because iterating over a dataframe yields its column names:

from itertools import combinations

m = 2
out = list(combinations(df, m))

output:

[('ID', 'age'),
 ('ID', 'gender'),
 ('ID', 'overweight'),
 ('age', 'gender'),
 ('age', 'overweight'),
 ('gender', 'overweight')]
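
The question asks for a function that returns a list of lists, while combinations yields tuples. A minimal sketch of such a wrapper could look like the following; the name column_combinations is only an illustration and does not come from pandas or itertools:

```python
from itertools import combinations

import pandas as pd


def column_combinations(df, m):
    """Return every m-sized combination of df's column names as a list of lists.

    Hypothetical helper name, chosen for this example only.
    """
    # Iterating over df.columns yields the column labels in their original order;
    # combinations() keeps that order and never repeats a label within a group.
    return [list(combo) for combo in combinations(df.columns, m)]


df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ['female', 'female', 'male', 'male'],
                   "overweight": ['y', 'n', 'y', 'y']})

pairs = column_combinations(df, 2)    # six pairs for four columns
triples = column_combinations(df, 3)  # four triples for four columns
```

Because both m and the columns are read at call time, the same function works for any dataframe and any group size. If m is larger than the number of columns, the result is simply an empty list.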