I can best explain my problem by starting with an example:
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ['female', 'female', 'male', 'male'],
                   "overweight": ['y', 'n', 'y', 'y']},
                  index=[0, 1, 2, 3])
Now I want to build a function that receives a dataframe (= df) and an integer (= m).
For example, with m = 2 the function should combine the column names in pairs of two. The output should be a list containing those pairs. For example, for m = 2 and df:
[[ID, age],[ID, gender],[ID, overweight],[age, gender], [age, overweight], [gender, overweight]]
Does anyone know how I can achieve that? My problem is that both m and the number of columns are variable...
Thank you in advance
CodePudding user response:
You can use itertools.combinations directly on the dataframe, since iterating over a dataframe yields its column names:
from itertools import combinations
m = 2
out = list(combinations(df, m))
output:
[('ID', 'age'),
('ID', 'gender'),
('ID', 'overweight'),
('age', 'gender'),
('age', 'overweight'),
('gender', 'overweight')]
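Since the question asks for a function that takes df and m and returns a list of lists, the same idea can be wrapped as in the sketch below. The function name column_combinations is only an illustrative choice, not something from the question, and the conversion to inner lists is optional if tuples are acceptable.

```python
from itertools import combinations

import pandas as pd


def column_combinations(df, m):
    """Return every m-sized combination of df's column names as a list of lists.

    The name `column_combinations` is a hypothetical helper for illustration.
    """
    # combinations() yields tuples in the order the columns appear in df;
    # each tuple is converted to a list to match the format in the question.
    return [list(combo) for combo in combinations(df.columns, m)]


df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ["female", "female", "male", "male"],
                   "overweight": ["y", "n", "y", "y"]})

pairs = column_combinations(df, 2)    # all pairs of column names
triples = column_combinations(df, 3)  # works for any m, here triples
```

Because combinations handles any m and any number of columns, nothing else has to change when either varies. If m is larger than the number of columns, the result is simply an empty list.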