Really hard question to phrase.
My DataFrame df looks like this:
col1 col2 col3
123 1 0
456 0 1
789 1 0
234 0 1
I'm trying to add a col4 that looks like this:
col1 col2 col3 col4
123 1 0 [123,789]
456 0 1 [456,234]
789 1 0 [123,789]
234 0 1 [456,234]
Rows 1 and 3 have the same col2 and col3 values, as do rows 2 and 4, so each row's col4 should list the col1 values of every row in its group.
The code I've got is:
import itertools
import pandas as pd

# Sample data matching the table above
data = [
    [123, 1, 0],
    [456, 0, 1],
    [789, 1, 0],
    [234, 0, 1],
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['col1', 'col2', 'col3'])
# Code
n = 2
cols = ['col2','col3']
combos = list(itertools.combinations(cols, n))
for combo in combos:
    col1_list = df.groupby(combo).apply(lambda df: list(df['col1'].unique()))
col1_list
The error I get is:
KeyError: ('col2', 'col3')
CodePudding user response:
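Here is how you can do it. Each combo is a tuple, and groupby treats a tuple as a single key (hence the KeyError), so convert it to a list first: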
# Code
n = 2
cols = ['col2','col3']
combos = list(itertools.combinations(cols, n))
for group in combos:
    print(df.groupby(list(group))['col1'].apply(list))
Output:
col2  col3
0     1       [456, 234]
1     0       [123, 789]
Name: col1, dtype: object
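The output above has one row per group rather than the col4 column from the question. If the goal is col4 on every original row, one possible approach is to join the grouped lists back onto df by the group keys. This is only a sketch: it assumes the sample data from the question's table, and the names col1_lists and result are made up for illustration.

```python
import pandas as pd

# Sample data from the question's table (assumed)
df = pd.DataFrame(
    [[123, 1, 0], [456, 0, 1], [789, 1, 0], [234, 0, 1]],
    columns=["col1", "col2", "col3"],
)
cols = ["col2", "col3"]

# One list of col1 values per distinct (col2, col3) pair
col1_lists = df.groupby(cols)["col1"].apply(list).rename("col4")

# Left-join the lists back onto the original rows using the group keys
result = df.join(col1_lists, on=cols)
print(result)
```

The join keeps the original row order, so rows that share col2 and col3 end up with the same list in col4.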