I have a dataframe :
-- ----
|id|cluster|
-- ----
| 12345|A|
| 67890|A|
| 45678|A|
| 98765|A|
| 12321|A|
| 98789|A|
-- ----
How can i use pandas to return each value as a recommendation for each id? desired output:
-- ------- --------------
|id|cluster|recommendation|
-- ------- --------------
| 12345|A|67890
| 12345|A|45678
| 12345|A|98765
| 12345|A|12321
| 12345|A|98789
| 67890|A|12345
| 67890|A|45678
| 67890|A|98765
| 67890|A|12321
| 67890|A|98789
...
| 98789|A|12321
-- ------- --------------
CodePudding user response:
You could add a column of lists and explode
:
df['recommendation'] = pd.Series([list(df['id'])]*len(df))
df = df.explode('recommendation').query('id != recommendation')
NB. add reset_index(drop=True)
if needed to have a non duplicate index
output:
id cluster recommendation
0 12345 A 67890
0 12345 A 45678
0 12345 A 98765
0 12345 A 12321
0 12345 A 98789
1 67890 A 12345
...
4 12321 A 98789
5 98789 A 12345
5 98789 A 67890
5 98789 A 45678
5 98789 A 98765
5 98789 A 12321
CodePudding user response:
You can use a cross join with df
and id
column of df
using .merge()
. Filter out same id
and rename the column labels, as follows:
(df.merge(df[['id']].rename({'id': 'recommendation'}, axis=1), how='cross')
.query('id != recommendation')
)
or, if your Pandas version is older than 1.2.0 (released on December 2020) and does not support cross join (merge with how='cross'
), you can use:
(df.assign(key=1).merge(df[['id']].rename({'id': 'recommendation'}, axis=1).assign(key=1), on='key').drop('key', axis=1)
.query('id != recommendation')
)
Result:
id cluster recommendation
1 12345 A 67890
2 12345 A 45678
3 12345 A 98765
4 12345 A 12321
5 12345 A 98789
6 67890 A 12345
8 67890 A 45678
9 67890 A 98765
10 67890 A 12321
11 67890 A 98789
12 45678 A 12345
13 45678 A 67890
15 45678 A 98765
16 45678 A 12321
17 45678 A 98789
18 98765 A 12345
19 98765 A 67890
20 98765 A 45678
22 98765 A 12321
23 98765 A 98789
24 12321 A 12345
25 12321 A 67890
26 12321 A 45678
27 12321 A 98765
29 12321 A 98789
30 98789 A 12345
31 98789 A 67890
32 98789 A 45678
33 98789 A 98765
34 98789 A 12321