COL_A | COL_B | COL_C |
---|---|---|
PRODUCT_1 | UK | 1/1/2021 |
I want to have a table that contains the result
COL_A | COL_B | COL_C | COL_A_COL_B_COL_C | COL_A_COL_B | COL_A_COL_C | COL_B_COL_C |
---|---|---|---|---|---|---|
PRODUCT_1 | UK | 1/1/2021 | PRODUCT_1UK1/1/2021 | PRODUCT_1UK | PRODUCT_11/1/2021 | UK1/1/2021 |
Basically, combinations concatenated from left to right and place them as new columns in the dataset.
I don't need all of the combinations, so for example, if ABC columns are concatenated, this is enough on that combination (BCA, or ACB are not needed).
How can you do that in Python3 with itertools and pandas? The real table contains more than 10 columns to get their combinations.
CodePudding user response:
You can use pandas.concat
:
from itertools import combinations
df.join(pd.concat({a '_' b: df[a] df[b] for a,b in combinations(df, 2)}, axis=1))
output:
COL_A COL_B COL_C COL_A_COL_B COL_A_COL_C COL_B_COL_C
0 PRODUCT_1 UK 1/1/2021 PRODUCT_1UK PRODUCT_11/1/2021 UK1/1/2021
Many combinations:
from itertools import combinations
df.join(pd.concat({'_'.join(x): df[x[0]].str.cat(df[list(x[1:])].astype(str),
sep='')
for i in (2, 3)
for x in combinations(df, i)}, axis=1))
output:
COL_A COL_B COL_C COL_A_COL_B COL_A_COL_C COL_B_COL_C \
0 PRODUCT_1 UK 1/1/2021 PRODUCT_1UK PRODUCT_11/1/2021 UK1/1/2021
COL_A_COL_B_COL_C
0 PRODUCT_1UK1/1/2021