In Pandas I have a dataframe like below
data= [['A','B',3],['A','C',4],['A','D',5],['B','A',4],['B','C',4],['C','D',1]]
df = pd.DataFrame(data,columns =['Col1','Col2','Value'])
df
Col1 | Col2 | Value |
---|---|---|
A | B | 3 |
A | C | 4 |
A | D | 5 |
B | C | 4 |
C | D | 1 |
B | A | 4 |
I want to convert it as below
A:B | A:C | A:D | B:C | C:D |
---|---|---|---|---|
7 | 4 | `5 | 4 | 1 |
Note: first column A:B value is 7 because there exists combination (A:B) = 4 (B:A) = 3.
Please suggest a quick method
CodePudding user response:
Use sorted
with join
for both columns and aggregate sum
, last transpose:
df1 = (df.groupby(df[['Col1','Col2']]
.agg(lambda x: ':'.join(sorted(x)), axis=1))
.sum()
.T
.reset_index(drop=True))
print (df1)
A:B A:C A:D B:C C:D
0 7 4 5 4 1
CodePudding user response:
You can do:
df.groupby([':'.join(sorted(t)) for t in zip(df['Col1'], df['Col2'
])])['Value'].sum().to_frame().T
output:
A:B A:C A:D B:C C:D
Value 7 4 5 4 1
CodePudding user response:
you can use sets
to make order irrelevant or sorted
to make sure you have consistent ordering
result = df['Value'].\
groupby(df[['Col1','Col2']].\
apply(set,axis=1).apply(':'.join)).sum()
print(result['B:A'])
print(result)
but that gets you a series ...
B:A 7
C:A 4
C:B 4
D:A 5
D:C 1
Name: Value, dtype: int64
if you want it the other way you need to make 2 small changes
result = df[['Value']].\
groupby(df[['Col1','Col2']].\
apply(set,axis=1).apply(':'.join)).sum().T
print(result)
print(result['B:A']['Value'])
note that set does not care about order so it might be A:B
or B:A
if you need it actually sorted to always be 'A:B'
(you should have specified that as part of your requirements) you will need to use sort
instead of set
as shown below
result = df['Value'].\
groupby(df[['Col1','Col2']].\
apply(sorted,axis=1).apply(':'.join)).sum()
print(result['A:B']) # always sorted...
print(result)