Home > Software engineering >  Converting row values into column name in pandas in a peculiar condition
Converting row values into column name in pandas in a peculiar condition

Time:09-28

In Pandas I have a dataframe like below

data= [['A','B',3],['A','C',4],['A','D',5],['B','A',4],['B','C',4],['C','D',1]]
df = pd.DataFrame(data,columns =['Col1','Col2','Value'])
df
Col1 Col2 Value
A B 3
A C 4
A D 5
B C 4
C D 1
B A 4

I want to convert it as below

A:B A:C A:D B:C C:D
7 4 `5 4 1

Note: first column A:B value is 7 because there exists combination (A:B) = 4 (B:A) = 3.

Please suggest a quick method

CodePudding user response:

Use sorted with join for both columns and aggregate sum, last transpose:

df1 = (df.groupby(df[['Col1','Col2']]
         .agg(lambda x: ':'.join(sorted(x)), axis=1))
         .sum()
         .T
         .reset_index(drop=True))
print (df1)
       A:B  A:C  A:D  B:C  C:D
0      7    4    5    4    1

CodePudding user response:

You can do:

df.groupby([':'.join(sorted(t)) for t in zip(df['Col1'], df['Col2'
           ])])['Value'].sum().to_frame().T

output:

       A:B  A:C  A:D  B:C  C:D
Value    7    4    5    4    1

CodePudding user response:

you can use sets to make order irrelevant or sorted to make sure you have consistent ordering

result = df['Value'].\           
          groupby(df[['Col1','Col2']].\
          apply(set,axis=1).apply(':'.join)).sum()

print(result['B:A'])
print(result)

but that gets you a series ...

B:A    7
C:A    4
C:B    4
D:A    5
D:C    1
Name: Value, dtype: int64

if you want it the other way you need to make 2 small changes

result = df[['Value']].\           
          groupby(df[['Col1','Col2']].\
          apply(set,axis=1).apply(':'.join)).sum().T
print(result)
print(result['B:A']['Value'])

note that set does not care about order so it might be A:B or B:A

if you need it actually sorted to always be 'A:B'(you should have specified that as part of your requirements) you will need to use sort instead of set as shown below

result = df['Value'].\           
          groupby(df[['Col1','Col2']].\
          apply(sorted,axis=1).apply(':'.join)).sum()

print(result['A:B']) # always sorted...
print(result)
  • Related