Home > OS >  Grouping columns with same values without aggregating columns with different values in pandas
Grouping columns with same values without aggregating columns with different values in pandas

Time:10-31

I currently using pandas to summarize my data. I have a data listed like this (the real data have ten of thousands of entries).

A B Intensity Area
3 4 20.2 55
3 4 20.7 23
3 4 30.2 17
3 4 51.8 80
5 6 79.6 46
5 6 11.9 77
5 7 56.7 19
5 7 23.4 23

I would like to group the columns (A & B) together and list down the all the intensity and area values without aggregating the values (eg calculate mean, median, mode etc)

A,B
Intensity 3,4 20.2 20.7 30.2 51.8
5,6 79.6 11.9 NaN NaN
5,7 56.7 23.4 NaN NaN
Area 3,4 55 23 17 80
5,6 46 77 NaN NaN
5,7 19 23 NaN NaN

CodePudding user response:

Here is one way to do it


# Melt to make wide layout to long, bring area and intensity as rows
df2=df.melt(id_vars=['A', 'B'])

# concat A and B into a single column
df2['A,B']=df2['A'].astype(str) ',' df2['B'].astype(str)

# drop A and B
df2.drop(columns=['A','B'], inplace=True)

# create a sequence number to aid in creating column in result
df2['seq']=df2.assign(seq=1).groupby(['variable','A,B'])['seq'].cumsum()

# do a pivot, and format the resultset
df2=(df2.pivot(index=['variable','A,B'], columns='seq', values='value')
     .reset_index()
     .rename_axis(columns=None)
     .rename(columns={'variable':''}))
df2
                A,B        1       2       3       4
0   Area        3,4     55.0    23.0    17.0    80.0
1   Area        5,6     46.0    77.0    NaN     NaN
2   Area        5,7     19.0    23.0    NaN     NaN
3   Intensity   3,4     20.2    20.7    30.2    51.8
4   Intensity   5,6     79.6    11.9    NaN     NaN
5   Intensity   5,7     56.7    23.4    NaN     NaN

CodePudding user response:

you can use:

df['class']=df['A'].astype(str)   ','   df['B'].astype(str)

def convert_values(col_name):
    dfx=pd.DataFrame(df[[col_name,'class']].groupby('class').agg(list)[col_name].to_list(),index=df[[col_name,'class']].groupby('class').agg(list).index).reset_index()
    dfx.index=[col_name] * len(dfx)
    return dfx
df1=convert_values('Intensity')
df2=convert_values('Area')
final=pd.concat([df1,df2])
print(final)
'''
            class   0       1       2       3
Intensity   3,4     20.2    20.7    30.2    51.8
Intensity   5,6     79.6    11.9    nan     nan 
Intensity   5,7     56.7    23.4    nan     nan 
Area        3,4     55      23      17.0    80.0
Area        5,6     46      77      nan     nan
Area        5,7     19      23      nan     nan

'''
  • Related