I currently using pandas to summarize my data. I have a data listed like this (the real data have ten of thousands of entries).
A | B | Intensity | Area |
---|---|---|---|
3 | 4 | 20.2 | 55 |
3 | 4 | 20.7 | 23 |
3 | 4 | 30.2 | 17 |
3 | 4 | 51.8 | 80 |
5 | 6 | 79.6 | 46 |
5 | 6 | 11.9 | 77 |
5 | 7 | 56.7 | 19 |
5 | 7 | 23.4 | 23 |
I would like to group the columns (A & B) together and list down the all the intensity and area values without aggregating the values (eg calculate mean, median, mode etc)
A,B | |||||
---|---|---|---|---|---|
Intensity | 3,4 | 20.2 | 20.7 | 30.2 | 51.8 |
5,6 | 79.6 | 11.9 | NaN | NaN | |
5,7 | 56.7 | 23.4 | NaN | NaN | |
Area | 3,4 | 55 | 23 | 17 | 80 |
5,6 | 46 | 77 | NaN | NaN | |
5,7 | 19 | 23 | NaN | NaN |
CodePudding user response:
Here is one way to do it
# Melt to make wide layout to long, bring area and intensity as rows
df2=df.melt(id_vars=['A', 'B'])
# concat A and B into a single column
df2['A,B']=df2['A'].astype(str) ',' df2['B'].astype(str)
# drop A and B
df2.drop(columns=['A','B'], inplace=True)
# create a sequence number to aid in creating column in result
df2['seq']=df2.assign(seq=1).groupby(['variable','A,B'])['seq'].cumsum()
# do a pivot, and format the resultset
df2=(df2.pivot(index=['variable','A,B'], columns='seq', values='value')
.reset_index()
.rename_axis(columns=None)
.rename(columns={'variable':''}))
df2
A,B 1 2 3 4
0 Area 3,4 55.0 23.0 17.0 80.0
1 Area 5,6 46.0 77.0 NaN NaN
2 Area 5,7 19.0 23.0 NaN NaN
3 Intensity 3,4 20.2 20.7 30.2 51.8
4 Intensity 5,6 79.6 11.9 NaN NaN
5 Intensity 5,7 56.7 23.4 NaN NaN
CodePudding user response:
you can use:
df['class']=df['A'].astype(str) ',' df['B'].astype(str)
def convert_values(col_name):
dfx=pd.DataFrame(df[[col_name,'class']].groupby('class').agg(list)[col_name].to_list(),index=df[[col_name,'class']].groupby('class').agg(list).index).reset_index()
dfx.index=[col_name] * len(dfx)
return dfx
df1=convert_values('Intensity')
df2=convert_values('Area')
final=pd.concat([df1,df2])
print(final)
'''
class 0 1 2 3
Intensity 3,4 20.2 20.7 30.2 51.8
Intensity 5,6 79.6 11.9 nan nan
Intensity 5,7 56.7 23.4 nan nan
Area 3,4 55 23 17.0 80.0
Area 5,6 46 77 nan nan
Area 5,7 19 23 nan nan
'''