I have this data frame
import pandas as pd
data = [
['ACOT', '00001', '', '', 1.5, 20, 30, 'col1ACOT'],
['ACOT', '00002', '', '', 1.7, 20, 33,'col1ACOT'],
['ACOT', '00003', '','NA_0001' ,1.4, 20, 40,'col1ACOT'],
['PAN', '000090', 'canonical', '', 0.5, 10, 30,'col1PAN'],
['PAN', '000091', '', '', 0.4, 10, 30,'col1PAN'],
['TOM', '000080', 'canonical', '', 0.4, 10, 15,'col1TOM'],
['TOM', '000040', '', '', 1.7, 10, 300,'col1TOM']
]
df = pd.DataFrame(data, columns=[
'Gene_name', 'Transcript_ID', 'canonical', 'mane', 'metrics','start','end', 'Example_extra_col'])
Gene_name Transcript_ID canonical mane metrics start end Example_extra_col
0 ACOT 00001 1.5 20 30 col1ACOT
1 ACOT 00002 NA_0001 1.7 20 33 col1ACOT
2 ACOT 00003 1.4 20 40 col1ACOT
3 PAN 000090 canonical NA_00090 0.5 10 30 col1PAN
4 PAN 000091 0.4 10 30 col1PAN
5 TOM 000080 canonical 0.4 10 15 col1TOM
6 TOM 000040 1.7 10 300 col1TOM
I want this output
Gene_name canonical mane metrics Example_extra_col
0 ACOT No Yes 1.4-1.5 col1ACOT
4 PAN Yes Yes 0.5-0.4 col1PAN
5 TOM Yes No 1.7-0.4 col1TOM
Partially, I can do this with these lines
f = lambda x: "Yes" if x.any() else "No" # For canonical and mane
df = df.groupby('Gene_name').agg({'canonical': f, 'mane': f, 'metrics': ['min', 'max']})
canonical mane metrics_min metrics_max
Gene_name
ACOT No Yes 1.4 1.7
PAN Yes Yes 0.4 0.5
TOM Yes No 0.4 1.7
But I lost the Example_extra_col (and the following ones) because my real data frame has several more columns.
How can I achieve this?
CodePudding user response:
You can do everything with GroupBy.agg
:
out = (df
.groupby('Gene_name', as_index=False)
.agg({'canonical': 'any',
'mane': 'any',
'metrics': lambda x: f'{x.min()}-{x.max()}',
'Example_extra_col': 'first',
})
.replace({True: 'Yes', False: 'No'})
)
output:
Gene_name canonical mane metrics Example_extra_col
0 ACOT No Yes 1.4-1.7 col1ACOT
1 PAN Yes No 0.4-0.5 col1PAN
2 TOM Yes No 0.4-1.7 col1TOM
Alternative with names aggregation if you need to customize the output column names:
(df
.groupby('Gene_name')
.agg(**{'canonical': ('canonical', 'any'),
'mane': ('mane', 'any'),
'metrics': ('metrics', lambda x: f'{x.min()}-{x.max()}'),
'Example_extra_col': ('Example_extra_col', 'first')
})
)