Here is my code. I would like to sum the FPKM
rows containing all specific target
and print all the corresponding targets and sum values in a new pd.
# coding=utf-8
import pandas as pd
import numpy as np
classes = [('Carbon;Pyruvate;vitamins', 16.7, 1),
('Pyruvate;Carbohydrate;Pentose and glucuronate', 30, 7),
('Lipid;Carbon;Galactose', 40.5, 9),
('Galactose;Pyruvate;Fatty acid', 57, 10),
('Fatty acid;Lipid', 22, 4)]
labels = ['Ko_class','FPKM', 'count']
alls = pd.DataFrame.from_records(classes, columns=labels)
target = [['Carbon'],['Pyruvate'],['Galactose']]
targetsum = pd.DataFrame.from_records(target,columns=['target'])
#######
targets = '|'.join(sum(target, []))
targetsum['total_FPKM']=(alls['FPKM']
.groupby(alls['Ko_class']
.str.contains(targets))
.sum())
targetsum['count']=(alls['count']
.groupby(alls['Ko_class']
.str.contains(targets))
.sum())
targetsum
Its results:
target total_FPKM count
0 Carbon NaN NaN
1 Pyruvate NaN NaN
2 Galactose NaN NaN
What I want is :
target total_FPKM count
0 Carbon 57.2 10
1 Pyruvate 103.7 18
2 Galactose 97.5 19
Hope I have described my question clearly:(
CodePudding user response:
try this :
def aggregation(dataframe,target):
targetsum = pd.DataFrame(columns=['target','sum','count'])
for val in target:
df_tempo=dataframe.loc[dataframe['Ko_class'].str.contains(val),:].copy()
new_row = {'target':val, 'sum':df_tempo['FPKM'].sum(), 'count':df_tempo['count'].sum()}
targetsum = targetsum.append(new_row, ignore_index=True)
return targetsum
df_result=aggregation(alls,['Carbon','Pyruvate','Galactose'])
Result :
target sum count
0 Carbon 57.2 10
1 Pyruvate 103.7 18
2 Galactose 97.5 19
CodePudding user response:
You can use str.findall
to find the substances that appear in your 'Ko_class' column, and assign
that back to a new column. Exploding this new list-valued column into a separate rows using explode
will allow you to groupby
on them and perform your aggregation:
target_list = ['Carbon','Pyruvate','Galactose']
target_substances = '|'.join(target_list)
alls.assign(
Ko_class_contains_target = alls['Ko_class'].str.findall(target_substances)
).explode('Ko_class_contains_target').groupby('Ko_class_contains_target').agg('sum')
prints back:
FPKM count
Ko_class_contains_target
Carbon 57.2 10
Galactose 97.5 19
Pyruvate 103.7 18