I have following data in csv :
species,origin,count
Bacillus acidicola,GenBank,1
Bacillus acidicola,RefSeq,1
Bacillus aerius,GenBank,1
Bacillus aerolatus,RefSeq,1
My expecting output is like:
species,genbank_count, refseq_count
Bacillus acidicola,1, 1
Bacillus aerius,1, 0
Bacillus aerolatus,0,1
I tried code to count origin like examples below :
gen_bank = pd.read_csv('res.csv')
print(df.loc[gen_bank['0'] == 'GenBank'])
count = df.groupby(['species', 'origin']).size()
df.count().to_frame('counts').reset_index()
count['GeneBank'] = df.groupby(['species'], ['id']).size()
count['RefSeq'] = df.loc[df.origin == 'RefSeq', 'origin'].count()
CodePudding user response:
IIUC, you can use pivot to get the result
df.pivot('species', columns='origin').reset_index().fillna(0)
species count
origin GenBank RefSeq
0 Bacillus acidicola 1.0 1.0
1 Bacillus aerius 1.0 0.0
2 Bacillus aerolatus 0.0 1.0