Home > Net >  Count data by previous column and create new one
Count data by previous column and create new one

Time:06-18

I have following data in csv :
species,origin,count
Bacillus acidicola,GenBank,1
Bacillus acidicola,RefSeq,1
Bacillus aerius,GenBank,1
Bacillus aerolatus,RefSeq,1

My expecting output is like:
species,genbank_count, refseq_count
Bacillus acidicola,1, 1
Bacillus aerius,1, 0
Bacillus aerolatus,0,1

I tried code to count origin like examples below :

gen_bank = pd.read_csv('res.csv')
print(df.loc[gen_bank['0'] == 'GenBank'])

count = df.groupby(['species', 'origin']).size()

df.count().to_frame('counts').reset_index()

count['GeneBank'] = df.groupby(['species'], ['id']).size()

count['RefSeq'] = df.loc[df.origin == 'RefSeq', 'origin'].count()

CodePudding user response:

IIUC, you can use pivot to get the result

df.pivot('species', columns='origin').reset_index().fillna(0)
    species     count
    origin              GenBank     RefSeq
0   Bacillus acidicola  1.0         1.0
1   Bacillus aerius     1.0         0.0
2   Bacillus aerolatus  0.0         1.0
  • Related