I have a DataFrame like this in python
name id Background Complexion Ear Ear Accessories
Li1 3800 White Clean Ape Ear None
Lio2 5700 Purple Galaxy Clean Ape Ear Silver Earring
Lio4 8400 Green Clean Orc Ear Spiky Earring
Noam 3000 Noir Galaxy Clean Ape Ear Silver Earring
AVERSE 6100 Grey Galaxy Clean Ape Ear Gender Earring
I want to count every cell except two first colmn 'name' and 'id' and create a table like this finally
name id Background Complexion Ear Ear Accessories
Li1 3800 1 5 4 1
Lio2 5700 1 5 4 2
Lio4 8400 1 5 1 1
Noam 3000 1 5 4 2
AVERSE 6100 1 5 4 1
and calculate a formula on this numbers (for example 3) then we have
name id Background Complexion Ear Ear Accessories
Li1 3800 4 8 7 4
Lio2 5700 4 8 7 5
Lio4 8400 4 8 4 4
Noam 3000 4 8 7 5
AVERSE 6100 4 8 7 4
and sum numbers for each row
name id Background Complexion Ear Ear Accessories sum
Li1 3800 4 8 7 4 23
Lio2 5700 4 8 7 5 24
Lio4 8400 4 8 4 4 20
Noam 3000 4 8 7 5 24
AVERSE 6100 4 8 7 4 23
how can do this with pandas in python thaks
CodePudding user response:
IIUC, you need to loop to perform a count per column. You can use groupy.transform('count')
. The rest are simple vectorial operations (add
/sum
):
cols = ['name', 'id']
df2 = (df[cols]
.join(pd.DataFrame({c: df.groupby(c)[c].transform('count')
for c in df.drop(columns=cols).columns})
.add(3)
.assign(sum=lambda d: d.sum(1))
)
)
output:
name id Background Complexion Ear Ear Accessories sum
0 Li1 3800 4 8 7 4 23
1 Lio2 5700 4 8 7 5 24
2 Lio4 8400 4 8 4 4 20
3 Noam 3000 4 8 7 5 24
4 AVERSE 6100 4 8 7 4 23