Hi I hope you could help me with this.
I have a dataframe df_test
import pandas as pd
import numpy as np
df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
[4,5,6,"female"],
[1,2,np.nan,"female"]],
columns=['a','b','c','sex'])
df_test
a b c sex
0 NaN 2 3.0 male
1 4.0 5 6.0 female
2 1.0 2 NaN female
and for each category in the sex column I would like to count the number of values greater than 1 in each column of the dataframe (a,b,c)
.
df_results
a_count b_count c_count sex
0 NaN 1 1 male
1 2 2 1 female
CodePudding user response:
You can use groupby
and aggregate to count the values :
df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
[4,5,6,"female"],
[1,2,np.nan,"female"]],
columns=['a','b','c','sex'])
df_test.groupby('sex', sort=False).agg(lambda x : len(x.dropna()>1))
This gives us expected output :
a b c
sex
male 0 1 1
female 2 2 1
If at all you want those values the values to be Nan, then you can do
df_test.groupby('sex', sort=False).agg(lambda x : np.nan if len(x.dropna()) == 0 else len(x.dropna()))
a b c
sex
male NaN 1 1
female 2.0 2 1
Since the column contains NaN values, pandas does some internal optimization to convert int
to float
internally. So you may have to explicitly convert the column to int.
CodePudding user response:
Check Below code (I have assumed all values will be greater than or equal to 1 or nan based upon the example data)
pd.DataFrame(np.where(df_test.values == 1, np.nan, df_test.values), columns = df_test.columns).groupby(“sex”).count().reset_index()
CodePudding user response:
please correct
columns=[['a','b','c','sex']]
and replace with
columns=['a','b','c','sex']
then
pd.concat([df_test.sex, df_test.drop(columns=["sex"]) >= 1], axis=1).groupby("sex").sum().replace(0, np.nan).rename(columns=lambda x: x "_count").reset_index()
CodePudding user response:
Let ust try:
(df_test.drop('sex', axis=1).ge(1) # compare the data with `1`
.groupby(df_test['sex'],sort=False).sum() # count the number of `True` with sum
.add_suffix('_count') # add the suffix
.reset_index() # make `sex` a column
)
Output:
sex a_count b_count c_count
0 male 0 1 1
1 female 2 2 1