So I have a dataframe like this, and I want to find how many students have more than two school experiences.
Name | Primary school | Middle school | High School |
---|---|---|---|
Alex | Tampli school | Compton School | The Tampli high school |
Peng | New york public | North Star School | |
Hu | L.A public | South planick school |
df["Name","Primary school","Middle school","High School"].isnull()
Name | Primary school | Middle school | High School |
---|---|---|---|
Alex | False | False | False |
Peng | False | False | True |
Hu | False | False | True |
We can use df.count_value() to get the column summary, but how could I get sum value of rows?
desired output
df["Enough Experience?"]= #code
Name | Primary school | Middle school | High School | Enough experience? |
---|---|---|---|---|
Alex | False | False | False | True |
Peng | False | False | True | False |
Hu | False | False | True | False |
CodePudding user response:
Based on your description of the problem, I think you need:
df['Enough experience?'] = df.drop(columns='Name').sum(axis=1).ge(2)
You shouldn't use isnull
, that checks for NaN/None, instead sum
the booleans (each True
counts for 1
).
output:
Name Primary school Middle school High School Enough experience?
0 Alex False False False False
1 Peng False False True False
2 Hu False False True False
To get the total:
experienced_students = df.drop(columns='Name').sum(axis=1).ge(2).sum()
output: 2
Count of schools:
df['number schools'] = df.drop(columns='Name').sum(axis=1)
Name Primary school Middle school High School number schools
0 Alex False False False 0
1 Peng False False True 1
2 Hu False False True 1
CodePudding user response:
Use:
print (df)
Name Primary school Middle school High School
0 Alex 1.0 NaN NaN
1 Peng 1.0 NaN 2.0
2 Hu 2.0 7.0 0.0
3 John NaN NaN NaN
# test if all non missing values
cols = ["Primary school","Middle school","High School"]
df["Enough Experience?1"] = ~df[cols].isnull().any(axis=1)
# test at least 1 missing values
df["Enough Experience?2"] = df[cols].isnull().any(axis=1)
# test at least 2 missing values per rows with `sum`
df["Enough Experience?3"] = df[cols].isnull().sum(axis=1).lt(2)
df["Enough Experience?4"] = df[cols].count(axis=1).ge(2)
print (df)
Name Primary school Middle school High School Enough Experience?1 \
0 Alex 1.0 NaN NaN False
1 Peng 1.0 NaN 2.0 False
2 Hu 2.0 7.0 0.0 True
3 John NaN NaN NaN False
Enough Experience?2 Enough Experience?3 Enough Experience?4
0 True False False
1 True True True
2 False True True
3 True False False
Solutions by table ouput - testing if all values are not missing:
# test at least 1 missing values
cols = ["Primary school","Middle school","High School"]
df["Enough Experience?1"] = df[cols].notna().all(axis=1)
df["Enough Experience?2"] = ~df[cols].isna().any(axis=1)
df["Enough Experience?3"] = df[cols].isnull().sum(axis=1)<1
print (df)
Name Primary school Middle school High School Enough Experience?1 \
0 Alex 1.0 NaN NaN False
1 Peng 1.0 NaN 2.0 False
2 Hu 2.0 7.0 0.0 True
3 John NaN NaN NaN False
Enough Experience?2 Enough Experience?3
0 False False
1 False False
2 True True
3 False False