I want to find the number of rows of the clin dataframe where the OS_MONTHS value is <= 12.0. The values in OS_MONTHS are float.
This seems like a trivial question.
import pandas as pd
len(clin["OS_MONTHS"] <= 12.0)
Traceback:
TypeError: '<=' not supported between instances of 'str' and 'float'
Data type:
type(clin["OS_MONTHS"])
pandas.core.series.Series
Dataframe
| | SEX | KPS | A header | AGE | OS_MONTHS |
---|---|---|---|---|---|
0 | 1 | 80 | 44 | 1 | 11.76 |
1 | 0 | 100 | 50 | 1 | 4.73 |
2 | 1 | 80 | 40 | 1 | 23.16 |
3 | 1 | 80 | 61 | 1 | 10.58 |
4 | 1 | 80 | 20 | 1 | 35.38 |
CodePudding user response:
clin["OS_MONTHS"].astype(float) <= 12.0
If you want the count (value_counts() reports how many rows are True and how many are False):
(clin["OS_MONTHS"].astype(float) <= 12.0).value_counts()
or
s = clin["OS_MONTHS"]
len(s[s.astype(float) <= 12.0])
Check the unique values in your data with unique(). The error means some values are strings that are not in float format, and you must handle them in some way, for example by filtering them out:
clin["OS_MONTHS"][clin["OS_MONTHS"] != '[Not Available]']
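Putting the pieces above together, here is a minimal sketch. The sample data is made up to mirror the question (numbers stored as strings plus one '[Not Available]' placeholder), so your real column may contain other non-numeric values. Note that len() on a boolean mask returns the total number of rows, while sum() counts only the True values.

```python
import pandas as pd

# Hypothetical sample data: numeric values stored as strings, plus a placeholder.
clin = pd.DataFrame(
    {"OS_MONTHS": ["11.76", "4.73", "23.16", "[Not Available]", "10.58", "35.38"]}
)

s = clin["OS_MONTHS"]
print(s.dtype)     # dtype of the elements, unlike type(s), which is always Series
print(s.unique())  # reveals the non-numeric placeholder

# Drop the placeholder rows, then convert the rest to float.
valid = s[s != "[Not Available]"].astype(float)

mask = valid <= 12.0
n_total = len(mask)       # number of rows in the mask, True or False
n_rows = int(mask.sum())  # number of rows where the condition holds
print(n_total, n_rows)
```

This is why len(clin["OS_MONTHS"] <= 12.0) would not give the answer even if the comparison worked: it would return the length of the whole column.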
CodePudding user response:
Check this out:
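# Assumes every value in the column is a string; .loc avoids chained assignment,
# and regex=False makes '.' a literal dot rather than a regex wildcard.
clin.loc[~clin["OS_MONTHS"].str.replace('.', '', regex=False).str.isdigit(), "OS_MONTHS"] = float('NaN')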
# Then you can apply @MoRe's solution
clin["OS_MONTHS"].astype(float) <= 12.0
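As an alternative to the isdigit() mask, pd.to_numeric with errors="coerce" turns anything it cannot parse into NaN in one step. It also keeps values that the digit check would reject, such as negative numbers or scientific notation. A rough sketch, again with made-up sample values:

```python
import pandas as pd

# Hypothetical sample data, including values an isdigit() check would reject.
clin = pd.DataFrame(
    {"OS_MONTHS": ["11.76", "4.73", "[Not Available]", "23.16", "-1.5", "1e1"]}
)

# Unparseable strings become NaN; everything else becomes float64.
months = pd.to_numeric(clin["OS_MONTHS"], errors="coerce")

n_invalid = int(months.isna().sum())    # values that could not be parsed
n_short = int((months <= 12.0).sum())   # NaN compares as False, so it is not counted
print(n_invalid, n_short)
```

Whether you want NaN rows silently excluded from the count depends on your analysis, so it is worth looking at n_invalid before trusting the result.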