How do I check for null or string values in columns in the whole dataset in Python?


def check_isnull(self):
    df = pd.read_csv(self.table_name)

    for j in df.values:
        for k in j[0:]:
            try:
                k = float(k)
                Flag = 1
            except ValueError:
                Flag = 0
                break

    if Flag == 1:
        QMessageBox.information(self, "Information",
                                "Dataset is ready to train.",
                                QMessageBox.Close | QMessageBox.Help)
    elif Flag == 0:
        QMessageBox.information(self, "Information",
                                "There are one or more non-integer values.",
                                QMessageBox.Close | QMessageBox.Help)

Greetings, here is a sample of the dataset I am trying to train on. I want to replace the null or string values that appear in it. My existing function for the replacement operation works without any problems; I also wanted to write a function that only detects such values and reports the result. My function sometimes raises an error. Where is the problem?

             User ID        Age EstimatedSalary  Female  Male  Purchased
0           15624510  19.000000           19000       0     1          0
1    1581qqsdasd0944  35.000000          qweqwe       0     1          0
2           15668575  37.684211           43000       1     0          0
3                NaN  27.000000           57000       1     0          0
4           15804002  19.000000     69726.81704       0     1          0
..               ...        ...             ...     ...   ...        ...
395         15691863  46.000000           41000       1     0          1
396         15706071  51.000000           23000       0     1          1
397         15654296  50.000000           20000       1     0          1
398         15755018  36.000000           33000       0     1          0
399         15594041  49.000000           36000       1     0          1
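For comparison, the cell-by-cell loop above can be expressed in vectorized pandas. This is a minimal sketch on a hypothetical three-row miniature of the dataset shown, not the asker's actual file: `pd.to_numeric` with `errors='coerce'` turns every non-numeric cell into NaN, so the resulting NaN mask covers both string values and pre-existing nulls at once.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset above
df = pd.DataFrame({
    "User ID": ["15624510", "1581qqsdasd0944", np.nan],
    "EstimatedSalary": ["19000", "qweqwe", "57000"],
})

# Coerce every column to numeric; non-numeric strings and nulls become NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# True wherever the original cell was a non-numeric string or already null
bad_cells = numeric.isna()

ready_to_train = not bad_cells.any().any()
print(ready_to_train)  # False: one string ID, one string salary, one NaN
```

Unlike the loop, this never leaves the flag variable unset on an empty frame, and the boolean mask also tells you *which* cells are bad.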

CodePudding user response:

Try converting the column to numeric, so that any non-numeric value becomes NaN (note the assignment back into the frame, otherwise the coercion is lost):

df['estimated'] = pd.to_numeric(df['estimated'], errors='coerce')

Then use this to get rid of those rows, along with any rows that already contained NaN:

df = df.dropna(subset=['estimated'])
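Putting the two lines together on a small made-up frame (using the dataset's actual column name `EstimatedSalary`, where the answer above wrote `estimated`):

```python
import pandas as pd

df = pd.DataFrame({
    "User ID": ["15624510", "15668575", "15804002"],
    "EstimatedSalary": ["19000", "qweqwe", "69726.81704"],
})

# Coerce to numeric: 'qweqwe' becomes NaN, valid floats parse normally
df["EstimatedSalary"] = pd.to_numeric(df["EstimatedSalary"], errors="coerce")

# Drop rows where coercion produced NaN (or that were already null)
df = df.dropna(subset=["EstimatedSalary"])
print(len(df))  # 2 rows survive
```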

CodePudding user response:

This should do the job:

mask = df.apply(lambda col: col.astype(str).str.replace('.', '', n=1, regex=False).str.isdigit(), axis=0)

This first removes the single dot in float numbers (`regex=False` keeps the dot literal; without it, older pandas treats `'.'` as a regex matching any character), then checks whether every remaining character of a value is a digit. It returns a DataFrame where every non-numeric value is False and every numeric value is True.

If you need to delete the rows that have any of these values, you can use:

df = df[mask.all(axis=1)]
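A runnable sketch of the mask approach on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "User ID": ["15624510", "1581qqsdasd0944"],
    "EstimatedSalary": ["19000", "qweqwe"],
})

# Strip one literal dot (so floats like '69726.817' pass), then test digits
mask = df.apply(
    lambda col: col.astype(str).str.replace(".", "", n=1, regex=False).str.isdigit(),
    axis=0,
)

# Keep only the rows where every cell is numeric
clean = df[mask.all(axis=1)]
print(len(clean))  # 1: the row with 'qweqwe' is dropped
```

One caveat: `str.isdigit()` rejects negative numbers and scientific notation, so this check is stricter than `float()` parsing.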