def check_isnull(self):
    df = pd.read_csv(self.table_name)
    for j in df.values:
        for k in j[0:]:
            try:
                k = float(k)
                Flag = 1
            except ValueError:
                Flag = 0
                break
    if Flag == 1:
        QMessageBox.information(self, "Information",
                                "Dataset is ready to train.",
                                QMessageBox.Close | QMessageBox.Help)
    elif Flag == 0:
        QMessageBox.information(self, "Information",
                                "There are one or more non-integer values.",
                                QMessageBox.Close | QMessageBox.Help)
Greetings. Here are only 40 rows of the dataset I am trying to train on. I want to replace the null or string values in it. My existing function for the replacement works without any problems, so I wanted to write a function that only detects such values and reports the result. My function sometimes gives an error. Where is the problem?
             User ID        Age  EstimatedSalary  Female  Male  Purchased
0           15624510  19.000000            19000       0     1          0
1    1581qqsdasd0944  35.000000           qweqwe       0     1          0
2           15668575  37.684211            43000       1     0          0
3                NaN  27.000000            57000       1     0          0
4           15804002  19.000000      69726.81704       0     1          0
..               ...        ...              ...     ...   ...        ...
395         15691863  46.000000            41000       1     0          1
396         15706071  51.000000            23000       0     1          1
397         15654296  50.000000            20000       1     0          1
398         15755018  36.000000            33000       0     1          0
399         15594041  49.000000            36000       1     0          1
CodePudding user response:
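Try converting the column with errors='coerce', which turns every value that cannot be parsed as a number into NaN:
df['EstimatedSalary'] = pd.to_numeric(df['EstimatedSalary'], errors='coerce')
Then use this to get rid of those rows, along with the rows that already contained NaN:
df = df.dropna(subset=['EstimatedSalary'])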
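If the goal is only to detect such values rather than drop them, the same coercion can be applied to every column and the result checked for NaN. The following is a minimal sketch, not a definitive implementation: `has_non_numeric` is a hypothetical helper name, and the two small frames are made-up data modelled on the sample in the question. It also flags missing values, which a `try: float(k)` loop lets through because `float` accepts NaN without raising `ValueError`.

```python
import pandas as pd


def has_non_numeric(df: pd.DataFrame) -> bool:
    """Return True if any cell is missing or cannot be parsed as a number.

    `has_non_numeric` is a hypothetical helper name, not part of pandas.
    """
    # errors="coerce" turns unparseable values into NaN instead of raising.
    coerced = df.apply(pd.to_numeric, errors="coerce")
    # Cells that were already NaN stay NaN, so nulls are caught as well.
    return bool(coerced.isna().any().any())


# Illustrative data only, modelled on the sample in the question.
sample = pd.DataFrame({
    "User ID": ["15624510", "1581qqsdasd0944", None],
    "Age": [19.0, 35.0, 27.0],
    "EstimatedSalary": ["19000", "qweqwe", "57000"],
})
clean = pd.DataFrame({
    "Age": [19.0, 35.0],
    "EstimatedSalary": ["19000", "43000"],
})

print(has_non_numeric(sample))
print(has_non_numeric(clean))
```

The boolean result could then decide which of the two message boxes to show, so the check stays separate from the GUI code.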
CodePudding user response:
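This should do the job:
mask = df.apply(lambda col: col.astype(str).str.replace('.', '', n=1, regex=False).str.isdigit(), axis=0)
This first removes the first dot from each value, which is the decimal point in float numbers. Passing regex=False makes '.' a literal dot; treated as a regular expression, it would match any character. It then checks whether all remaining characters of the value are digits. The result is a dataframe of booleans in which every non-numeric value is False and every numeric value is True.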
If you need to delete rows that have any of these values you can use:
df = df[mask.all(axis=1)]
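Note that a digit-only check also treats negative numbers and scientific notation (for example `-5` or `1e3`) as non-numeric. Where that matters, a mask built with `pd.to_numeric` could be used instead, and it can report where the offending cells are. This is a sketch under those assumptions: `find_bad_cells` is a hypothetical name and the data is made up for illustration.

```python
import numpy as np
import pandas as pd


def find_bad_cells(df: pd.DataFrame) -> list:
    """List (row label, column name) pairs that are null or not numeric.

    `find_bad_cells` is a hypothetical helper name, not part of pandas.
    """
    # True wherever a cell is missing or cannot be parsed as a number.
    bad = df.apply(pd.to_numeric, errors="coerce").isna()
    # np.where returns the positions of the True cells in row-major order.
    rows, cols = np.where(bad.to_numpy())
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]


# Illustrative data only; "-5" is numeric and must not be reported.
sample = pd.DataFrame({
    "User ID": ["15624510", "1581qqsdasd0944", None],
    "EstimatedSalary": ["19000", "qweqwe", "-5"],
})
bad_cells = find_bad_cells(sample)
print(bad_cells)
```

Listing the positions makes it easier to tell the user which rows need fixing before the replacement step runs.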