When I do a Run All in a Jupyter notebook with the following code, I receive a ValueError: cannot convert float NaN to integer error.
But when I run this specific cell a second time, it works fine. Is there anything specific that could cause the error during a Run All but not when I run just this cell?
# New birthdate calculations
# Imports the cell relies on
import random
from datetime import date, datetime

def calculate_age(born):
    # Parse an MM/DD/YYYY string and return the age in whole years
    born = datetime.strptime(born, "%m/%d/%Y").date()
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

def generate_birthdate(age):
    # Build a random M/D/YYYY string that stays in the same bracket (under 18 or adult)
    today = date.today()
    if int(age) < 18:
        new_birthdate = str(random.randrange(1,12)) + '/' + str(random.randrange(1,28)) + '/' + str(random.randrange(today.year-18,today.year))
    else:
        new_birthdate = str(random.randrange(1,12)) + '/' + str(random.randrange(1,28)) + '/' + str(random.randrange(today.year-90,today.year-18))
    return new_birthdate

filt = (df_good_ssn['BIRTHDATE'] == '--/--/----')
df_good_ssn.loc[filt,'BIRTHDATE'] = '01/01/2000' # '--/--/----' is invalid. Assign any valid date for type casting. Will be overwritten by generate_birthdate
df_good_ssn.loc[~filt,'AGE'] = df_good_ssn['BIRTHDATE'].apply(calculate_age)
df_good_ssn.loc[~filt,'NEW_BIRTHDATE'] = df_good_ssn['AGE'].apply(generate_birthdate)
ValueError                                Traceback (most recent call last)
<ipython-input-5-9e2113163ab5> in <module>
     18
     19 df_good_ssn.loc[~filt,'AGE'] = df_good_ssn['BIRTHDATE'].apply(calculate_age)
---> 20 df_good_ssn.loc[~filt,'NEW_BIRTHDATE'] = df_good_ssn['AGE'].apply(generate_birthdate)

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849
   3850         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-5-9e2113163ab5> in generate_birthdate(age)
      7 def generate_birthdate(age):
      8     today = date.today()
----> 9     if int(age) < 18:
     10         new_birthdate = str(random.randrange(1,12)) + '/' + str(random.randrange(1,28)) + '/' + str(random.randrange(today.year-18,today.year))
     11     else:

ValueError: cannot convert float NaN to integer
CodePudding user response:
I would try something like
df.fillna(0, inplace=True)
or a similar call applied to the column in question. That ensures the column contains no NaN values,
so int(age) works from that point on.
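A minimal sketch of that idea, limited to the AGE column rather than the whole frame. The small frame and its values are hypothetical stand-ins for df_good_ssn; only the column names and the '--/--/----' placeholder come from the question. It also shows where the NaN can come from: assigning to a new column through .loc with a mask leaves NaN in the rows the mask skips.

```python
import pandas as pd

# Hypothetical stand-in for df_good_ssn: one row carries the placeholder date.
df = pd.DataFrame({"BIRTHDATE": ["03/15/1990", "--/--/----", "07/04/2010"]})
filt = df["BIRTHDATE"] == "--/--/----"

# Made-up ages for every row, standing in for the result of apply(calculate_age).
all_ages = pd.Series([35, 0, 14], index=df.index)

# AGE does not exist yet, so the rows excluded by the mask are left as NaN.
df.loc[~filt, "AGE"] = all_ages
has_nan_before = bool(df["AGE"].isna().any())

# Fill only the column in question, then cast so every value is a real integer.
df["AGE"] = df["AGE"].fillna(0).astype(int)
ages = df["AGE"].tolist()
```

Note that a filled-in age of 0 would make generate_birthdate take the under-18 branch for those rows, so the fill value is itself a choice to check against the data.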
As for your comment: if the problem lies with the filt
predicate, I would look into pattern matching with regular expressions.
That may be cumbersome, but it should serve the purpose well.
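One way the regular-expression approach could look, as a sketch only: it assumes valid dates are always written as two-digit month, two-digit day and four-digit year, which the question does not confirm. The sample values are hypothetical; Series.str.match is the pandas call used for the pattern test.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"BIRTHDATE": ["03/15/1990", "--/--/----", None, "7/4/2010"]})

# A value counts as well-formed only if it looks like MM/DD/YYYY (assumed format).
# str.match anchors at the start of the string; the trailing $ anchors the end.
# na=False treats missing values as non-matching instead of propagating NaN.
is_valid = df["BIRTHDATE"].str.match(r"\d{2}/\d{2}/\d{4}$", na=False)

# True for placeholders, missing values and anything else that is malformed.
filt = ~is_valid
flags = filt.tolist()
```

Compared with an equality test against '--/--/----', this predicate also catches missing values and other malformed strings, at the cost of rejecting dates written without zero padding.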