Replacing some odd year values with NaN

Time:07-26

I have this dataset

id        name              year
54132423  (2021 FT1)        2021
3733265   (A911 VD2)        911
2417217   417217 (6344 YS)  6344
54111244  (2021 CG3)        2021
3798973   (4788 BN6)        4788

I extracted the year values from the name column, and some of them came out odd. I want to replace those odd values with NaN.

I've tried using

df.year.replace(['911', '4788', '6344'], np.nan)

but it's not working. Please help.

CodePudding user response:

I realised that the values in the year column have trailing whitespace, which is why my code wasn't working.
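For anyone hitting the same thing, here is a minimal sketch of that fix. The sample frame is rebuilt from the question, and the trailing spaces in year are an assumption made to reproduce the problem. Note also that replace returns a new Series, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the trailing spaces are assumed.
df = pd.DataFrame({
    "id": [54132423, 3733265, 2417217, 54111244, 3798973],
    "name": ["(2021 FT1)", "(A911 VD2)", "417217 (6344 YS)",
             "(2021 CG3)", "(4788 BN6)"],
    "year": ["2021 ", "911 ", "6344 ", "2021 ", "4788 "],
})

# Strip surrounding whitespace so the strings match exactly,
# then replace the odd values and assign the result back.
df["year"] = df["year"].str.strip().replace(["911", "4788", "6344"], np.nan)

print(df)
```

Without the str.strip() call, '911 ' does not equal '911', so nothing is replaced.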

CodePudding user response:

Some more information about your project might be helpful, since I imagine it would be easier to filter out invalid values when you parse them from the name column, rather than pulling everything over to the year column and then filtering the year column.
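As a rough sketch of what filtering at parse time could look like: the regex and the 1800 to 2022 range below are my own assumptions about what counts as a valid year, not something taken from the question, so adjust them to your data.

```python
import pandas as pd

# Names copied from the question's sample data.
df = pd.DataFrame({
    "name": ["(2021 FT1)", "(A911 VD2)", "417217 (6344 YS)",
             "(2021 CG3)", "(4788 BN6)"],
})

# Capture exactly four digits directly after "(" and before whitespace.
# Rows that do not match (e.g. "(A911 VD2)") become NaN.
extracted = df["name"].str.extract(r"\((\d{4})\s", expand=False)
year = pd.to_numeric(extracted, errors="coerce")

# Keep only years inside an assumed plausible range; everything else is NaN.
df["year"] = year.where(year.between(1800, 2022))

print(df)
```

This way the odd values never reach the year column, so there is nothing to clean up afterwards.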

Regardless, here is a solution that should work. If the column is a string (which the comments suggest it is):

df["year"] = df.apply(lambda x: x["year"] if 0 < int(x["year"]) < 2022 else np.nan, axis=1)

This solution uses the pandas apply function, a Python lambda, and a conditional (ternary) expression to condense the code. The bounds may need adjusting to filter the years better, and it may be more readable to pull the logic out into a named function in a real def block.
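A possible version with a def block might look like the sketch below. The function name clean_year and the default bounds are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def clean_year(value, low=1800, high=2022):
    """Return the year as an int if it looks plausible, otherwise NaN.

    The name and the default bounds are illustrative assumptions.
    """
    try:
        year = int(value)  # int() ignores surrounding whitespace
    except (TypeError, ValueError):
        return np.nan      # not a number at all
    return year if low <= year <= high else np.nan

# Year strings as in the question, with assumed trailing spaces.
df = pd.DataFrame({"year": ["2021 ", "911 ", "6344 ", "2021 ", "4788 "]})
df["year"] = df["year"].apply(clean_year)

print(df)
```

Using Series.apply on the single column avoids the row-wise axis=1 call, and the try/except keeps non-numeric strings from raising.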
