I have this dataset:
id | name | year |
---|---|---|
54132423 | (2021 FT1) | 2021 |
3733265 | (A911 VD2) | 911 |
2417217 | 417217 (6344 YS) | 6344 |
54111244 | (2021 CG3) | 2021 |
3798973 | (4788 BN6) | 4788 |
I extracted the year values from the name column, and some of them came out wrong. I want to replace those odd values with NaN.
I've tried using
df.year.replace(['911', '4788', '6344'], np.nan)
but it's not working. Please help.
CodePudding user response:
I realised that the values in the year column have trailing whitespace, which is why my code wasn't working.
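For anyone hitting the same problem, here is a minimal sketch of the fix, assuming the year column holds strings with trailing spaces (the sample frame below is reconstructed from the question, and the exact whitespace is a guess):

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the trailing spaces in "year"
# are an assumption based on the explanation above.
df = pd.DataFrame({
    "id": [54132423, 3733265, 2417217, 54111244, 3798973],
    "name": ["(2021 FT1)", "(A911 VD2)", "417217 (6344 YS)",
             "(2021 CG3)", "(4788 BN6)"],
    "year": ["2021 ", "911 ", "6344 ", "2021 ", "4788 "],
})

# Strip the surrounding whitespace so the strings match exactly,
# then replace the odd values with NaN. replace() returns a new
# Series, so the result has to be assigned back to the column.
df["year"] = df["year"].str.strip().replace(["911", "4788", "6344"], np.nan)
```

Note that the original attempt also discarded the result of replace(); without assigning it back, df stays unchanged even once the strings match.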
CodePudding user response:
Some more information about your project might be helpful, since I imagine it would be easier to filter out invalid values when you parse them from the name column, rather than pulling everything over to the year column and then filtering the year column.
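As a rough sketch of what filtering at parse time could look like: the pattern and the 1800 to 2100 range below are assumptions for illustration, not rules taken from your data, so adjust them to whatever counts as a valid designation in your project.

```python
import pandas as pd

names = pd.Series([
    "(2021 FT1)", "(A911 VD2)", "417217 (6344 YS)",
    "(2021 CG3)", "(4788 BN6)",
])

# Assumed rule: a valid name has a four-digit year right after the
# opening parenthesis. Names that do not match yield NaN.
year = names.str.extract(r"\((\d{4}) ", expand=False)

# Convert to numbers, then keep only years inside the assumed
# plausible range; everything else becomes NaN.
year = pd.to_numeric(year, errors="coerce")
year = year.where(year.between(1800, 2100))
```

With this approach the odd values never reach the year column, so no hard-coded list of bad years is needed.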
Regardless, here is a solution that should work. If the column is a string (which the comments suggest it is):
df["year"] = df.apply(lambda x: x["year"] if 0 < int(x["year"]) < 2022 else np.nan, axis=1)
This solution uses the pandas apply function, a Python lambda, and a conditional (ternary) expression to condense the code. The bounds may need adjusting to filter the years better, and it may be more readable to pull the logic out into a regular def block.
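A possible def-block version is sketched below. The names clean_year, low, and high are made up for illustration, and the lower bound of 1800 used in the call is an assumed tighter limit, not something stated in the question.

```python
import numpy as np
import pandas as pd

def clean_year(value, low=0, high=2022):
    """Return the year as an int if low < year < high, else NaN.

    clean_year, low, and high are illustrative names.
    """
    try:
        # str() + strip() tolerates whitespace and non-string input.
        year = int(str(value).strip())
    except ValueError:
        # Anything that is not a whole number is treated as invalid.
        return np.nan
    return year if low < year < high else np.nan

# Sample column; the trailing spaces are an assumption.
df = pd.DataFrame({"year": ["2021 ", "911 ", "6344 ", "2021 ", "4788 "]})

# Extra keyword arguments to Series.apply are passed on to the function.
df["year"] = df["year"].apply(clean_year, low=1800)
```

Because the result mixes integers and NaN, pandas stores the column as floats; if integer years are needed, a nullable integer dtype such as "Int64" is one option.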