How do I remove non-numeric values from a specific column in pandas?

['0' '58699' '443' '55420' '53' '1900' '80' '0xb058' '0xacd9' '0xc0a8'
 '0x1432' '0x0000' '123' '67' '5353' '2104' '547' '1' '53290' '4805'
 '2151' '58767' '27643' '58652' '64416' '62529' '55952' '57286' '64466'
 '50497' '0xa29f' '0x2d8e' '0x5b79' '0xb0eb' '0x87b5' '0x8efa' '0xd83a'
 '52142' '52138' '52920' '60162' '54214' '50848' '56986' '50367' '49460'
 '55963' '53327' '52022' '57400' '51755' '52834' '54183' '62724' '54871'
 '59845' '56309' '61878' '58326' '56686']

Those are the unique values in the column. When I run:

df[df.DstPort.apply(lambda x: x.isnumeric())].set_index('DstPort')

It takes too long to process because the DataFrame has 250k rows, and I never got to see the result. My concern is that the values are not all numeric: they are strings such as '443' and '80' rather than the integers 443 and 80, and some are hex strings such as '0xb0eb'. How can I get rid of values like '0xb0eb' and convert this column to the int datatype?

CodePudding user response:

Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). Do you want to keep them? If so, use:

df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
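As a rough illustration of the line above, here is a minimal sketch on a tiny made-up frame. Only the DstPort column name comes from the question; the sample values and the parse_port helper are assumptions for the example.

```python
import pandas as pd

# Hypothetical sample data; only the column name DstPort is from the question.
df = pd.DataFrame({"DstPort": ["0", "443", "80", "0xb058", "0x0000"]})

def parse_port(value: str) -> int:
    """Parse decimal strings as base 10 and everything else (e.g. '0xb058') as base 16."""
    return int(value, 10 if value.isnumeric() else 16)

# Assign the result back so the column actually holds integers afterwards.
df["DstPort"] = df["DstPort"].apply(parse_port)
print(df["DstPort"].tolist())  # [0, 443, 80, 45144, 0]
```

Note that the result has to be assigned back to the column; apply on its own returns a new Series and leaves df unchanged.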

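If you don't want them, filter by str.isnumeric() and then use .astype() on that column only (calling .astype(int) on the whole DataFrame would also try to convert every other column):

df[df.DstPort.str.isnumeric()].astype({'DstPort': int})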
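For the filtering route, a small sketch under stated assumptions may help. The Proto column and the sample values are invented for the example; the cast is limited to DstPort so that other, non-numeric columns are left untouched.

```python
import pandas as pd

# Hypothetical sample data; Proto is an invented extra column.
df = pd.DataFrame({
    "DstPort": ["443", "0xb0eb", "80", "0x1432", "53"],
    "Proto": ["tcp", "tcp", "tcp", "udp", "udp"],
})

# Vectorized string check: True for '443', False for '0xb0eb'.
mask = df["DstPort"].str.isnumeric()

# Keep only the decimal rows, then cast just the DstPort column to int.
clean = df[mask].astype({"DstPort": int})
print(clean["DstPort"].tolist())  # [443, 80, 53]
```

The .str.isnumeric() accessor works on the whole column at once, so it should generally be quicker than a per-row apply with a lambda on 250k rows.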