['0' '58699' '443' '55420' '53' '1900' '80' '0xb058' '0xacd9' '0xc0a8'
'0x1432' '0x0000' '123' '67' '5353' '2104' '547' '1' '53290' '4805'
'2151' '58767' '27643' '58652' '64416' '62529' '55952' '57286' '64466'
'50497' '0xa29f' '0x2d8e' '0x5b79' '0xb0eb' '0x87b5' '0x8efa' '0xd83a'
'52142' '52138' '52920' '60162' '54214' '50848' '56986' '50367' '49460'
'55963' '53327' '52022' '57400' '51755' '52834' '54183' '62724' '54871'
'59845' '56309' '61878' '58326' '56686']
The column's unique values look like this. When I run:
df[df.DstPort.apply(lambda x: x.isnumeric())].set_index('DstPort')
It takes too long to process because the DataFrame has 250k rows, so I was not able to see the result either. My concern is that the values are not all numeric: the ports are stored as strings like '443' and '80' instead of 443 and 80, and there are also values like 0xb0eb. How can I get rid of values like 0xb0eb and convert this column to the int datatype?
CodePudding user response:
Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). Do you want to keep them? If so, use:
df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
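As a minimal sketch of how that conversion behaves, here it is on a small made-up frame. The sample values and the helper name `parse_port` are assumptions for illustration, not taken from the real data:

```python
import pandas as pd

# Hypothetical sample with the same mix of decimal and hex strings as the question.
df = pd.DataFrame({"DstPort": ["443", "80", "0xb058", "0x0000", "53"]})

def parse_port(value: str) -> int:
    """Parse all-digit strings as base 10 and everything else (e.g. '0xb058') as base 16."""
    return int(value, 10 if value.isnumeric() else 16)

# Replace the string column with parsed integers.
df["DstPort"] = df["DstPort"].apply(parse_port)

print(df["DstPort"].tolist())  # [443, 80, 45144, 0, 53]
```

Note that `int(x, 16)` accepts the `0x` prefix, so the hex strings do not need to be stripped first.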
If you don't want them, filter by str.isnumeric() and then use .astype():
df[df.DstPort.str.isnumeric()].astype({'DstPort': int})
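A minimal sketch of the filtering route, assuming a small made-up frame with a second, hypothetical `Proto` column to stand in for the rest of the data. Passing a dict to `astype` limits the conversion to `DstPort`, so other non-numeric columns are left alone:

```python
import pandas as pd

# Hypothetical sample; "Proto" is an invented extra column for illustration.
df = pd.DataFrame({
    "DstPort": ["443", "80", "0xb0eb", "53"],
    "Proto": ["tcp", "tcp", "udp", "udp"],
})

# Boolean mask: True only for strings made up entirely of digits.
mask = df["DstPort"].str.isnumeric()

# Keep the decimal rows and convert just the DstPort column to int.
decimal_only = df[mask].astype({"DstPort": int})

print(decimal_only["DstPort"].tolist())  # [443, 80, 53]
```

The hex row is dropped entirely here, so this only fits if those values really are unwanted.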