Remove quote marks in object column

Time:09-08

I'm struggling to find a solution to this problem which is why I'm here.

I have a dataframe column num_list that contains letters and numbers:

df['num_list']
0          "8E"
1          "5E"
2         "19A"
3         "16E"
4         "26D"
  ...  
539032     "5E"
539033     "6E"
539034    "16E"
539035     "7E"
539036     "5E"
Name: carweb_abi2_50, Length: 539037, dtype: object

I want to remove all the letters and quotation marks. I've managed the letters part getting to here:

0          8
1          5
2         19
3         16
4         26
  ..
Name: carweb_abi2_50, Length: 539037, dtype: object

However, I can't convert to integer and when I check the unique elements for the column I see this:

array(['8', '5', '19', '16', '26', '24', '15', '14', '6', '28', '18',
       '20', '7', '41', '25', '31', '17', '9', '12', '4', '23', '10',
       '27', '40', '30', '3', '21', '13', '22', '11', '33', '42', '34',
       '32', '36', '1', '2', '39', '', '29', '37', 0, '38', '43', '35',
       '45', '44', '47', '46', '49', '48', '50', '0'], dtype=object)

This shows that the NaN values I replaced with zero are the actual number 0, while all the other values are still strings (hence the quotes).

I've tried extracting only the integers into a new column but no luck.

TYIA

CodePudding user response:

You can use regex:

df["num_list"] = df["num_list"].str.replace(r'\D+', '', regex=True)

and then convert to Integer:

df["num_list"] = df["num_list"].astype(int)
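
Note that the unique values in the question include an empty string '' and a real integer 0. On an object column, .str.replace returns NaN for elements that are not strings, and astype(int) raises on ''. A more defensive variant might look like the sketch below. The sample data is made up to mimic the question, and treating empty or missing entries as 0 is an assumption based on the asker's own NaN-to-zero replacement:

```python
import pandas as pd

# Hypothetical sample mimicking the column in the question: quoted strings,
# an empty string, a missing value, and a real integer 0.
df = pd.DataFrame({"num_list": ['"8E"', '"19A"', '', None, 0, '"5E"']})

# Cast to str first so the integer 0 is handled by the .str accessor,
# then strip every non-digit character (letters and quote marks alike).
digits = df["num_list"].astype(str).str.replace(r"\D+", "", regex=True)

# Empty strings cannot be parsed as integers: coerce them to NaN,
# fill those with 0 (assumption), and only then cast to int.
df["num_list"] = pd.to_numeric(digits, errors="coerce").fillna(0).astype(int)

print(df["num_list"].tolist())  # [8, 19, 0, 0, 0, 5]
```

If missing entries should stay missing rather than become 0, dropping the fillna(0) step and casting with astype("Int64") (pandas' nullable integer dtype) is one alternative.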