How to delete icons from comments in csv files using pandas

Time:01-17

I am trying to delete an icon that appears in many rows of my CSV file. When I create a DataFrame using pd.read_csv, it shows a green squared check icon, but if I open the CSV in Excel I see ✅ instead. I tried to remove it with the split function, because the verification status is separated from the comment by |:

df['reviews'] = df['reviews'].apply(lambda x: x.split('|')[1])

I noticed it didn't detect the "|" separator when the review contains the icon mentioned above.


I am not sure whether it is an encoding problem. I tried adding encoding='utf-8' to pandas read_csv, but it didn't solve the problem.

Thanks in advance.

I would like to add a picture of what I see when I open the CSV file in Excel.


CodePudding user response:

You can remove characters outside the Latin-1 range using the encode/decode string methods:

>>> df
           reviews
0  ✓ Trip Verified
1         Verified

>>> df['reviews'].str.encode('latin1', errors='ignore').str.decode('latin1')
0     Trip Verified
1          Verified
Name: reviews, dtype: object
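
A caveat worth illustrating: the Latin-1 round trip drops every character that Latin-1 cannot represent, not only the check mark, and it leaves the space that followed the icon. The sketch below uses made-up sample rows (the real column would come from pd.read_csv) and adds str.strip() to trim that leftover whitespace:

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration only.
df = pd.DataFrame({"reviews": [
    "\u2713 Trip Verified | Great flight",
    "Not Verified | Caf\u00e9 was closed",
    "\u2705 Trip Verified | \u826f\u3044",
]})

cleaned = (
    df["reviews"]
    .str.encode("latin1", errors="ignore")  # silently drop anything outside Latin-1
    .str.decode("latin1")                   # back to str
    .str.strip()                            # trim the space left behind by the icon
)
print(cleaned.tolist())
```

Here the accented "é" survives because it is part of Latin-1, while the Japanese text in the last row is removed along with the icon. If your comments may contain such text, removing only the specific symbol is probably the safer choice.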

CodePudding user response:

Say you had the following dataframe:

           reviews
0  ✅ Trip Verified
1     Not Verified
2     Not Verified
3  ✅ Trip Verified

You can use the replace method to remove the ✅ symbol, which is Unicode code point U+2705.

df['reviews'] = df['reviews'].apply(lambda x: x.replace('\u2705',''))
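
If you are not sure which code point your file actually contains (the first answer's sample uses ✓, which is a different character from ✅), a small check like the one below may help. The helper name describe_non_ascii is made up for this sketch; it relies only on the standard unicodedata module:

```python
import unicodedata

def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

# Try it on one cell, e.g. describe_non_ascii(df['reviews'].iloc[0])
print(describe_non_ascii("\u2705 Trip Verified"))
print(describe_non_ascii("\u2713 Trip Verified"))
```

Whatever code point it reports is the one to pass to replace.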




Here is the full example:

Code:

import pandas as pd

df = pd.DataFrame({"reviews":['\u2705 Trip Verified', 'Not Verified', 'Not Verified', '\u2705 Trip Verified']})
df['reviews'] = df['reviews'].apply(lambda x: x.replace('\u2705',''))
print(df)

Output:

          reviews
0   Trip Verified
1    Not Verified
2    Not Verified
3   Trip Verified
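
Note that the output above still has a leading space where the icon used to be. As a possible follow-up, here is a sketch that uses the vectorized .str accessor instead of apply, trims the whitespace, and then splits the status from the comment on the first "|". The sample rows and the column names status and comment are assumptions based on the layout described in the question:

```python
import pandas as pd

# Hypothetical rows in the "status | comment" layout described in the question.
df = pd.DataFrame({"reviews": [
    "\u2705 Trip Verified | Great flight",
    "Not Verified | Seat was broken",
    "No separator here",
]})

# Remove the check mark as a literal string (regex=False) and trim whitespace.
cleaned = df["reviews"].str.replace("\u2705", "", regex=False).str.strip()

# Split on the first "|" only. Rows without a separator get a missing
# comment instead of raising IndexError, as x.split('|')[1] would.
# (If no row at all contained "|", column 1 would not exist.)
parts = cleaned.str.split("|", n=1, expand=True)
df["status"] = parts[0].str.strip()
df["comment"] = parts[1].str.strip()
print(df[["status", "comment"]])
```

Splitting with n=1 keeps any further "|" characters inside the comment text intact.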