I have a dataset in a dataframe and I want to get the total number of characters and the list of unique characters.
For the total number of characters I have implemented the following code, which seems to work well:
df["Preprocessed_Text"].str.len().sum()
Could you please let me know how to get a list of the unique characters (not including the space)?
CodePudding user response:
Try this:
from string import ascii_letters
chars = set(''.join(df["Preprocessed_Text"])).intersection(ascii_letters)
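This keeps only the ASCII letters (a-z and A-Z); digits, punctuation and spaces are dropped.
If you need to work with a different alphabet, replace ascii_letters with a string or set of the characters you want to keep.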
If you want every character but the space then:
chars = set(''.join(df["Preprocessed_Text"]).replace(' ', ''))
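One caveat with the snippets above: ''.join raises a TypeError if the column contains missing values or other non-strings. Here is a minimal self-contained sketch of the "every character but the space" variant that guards against that. The sample rows are made up for illustration; only the column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({"Preprocessed_Text": ["hello world", "data 123", None]})

# Drop missing values first, because str.join fails on non-string items.
text = "".join(df["Preprocessed_Text"].dropna().astype(str))

# Every distinct character except the space.
chars = set(text) - {" "}

print(sorted(chars))
```

If the column is known to hold only strings, the dropna() and astype(str) calls are not needed.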
CodePudding user response:
# Unique characters in order of first appearance, excluding the space
unichars = list(''.join(df["Preprocessed_Text"]).replace(' ', ''))
print(sorted(set(unichars), key=unichars.index))
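Sorting by unichars.index rescans the list for every unique character, which can get slow on a large text. As a possible alternative, and assuming Python 3.7+ where dict keys keep insertion order, the same first-appearance ordering can be sketched with dict.fromkeys. The sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({"Preprocessed_Text": ["banana split", "apple"]})

# Join all rows into one string and remove the spaces.
text = "".join(df["Preprocessed_Text"]).replace(" ", "")

# dict keys are unique and keep insertion order, so this deduplicates
# in a single pass while preserving the order of first appearance.
ordered_unique = list(dict.fromkeys(text))

print(ordered_unique)
```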
CodePudding user response:
unique = list({letter for letter in ''.join(df['Preprocessed_Text'].values) if letter != " "})