I have a dataframe with 2131 rows and 1 column containing text. I want to create a list of the unique words that start with 'USER_XXX'. Some examples are:
'USER_lCMYsnyy', 'USER_lflwYYJv', 'USER_leyXqhCt', 'USER_loILacMG', 'USER_lOqOGDBi', 'USER_sFFJmYso'
Note that the 'USER_XXX' tokens can have different lengths.
For example, the following text
'USER_sFFJmYso wants to play football with USER_loILacMG, however he askedUSER_leyXqhCt.'
should produce
[USER_sFFJmYso, USER_loILacMG, USER_leyXqhCt]
Answer:
You can try doing something like this:
import re
string = 'USER_sFFJmYso wants to play football with USER_loILacMG, however he askedUSER_leyXqhCt.'
pattern = r'USER_[A-Za-z]+'  # 'USER_' followed by one or more letters
print(re.findall(pattern, string))
Output:
['USER_sFFJmYso', 'USER_loILacMG', 'USER_leyXqhCt']
EDIT: If you only need the unique USER_XXX elements, just change the print like this:
print(list(set(re.findall(pattern, string))))
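Note that a `set` does not keep the order in which the users first appear. If order matters, one option (a sketch; the sample string below is the question's text with an extra, made-up repeated mention appended) is `dict.fromkeys`, since dicts preserve insertion order in Python 3.7+:

```python
import re

# Question's example text plus a hypothetical duplicate mention of USER_loILacMG
string = ('USER_sFFJmYso wants to play football with USER_loILacMG, '
          'however he askedUSER_leyXqhCt. USER_loILacMG agreed.')
pattern = r'USER_[A-Za-z]+'

# dict.fromkeys drops duplicates but keeps the first-seen order
unique_users = list(dict.fromkeys(re.findall(pattern, string)))
print(unique_users)  # ['USER_sFFJmYso', 'USER_loILacMG', 'USER_leyXqhCt']
```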
Answer:
First of all, I want to thank @lemon, who helped me a lot with the pattern I was looking for. The problem with their code was that it didn't return the unique users.
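import re

# Collect every USER_XXX match across all rows
names_list = []
for text in df_final[0]:
    # extend (not reassign), so earlier rows' matches are kept
    names_list.extend(re.findall(r'USER_[A-Za-z]+', text))
# Keep only the unique users; len(unique_names) gives how many there are
unique_names = set(names_list)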
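As an alternative to the explicit loop, pandas can do the extraction column-wise. A minimal sketch, assuming the texts sit in column `0` as in `df_final` above; the small `df` here is made-up sample data standing in for it:

```python
import pandas as pd

# Hypothetical sample data standing in for df_final (texts in column 0)
df = pd.DataFrame({0: [
    'USER_sFFJmYso wants to play football with USER_loILacMG, however he askedUSER_leyXqhCt.',
    'USER_loILacMG said no.',
    'No users mentioned here.',
]})

pattern = r'USER_[A-Za-z]+'

# str.findall returns one list of matches per row; explode flattens those lists
# (a row with no match becomes NaN, which dropna removes);
# unique keeps values in order of first appearance
unique_names = df[0].str.findall(pattern).explode().dropna().unique().tolist()
print(unique_names)  # ['USER_sFFJmYso', 'USER_loILacMG', 'USER_leyXqhCt']
```

Unlike converting to a `set`, this keeps the users in the order they first occur in the column.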