Regex: Creating a list with unique words that start with 'USER_XXX'

Time:05-11

I have a dataframe with 2131 rows and 1 column that contains text. I want to create a list of the unique words that start with 'USER_XXX'. Some examples are:

'USER_lCMYsnyy', 'USER_lflwYYJv', 'USER_leyXqhCt', 'USER_loILacMG', 'USER_lOqOGDBi', 'USER_sFFJmYso'

Note that the length of each 'USER_XXX' can be different.

For example, the following text

'USER_sFFJmYso wants to play football with USER_loILacMG, however he askedUSER_leyXqhCt.'

should produce

[USER_sFFJmYso, USER_loILacMG, USER_leyXqhCt]

CodePudding user response:

You can try doing something like this:

import re

string = 'USER_sFFJmYso wants to play football with USER_loILacMG, however he askedUSER_leyXqhCt.'

pattern = r'USER_[A-Za-z]+'

print(re.findall(pattern, string))

Output:

['USER_sFFJmYso', 'USER_loILacMG', 'USER_leyXqhCt']

EDIT: If you need only the unique USER_XXX values, change the print call like this:

print(list(set(re.findall(pattern, string))))
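Note that set() does not preserve order, so the result may not come back in the order shown in the question. A minimal sketch of an order-preserving alternative, assuming Python 3.7+ (where dicts keep insertion order); the sample text is the one from the question with a repeated user appended for illustration:

```python
import re

# Sample text from the question, with one user repeated (added for illustration)
text = ('USER_sFFJmYso wants to play football with USER_loILacMG, '
        'however he askedUSER_leyXqhCt. USER_sFFJmYso agreed.')

# No word boundary before USER_, so 'askedUSER_leyXqhCt' still matches
matches = re.findall(r'USER_[A-Za-z]+', text)

# dict.fromkeys drops duplicates while keeping first-seen order
unique_in_order = list(dict.fromkeys(matches))
print(unique_in_order)
```

This prints the three users in the order they first appear in the text.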

CodePudding user response:

First of all, I want to thank @lemon, who helped me a lot with the pattern I was looking for. The problem with that code was that it did not return the unique users.

import re

# Collect every USER_ match from each row of the text column
names_list = []
for text in df_final[0]:
    names_list += re.findall(r'USER_[A-Za-z]+', text)

# Keep only the distinct user names
unique_names = set(names_list)
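The same idea can be wrapped in a small helper. This is only a sketch: unique_users and the sample rows are hypothetical names and data, and it assumes the column can be iterated as plain strings (for example by passing df_final[0]); non-string cells such as NaN are skipped.

```python
import re

USER_PATTERN = re.compile(r'USER_[A-Za-z]+')

def unique_users(texts):
    """Return the distinct USER_ names found in texts, in first-seen order."""
    seen = {}
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing or non-text cells
        for name in USER_PATTERN.findall(text):
            seen.setdefault(name, None)
    return list(seen)

# Hypothetical rows standing in for the dataframe column
rows = [
    'USER_sFFJmYso wants to play football with USER_loILacMG',
    'however he askedUSER_leyXqhCt.',
    None,
    'USER_loILacMG said yes',
]
users = unique_users(rows)
print(users)
```

With a real dataframe the call would be unique_users(df_final[0]), and len() of the result gives the number of unique users.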