Good morning,
I have exhaustively searched for how best to do two things in Python/Pandas, and have not yet found the answer.
I have a df such as:
User | Role |
---|---|
Roger Dodger (rogerdodger) | user |
Edwin Cullen (edwincullen) | user |
Hunter Andrews (hunterandrews) | user |
I would like to iterate over the User column and keep only the text inside the parentheses, with a result such as:
User | Role |
---|---|
rogerdodger | user |
edwincullen | user |
hunterandrews | user |
I've found many successful ways for iterating. I've not found a way to do the string edits cleanly. I've seen some regex suggestions but am not all that familiar with how to implement them based on the other examples given.
CodePudding user response:
There are various ways to do that.
One way would be using pandas.Series.apply with a custom lambda function, as follows:
df['User'] = df['User'].apply(lambda x: x[x.find('(') + 1:x.find(')')])
[Out]:
User Role
0 rogerdodger user
1 edwincullen user
2 hunterandrews user
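For anyone who wants to run the apply approach end to end, here is a minimal sketch. The DataFrame is rebuilt from the table in the question, and the helper name inside_parens is only an illustration, not part of pandas. It also guards against rows with no parentheses, since str.find returns -1 in that case and the plain slice would quietly return the wrong substring.

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "User": [
        "Roger Dodger (rogerdodger)",
        "Edwin Cullen (edwincullen)",
        "Hunter Andrews (hunterandrews)",
    ],
    "Role": ["user", "user", "user"],
})


def inside_parens(text):
    """Return the text between the first '(' and the next ')', or None.

    Hypothetical helper name, used here only for illustration.
    """
    start = text.find("(")
    end = text.find(")", start + 1)
    # str.find returns -1 when the character is absent, so check before slicing.
    if start == -1 or end == -1:
        return None
    return text[start + 1:end]


df["User"] = df["User"].apply(inside_parens)
print(df)
```

A named function is a bit longer than the lambda, but it leaves room for the missing-parenthesis check and is easier to test on its own.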
Another way could be with pandas.Series.str.extract, as follows:
df['User'] = df['User'].str.extract(r'\((.*?)\)', expand=False)
[Out]:
User Role
0 rogerdodger user
1 edwincullen user
2 hunterandrews user
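Since the question mentions not being familiar with regex, here is a small sketch that breaks the pattern down piece by piece. The second row is a made-up value (not from the question) added only to show what str.extract does when a row has no parentheses.

```python
import pandas as pd

users = pd.Series([
    "Roger Dodger (rogerdodger)",
    "No Username Here",  # hypothetical row without parentheses
])

# \(     a literal opening parenthesis
# (.*?)  capture group: any characters, as few as possible (non-greedy)
# \)     a literal closing parenthesis
pattern = r"\((.*?)\)"

# With a single capture group and expand=False, the result is a Series.
extracted = users.str.extract(pattern, expand=False)
print(extracted)
```

Rows that do not match the pattern come back as NaN rather than raising an error, so they can be found afterwards with extracted.isna().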
Notes:
If needed, one can also store the username in a different column, such as the column username, as follows:
df['username'] = df['User'].str.extract(r'\((.*?)\)', expand=False)
[Out]:
User Role username
0 Roger Dodger (rogerdodger) user rogerdodger
1 Edwin Cullen (edwincullen) user edwincullen
2 Hunter Andrews (hunterandrews) user hunterandrews