How do you extract the first word after a specific word in pandas?

I'm trying to get social media usernames from a post. This is my code so far.

social_media=['kik']
df['socials'] = df['Request'].str.extract('({})'.format('|'.join(social_media)), 
                    flags=re.IGNORECASE, expand=False).str.lower().fillna('')

print(df['socials'])

This code prints kik for the rows that have a kik listed.

How can I get it to output the username that's listed after kik?

This is a sample of my dataframe:

Request
-------
0 my kik abcd
1 ig bby
2 check out my kik:1234
3 Kik hehehaha

My code outputs this:

socials
-------
0 kik
1
2 kik
3 kik

I'd like it to output this:

socials
-------
0 abcd
1
2 1234
3 hehehaha

CodePudding user response:

Create sample data:

import pandas as pd
import re

df = pd.DataFrame()
df['request'] = ['my kik abcd', 'ig bby', 'check out my kik:1234', 'Kik hehehaha']

Filter the data, keeping only the rows that contain 'kik' (case-insensitive):

filtered_df = df[df['request'].str.contains('kik', flags=re.IGNORECASE)]

Look through rows that have 'kik' and get everything after it:

df.loc[filtered_df.index, 'test'] = [re.split('kik', row, flags=re.IGNORECASE, maxsplit=1)[-1] for row in filtered_df['request']]
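
To see what the split step above returns, here is a small sketch that runs the same re.split call on plain strings. The list name samples is only for illustration and is not part of the original answer.

```python
import re

# Illustrative inputs taken from the sample dataframe (rows that contain 'kik').
samples = ['my kik abcd', 'check out my kik:1234', 'Kik hehehaha']

# Split at most once on 'kik' (case-insensitive) and keep the part after it.
tails = [re.split('kik', s, maxsplit=1, flags=re.IGNORECASE)[-1] for s in samples]
print(tails)  # [' abcd', ':1234', ' hehehaha']

# A string without 'kik' is not split at all, so [-1] would return it unchanged.
# That is why the rows are filtered first.
print(re.split('kik', 'ig bby', maxsplit=1, flags=re.IGNORECASE)[-1])  # ig bby
```

The results keep whatever directly follows kik, so a leading space or colon is still attached.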

A few things to note:

The row at index 2 still contains the colon (':1234'), and the other results keep a leading space; you can remove both with str.replace or str.strip.
I limit the split to at most one (maxsplit=1), so [-1] always takes everything after the first kik.

This means a string of 'kik junk junk junk' will result in ' junk junk junk' (everything after kik, leading space included), not just the first word.
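
If only the first word after the keyword is wanted, one possible alternative is to do it in a single str.extract call with a capture group. This is a sketch, not the answer's method. It assumes usernames consist of word characters (\w) and are separated from the keyword by whitespace or a colon; neither is stated in the question. The variable pattern is introduced here for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'Request': ['my kik abcd', 'ig bby',
                               'check out my kik:1234', 'Kik hehehaha']})
social_media = ['kik']

# Keyword in a non-capturing group, then optional whitespace/colons (assumed
# separators), then the first run of word characters as the captured username.
pattern = r'\b(?:{})\b[\s:]*(\w+)'.format('|'.join(map(re.escape, social_media)))

# With one capture group and expand=False, str.extract returns a Series;
# rows without a match become NaN and are replaced by an empty string.
df['socials'] = (df['Request']
                 .str.extract(pattern, flags=re.IGNORECASE, expand=False)
                 .fillna(''))
print(df['socials'].tolist())  # ['abcd', '', '1234', 'hehehaha']
```

Adding more platforms to social_media extends the alternation without changing the rest. The username is left in its original case here, since lowercasing it could change it.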