How to filter parsed YouTube data with a regex and put the result in a pandas DataFrame?

Time:08-29

I have code that returns the following data from a YouTube search: Title, Channel Title, and Description. This information is stored in a pandas DataFrame, and now I'm struggling to add one more column that shows validated emails from the Description column.

(Actually, I'm trying to copy the Description column and filter it with a generated regex.)

The part of the script that parses the data for me:

ids = [item['id']['videoId'] for item in res['items']]
results = youtube.videos().list(id=ids, part='snippet').execute()
for result in results.get('items', []):
    print(result['id'])
    print(result['snippet']['channelTitle'])
    print(result['snippet']['title'])
    print(result['snippet']['description'])

Regex validation for Description

import re

input = result['snippet']['description']

def useRegex(input):
    # Generated pattern: "Some Words:" followed by text that contains an email address
    pattern = re.compile(r"([a-zA-Z]+( [a-zA-Z]+)+):.*[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", re.IGNORECASE)
    return pattern.match(input)

The pandas part of the code:

import pandas as pd

data = {'Channel Title': [result['snippet']['channelTitle'] for result in results['items']],
        'Title': [result['snippet']['title'] for result in results['items']],
        'Description': [result['snippet']['description'] for result in results['items']]
       }

df = pd.DataFrame(data,
                  columns = ['Channel Title', 'Title', 'Description'],
                 )

df.head()

CodePudding user response:

Just apply your regex function to the Description column:

df["validated_email"] = df["Description"].apply(useRegex)
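Note that `pattern.match` returns an `re.Match` object (or `None`), so the new column holds match objects rather than email text. If the goal is to store the address itself, one option is `Series.str.extract`. The following is only a minimal sketch: the sample rows and the simplified `EMAIL_PATTERN` are assumptions made for illustration, not the generated regex from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the YouTube descriptions.
df = pd.DataFrame({
    "Description": [
        "Business inquiries: jane.doe@example.com",
        "No contact details here",
    ]
})

# Simplified email pattern (an assumption, not the asker's generated regex).
# The single capture group is what str.extract returns.
EMAIL_PATTERN = r"([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"

# With expand=False and one group, str.extract returns a Series holding
# the first match per row, or NaN when nothing matches.
df["validated_email"] = df["Description"].str.extract(EMAIL_PATTERN, expand=False)
print(df)
```

Rows without an email end up as NaN, which makes later filtering straightforward.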

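Later, you can keep only the rows with a validated email by filtering on this column:

# notna() drops rows where useRegex returned None; `!= None` is compared elementwise and keeps every row
df = df[df["validated_email"].notna()]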
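To check the whole flow end to end, here is a small self-contained sketch on made-up data. The rows, the helper name `find_email`, and the simplified `EMAIL_RE` pattern are all hypothetical. It uses `search` rather than `match` so the email can appear anywhere in the description, and `Series.notna()` for the row mask, since pandas treats `None` as a missing value.

```python
import re
import pandas as pd

# Simplified email pattern (an assumption for illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_email(text):
    """Return the first email-like substring in text, or None."""
    found = EMAIL_RE.search(text)
    return found.group(0) if found else None

# Hypothetical rows standing in for the parsed YouTube results.
df = pd.DataFrame({
    "Title": ["Video A", "Video B", "Video C"],
    "Description": [
        "Contact: team@example.org for sponsorship",
        "Just a vlog",
        "Mail me at creator@mail.example.com",
    ],
})

df["validated_email"] = df["Description"].apply(find_email)

# notna() is True only for rows where an email was found.
with_email = df[df["validated_email"].notna()]
print(with_email[["Title", "validated_email"]])
```

Returning the matched string instead of the match object also keeps the column easy to export, for example with `to_csv`.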