I have code that pulls the Title, Channel Title and Description of videos from a YouTube search. This data is stored in a Pandas DataFrame, and I'm struggling to add one more column that shows validated emails found in the Description column.
(Actually, I'm trying to copy the Description column and filter it with a generated regex.)
The part of the script that parses the data:
ids = [item['id']['videoId'] for item in res['items']]
results = youtube.videos().list(id=ids, part='snippet').execute()
for result in results.get('items', []):
    print(result['id'])
    print(result['snippet']['channelTitle'])
    print(result['snippet']['title'])
    print(result['snippet']['description'])
Regex validation for Description
import re

input = result['snippet']['description']

def useRegex(input):
    # Generated pattern: words, a colon, then an email address somewhere after it
    pattern = re.compile(r"([a-zA-Z]+( [a-zA-Z]+)+):.*[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", re.IGNORECASE)
    return pattern.match(input)
Part of Pandas code
data = {'Channel Title': [result['snippet']['channelTitle'] for result in results['items']],
        'Title': [result['snippet']['title'] for result in results['items']],
        'Description': [result['snippet']['description'] for result in results['items']]
        }
df = pd.DataFrame(data,
                  columns=['Channel Title', 'Title', 'Description'],
                  )
df.head()
Answer:
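Just apply your regex function to the Description column:
df["validated_email"] = df["Description"].apply(useRegex)
Each row then holds a match object, or None when the pattern does not match. Later you can keep only the rows with a match by filtering on this column. Comparing with != None is not evaluated element-wise the way you would expect in pandas (it keeps every row), so use notna() instead:
df = df[df["validated_email"].notna()]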
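If the goal is a column containing the email text itself, one option is to return the matched substring from the function. The following is a minimal sketch under stated assumptions, not the asker's exact setup: `EMAIL_RE` is a simplified illustrative pattern (not the generated regex from the question), `extract_email` is a hypothetical helper name, and the sample rows are made up to stand in for the YouTube results. It uses `search` so that the email can appear anywhere in the description.

```python
import re
import pandas as pd

# Simplified, illustrative email pattern (an assumption, not the question's regex).
EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)

def extract_email(text):
    """Return the first email-like substring in text, or None if there is none."""
    found = EMAIL_RE.search(text)  # search scans the whole string, match only the start
    return found.group(0) if found else None

# Made-up sample rows standing in for the YouTube search results.
df = pd.DataFrame({
    "Channel Title": ["Channel A", "Channel B"],
    "Title": ["Video 1", "Video 2"],
    "Description": [
        "Business inquiries: contact@example.com",
        "No contact details here",
    ],
})

# New column: the extracted email string, or a missing value when nothing was found.
df["validated_email"] = df["Description"].apply(extract_email)

# Keep only the rows where an email was found.
with_email = df[df["validated_email"].notna()]
```

Storing the string rather than the match object keeps the column readable and makes it easy to export with `to_csv`.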