I have a dataframe like this:
import pandas as pd
test = {'text': [
('tom-mark', 'tom', 'tom is a good guy.'),
('Nick X','nick', 'Is that Nick?')
]}, {'text': [
('juli', 'juli', 'Tom likes juli so much.'),
('tony', 'tony', 'Steve and Tony listen in as well.')
]}
I want to find the first word in the first element of each tuple (i.e. tom, Nick, juli, tony).
I tried the following code, but it can't deal with the '-' in 'tom-mark':
name = t[0].lower()
name = name.split()
name = name[0]
However, some tuples have 2 words as the first element. How could I find the first word of each tuple?
CodePudding user response:
Does something like this help?
import re
test = {'text': [
('tom-mark', 'tom', 'tom is a good guy.'),
('Nick X','nick', 'Is that Nick?'),
('juli', 'juli', 'Tom likes juli so much.'),
('tony', 'tony', 'Steve and Tony listen in as well.')]
}
first_names = []
for names in test['text']:
    # \w+ matches the leading run of word characters, so it stops at '-' or a space
    name = re.match(r'\w+', names[0])
    first_names.append(name[0].lower())
print(first_names)
['tom', 'nick', 'juli', 'tony']
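Note that the snippet above merges the data into a single dict. If you need to keep the original tuple-of-dicts layout from the question, a minimal sketch could look like this. The helper name `first_word` is my own, and the `None` guard assumes you would rather get `None` than an exception when a string does not start with a word character:

```python
import re

# Original structure from the question: a tuple of dicts.
test = (
    {'text': [
        ('tom-mark', 'tom', 'tom is a good guy.'),
        ('Nick X', 'nick', 'Is that Nick?'),
    ]},
    {'text': [
        ('juli', 'juli', 'Tom likes juli so much.'),
        ('tony', 'tony', 'Steve and Tony listen in as well.'),
    ]},
)

def first_word(s):
    """Hypothetical helper: leading run of word characters, lowercased."""
    m = re.match(r'\w+', s)
    # re.match returns None when the string does not start with a word character.
    return m[0].lower() if m else None

# Walk every dict, then every tuple inside its 'text' list.
first_names = [first_word(tup[0]) for d in test for tup in d['text']]
print(first_names)
```

This should print the same four names as above, in the same order.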
CodePudding user response:
You can load the data into a pandas DataFrame and map a function over the text
column to get the first names, then flatten the resulting list of lists into a single list.
Inside the function, a regular expression extracts the first name from each tuple in the list, and the function returns a list of first names.
import pandas as pd
import re
def get_first(x):
    # x is one cell of the 'text' column: a list of tuples
    return list(map(lambda tup: re.match(r'\w+', tup[0])[0].lower(), x))
test = {'text': [
('tom-mark', 'tom', 'tom is a good guy.'),
('Nick X','nick', 'Is that Nick?')
]}, {'text': [
('juli', 'juli', 'Tom likes juli so much.'),
('tony', 'tony', 'Steve and Tony listen in as well.')
]}
# Note: on pandas 2.1+, applymap is deprecated in favour of DataFrame.map.
data = sum(pd.DataFrame(test).applymap(get_first)['text'].tolist(), [])
print(data)
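In case the `sum(..., [])` part looks odd: it flattens the list of lists by concatenating each inner list onto an empty list. Here is a small sketch of just that step, with `itertools.chain.from_iterable` shown as a common alternative (my suggestion, not something the code above requires). The `nested` value is a stand-in for what the `tolist()` call is expected to return:

```python
from itertools import chain

# Stand-in for the list of lists produced by ...['text'].tolist()
nested = [['tom', 'nick'], ['juli', 'tony']]

# sum() starts from [] and applies + to each inner list in turn.
flat_sum = sum(nested, [])

# chain.from_iterable walks the inner lists lazily; list() collects the items.
flat_chain = list(chain.from_iterable(nested))

print(flat_sum)
print(flat_chain)
```

Both give the same flat list. For a handful of rows either is fine; `sum` builds a new list at every step, so `chain` tends to scale better when there are many rows.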