find the first word of the first element of a list of tuples in a column?

Time:09-28

I have a dataframe like this:

import pandas as pd

test = {'text': [
    ('tom-mark', 'tom', 'tom is a good guy.'),
    ('Nick X','nick', 'Is that Nick?')
]}, {'text': [
    ('juli', 'juli', 'Tom likes juli so much.'),
    ('tony', 'tony', 'Steve and Tony listen in as well.')
]}

I want to find the first word in the first element of each tuple (i.e. tom, Nick, juli, tony).

I tried the following code, but it can't deal with the '-' in 'tom-mark':

    # t is one tuple from the list
    name = t[0].lower()
    name = name.split()  # splits on whitespace only, so 'tom-mark' stays whole
    name = name[0]

Some tuples also have two words as the first element. How can I get the first word of the first element of each tuple?

CodePudding user response:

Does something like this help?

import re

test = {'text': [
    ('tom-mark', 'tom', 'tom is a good guy.'),
    ('Nick X','nick', 'Is that Nick?'),
    ('juli', 'juli', 'Tom likes juli so much.'),
    ('tony', 'tony', 'Steve and Tony listen in as well.')]
}

first_names = []

for names in test['text']:
    # \w+ matches the leading run of word characters, so it stops at '-' or a space
    name = re.match(r'\w+', names[0])
    first_names.append(name[0].lower())


print(first_names)

['tom', 'nick', 'juli', 'tony']
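
One thing to watch: re.match returns None when the string does not start with a word character, and name[0] would then raise a TypeError. A minimal sketch that guards against that, assuming a hypothetical helper called first_word (not part of the answer above) and assuming it is acceptable to skip leading punctuation by using re.search instead of re.match:

```python
import re


def first_word(s, default=''):
    """Return the first run of word characters in s, lowercased.

    Hypothetical helper for illustration; returns `default` when s
    contains no word characters at all.
    """
    m = re.search(r'\w+', s)
    return m.group(0).lower() if m else default


rows = [
    ('tom-mark', 'tom', 'tom is a good guy.'),
    ('Nick X', 'nick', 'Is that Nick?'),
    ('juli', 'juli', 'Tom likes juli so much.'),
    ('tony', 'tony', 'Steve and Tony listen in as well.'),
]

first_names = [first_word(t[0]) for t in rows]
print(first_names)
```

Note that \w also matches digits and underscores, so a name like 'tom_mark' would come back whole. If that matters, a narrower character class such as [A-Za-z]+ may be a better fit.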

CodePudding user response:

You can load the data into a pandas DataFrame and apply a function to each cell of the text column. Each cell holds a list of tuples, so the function returns a list of first names per cell, and the resulting list of lists is then flattened into a single list.

Inside the function, a regular expression extracts the first word from the first element of every tuple in that list.

import pandas as pd
import re


def get_first(x):
    # x is a list of tuples; take the leading word of each tuple's first element
    return list(map(lambda tup: re.match(r'\w+', tup[0])[0].lower(), x))

test = {'text': [
    ('tom-mark', 'tom', 'tom is a good guy.'),
    ('Nick X','nick', 'Is that Nick?')
]}, {'text': [
    ('juli', 'juli', 'Tom likes juli so much.'),
    ('tony', 'tony', 'Steve and Tony listen in as well.')
]}

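# applymap runs get_first on every cell; sum(..., []) flattens the per-row lists
# (newer pandas versions deprecate applymap in favor of DataFrame.map)
data = sum(pd.DataFrame(test).applymap(get_first)['text'].tolist(), [])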

print(data)
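
If the DataFrame is not needed for anything else, the same map-then-flatten idea can be sketched in plain Python. This is only an alternative under that assumption, not the approach above: it uses itertools.chain.from_iterable to flatten the per-row lists instead of sum(..., []).

```python
import re
from itertools import chain

test = (
    {'text': [
        ('tom-mark', 'tom', 'tom is a good guy.'),
        ('Nick X', 'nick', 'Is that Nick?'),
    ]},
    {'text': [
        ('juli', 'juli', 'Tom likes juli so much.'),
        ('tony', 'tony', 'Steve and Tony listen in as well.'),
    ]},
)

# Flatten the list of tuples stored under 'text' in every row
all_tuples = chain.from_iterable(row['text'] for row in test)

# Leading word of each tuple's first element, lowercased
data = [re.match(r'\w+', tup[0])[0].lower() for tup in all_tuples]

print(data)
```

With the sample data this prints ['tom', 'nick', 'juli', 'tony'].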