How to extract a string within parentheses that contain the string, a comma, and an integer?

Time:12-14

I am trying to pull the name string out of a pair of parentheses that contains the string, followed by a comma and an integer.

My current dataframe output is this:

print df1:

      name                                    matches      best match        best 2        best 3
0  aparna    [(aparn, 91), (Pankaj, 67), (arup, 45)]     (aparn, 91)  (Pankaj, 67)    (arup, 45)
1  pankaj    [(Pankaj, 100), (aparn, 55), (abc, 30)]   (Pankaj, 100)   (aparn, 55)     (abc, 30)
2  sudhir  [(sudhir c, 95), (arup, 22), (aparn, 18)]  (sudhir c, 95)    (arup, 22)   (aparn, 18)
3   Geeku     [(Geek, 89), (arup, 22), (Pankaj, 18)]      (Geek, 89)    (arup, 22)  (Pankaj, 18)

But I want the data frame output to look like this:

print df1:

      name                                    matches   best match   best 2   best 3
0  aparna    [(aparn, 91), (Pankaj, 67), (arup, 45)]     aparn      Pankaj   arup
1  pankaj    [(Pankaj, 100), (aparn, 55), (abc, 30)]     Pankaj     aparn    abc
2  sudhir  [(sudhir c, 95), (arup, 22), (aparn, 18)]     sudhir c   arup     aparn
3   Geeku     [(Geek, 89), (arup, 22), (Pankaj, 18)]     Geek       arup     Pankaj

I currently create the new columns like this:

dframe1['best match'] = dframe1['matches'].str[0] #first best match (new column)
dframe1['best 2'] = dframe1['matches'].str[1] #2nd best match
dframe1['best 3'] = dframe1['matches'].str[2] #3rd best match

I have tried using str.extract, but I am not sure how to grab only the alphabetic part.

CodePudding user response:

My first guess is that your problem is not about string handling but about accessing the items inside the list and the tuple. Does this work for you?

dframe1['best match'] = dframe1['matches'].str[0].str[0] #first best match (new column)

A few comments as explanation:

  • dframe1['matches'].str[0] addresses the first item in each row's list, e.g. ("aparn", 91). This is a Python tuple
  • the second .str[0] addresses the first item in that tuple, e.g. "aparn"

More details about handling tuples in Python: https://www.w3schools.com/python/python_tuples.asp
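
To make the tuple indexing concrete, here is a small sketch in plain Python, without pandas. The variable names (matches, best, name, second_name, second_score) are made up for illustration; the values are copied from row 0 of the question.

```python
# One row's "matches" value: a list of (name, score) tuples.
matches = [("aparn", 91), ("Pankaj", 67), ("arup", 45)]

best = matches[0]   # first item of the list: the tuple ("aparn", 91)
name = best[0]      # first item of that tuple: the string "aparn"

# Tuple unpacking is an alternative to indexing twice.
second_name, second_score = matches[1]

print(name, second_name, second_score)
```

The same two-step lookup (list item first, then tuple item) is what has to happen for every row of the column.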

CodePudding user response:

The solution to this problem is to use .apply. I added .apply(lambda x: x[0]) to the end of dframe1['best match'] = dframe1['matches'].str[0], and that produced the wanted output.

dframe1['best match'] = dframe1['matches'].str[0].apply(lambda x: x[0])

Also, thanks to @KonstantinA.Magg and @Samwise for the help. It is a tuple, and knowing that definitely helped with finding the right function to use.
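
For anyone who wants to try this end to end, below is a minimal, self-contained sketch that rebuilds the example data from the question and fills all three columns with the .str[...] plus .apply approach. The loop and the names position, column, and pair are my own additions for illustration. It assumes every row has at least three matches; with fewer, .str[position] yields NaN and the lambda would fail.

```python
import pandas as pd

# Example data copied from the question.
dframe1 = pd.DataFrame({
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "matches": [
        [("aparn", 91), ("Pankaj", 67), ("arup", 45)],
        [("Pankaj", 100), ("aparn", 55), ("abc", 30)],
        [("sudhir c", 95), ("arup", 22), ("aparn", 18)],
        [("Geek", 89), ("arup", 22), ("Pankaj", 18)],
    ],
})

# .str[position] picks the (name, score) tuple at that position in each
# row's list; .apply then keeps only the name part of each tuple.
for position, column in enumerate(["best match", "best 2", "best 3"]):
    dframe1[column] = dframe1["matches"].str[position].apply(lambda pair: pair[0])

print(dframe1[["name", "best match", "best 2", "best 3"]])
```

Chaining a second .str[0] in place of the .apply call should give the same names, since the .str accessor also indexes into tuples.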
