I am trying to pull the name string out of each parenthesized pair, where each pair contains a string followed by a comma and an integer.
My current dataframe output is this:
print df1:
   name    matches                                    best match      best 2        best 3
0  aparna  [(aparn, 91), (Pankaj, 67), (arup, 45)]    (aparn, 91)     (Pankaj, 67)  (arup, 45)
1  pankaj  [(Pankaj, 100), (aparn, 55), (abc, 30)]    (Pankaj, 100)   (aparn, 55)   (abc, 30)
2  sudhir  [(sudhir c, 95), (arup, 22), (aparn, 18)]  (sudhir c, 95)  (arup, 22)    (aparn, 18)
3  Geeku   [(Geek, 89), (arup, 22), (Pankaj, 18)]     (Geek, 89)      (arup, 22)    (Pankaj, 18)
But I want the data frame output to look like this:
print df1:
   name    matches                                    best match  best 2  best 3
0  aparna  [(aparn, 91), (Pankaj, 67), (arup, 45)]    aparn       Pankaj  arup
1  pankaj  [(Pankaj, 100), (aparn, 55), (abc, 30)]    Pankaj      aparn   abc
2  sudhir  [(sudhir c, 95), (arup, 22), (aparn, 18)]  sudhir c    arup    aparn
3  Geeku   [(Geek, 89), (arup, 22), (Pankaj, 18)]     Geek        arup    Pankaj
I currently have my df column as:
dframe1['best match'] = dframe1['matches'].str[0] #first best match (new column)
dframe1['best 2'] = dframe1['matches'].str[1] #2nd best match
dframe1['best 3'] = dframe1['matches'].str[2] #3rd best match
I have tried using str.extract, but I am confused about how to grab only the alphabetic part of each pair.
CodePudding user response:
My first guess is that your problem is not about string handling but about accessing the items inside the array and the tuple. Does this work for you?
dframe1['best match'] = dframe1['matches'][0][0] #first best match (new column)
A few comments as explanation:
dframe1['matches'][0] addresses the first item in your array: ("aparn", 91). This is a Python tuple.
dframe1['matches'][0][0] addresses the first item in this tuple: "aparn".
More details about handling tuples in Python: https://www.w3schools.com/python/python_tuples.asp
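To make the indexing above concrete, here is a minimal plain-Python sketch. It assumes that one cell of the matches column is a list of (name, score) tuples, as the question's output suggests; the variable names are only illustrative. In the DataFrame every row holds its own list like this, so the same indexing has to be applied to each row.

```python
# One cell of the 'matches' column, as shown in the question:
# a list of (name, score) tuples. Variable names here are illustrative.
matches = [("aparn", 91), ("Pankaj", 67), ("arup", 45)]

first_pair = matches[0]      # the first tuple in the list
first_name = matches[0][0]   # the name inside that tuple
first_score = matches[0][1]  # the score inside that tuple

# All names in this cell, in their original order.
names = [name for name, _score in matches]

print(first_pair, first_name, first_score, names)
```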
CodePudding user response:
The solution to this problem is to use the function .apply. I added .apply(lambda x: x[0]) to the end of dframe1['best match'] = dframe1['matches'].str[0], and that produced the wanted output.
dframe1['best match'] = dframe1['matches'].str[0].apply(lambda x: x[0])
Also, thanks @KonstantinA.Magg and @Samwise for the help. It is a tuple, and knowing that definitely helped with searching for the right function to use.
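For anyone who wants to try this end to end, here is a small sketch that fills all three columns with the same idea. The sample data is reconstructed from the question, and it assumes every row has at least three matches; a shorter list would raise an IndexError.

```python
import pandas as pd

# Sample data reconstructed from the question; each cell of 'matches'
# is a list of (name, score) tuples.
dframe1 = pd.DataFrame({
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "matches": [
        [("aparn", 91), ("Pankaj", 67), ("arup", 45)],
        [("Pankaj", 100), ("aparn", 55), ("abc", 30)],
        [("sudhir c", 95), ("arup", 22), ("aparn", 18)],
        [("Geek", 89), ("arup", 22), ("Pankaj", 18)],
    ],
})

# Column names from the question, in order of match rank.
columns = ["best match", "best 2", "best 3"]

for position, column in enumerate(columns):
    # For each row, take the tuple at this rank and keep only its name.
    dframe1[column] = dframe1["matches"].apply(
        lambda pairs: pairs[position][0]
    )

print(dframe1[["name"] + columns])
```

Using .apply on the matches column directly skips the intermediate .str[position] step, but the result should be the same as chaining .str[position].apply(lambda x: x[0]).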