New column with word at nth position of string from other column pandas

Time:06-22

import numpy as np
import pandas as pd

d = {'ABSTRACT_ID': [14145090,1900667, 8157202,6784974], 
     'TEXT': [
         "velvet antlers vas are commonly used in tradit",
         "we have taken a basic biologic RPA to elucidat4",
         "ceftobiprole bpr is an investigational cephalo",
         "lipoperoxidationderived aldehydes for example",],
     'LOCATION': [1, 4, 2, 1]}

df = pd.DataFrame(data=d)
df

def word_at_pos(pos, string):
    # Return the word at 1-based position `pos` in `string`
    count = 0
    res = ""
    for char in string:
        if char == ' ':
            count = count + 1
            if count == pos:
                break
            res = ""  # start collecting the next word
        else:
            res = res + char
    return res

print(word_at_pos(df.iloc[0,2], df.iloc[0,1]))

For this df, I want to create a new column WORD containing the word from TEXT at the (1-based) position given by LOCATION; e.g. for the first row it would be "velvet".

I can do this for a single row with the standalone function word_at_pos(), but can't work out how to apply it to the whole column. I have created new columns with lambda functions before, but can't work out how to fit this function into a lambda.
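A minimal sketch of the lambda approach the question asks about, assuming the helper is changed to return the word instead of printing it (a function that only prints gives apply nothing to store). Here word_at_pos is re-implemented with str.split() for brevity, which is a substitution for the asker's character loop, and DataFrame.apply(..., axis=1) passes each row so the lambda can read both LOCATION and TEXT.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "ABSTRACT_ID": [14145090, 1900667, 8157202, 6784974],
    "TEXT": [
        "velvet antlers vas are commonly used in tradit",
        "we have taken a basic biologic RPA to elucidat4",
        "ceftobiprole bpr is an investigational cephalo",
        "lipoperoxidationderived aldehydes for example",
    ],
    "LOCATION": [1, 4, 2, 1],
})

def word_at_pos(pos, text):
    """Return the word at 1-based position `pos` in `text`."""
    # str.split() with no argument splits on runs of whitespace
    return text.split()[pos - 1]

# axis=1 hands each row to the lambda as a Series, so both columns are available
df["WORD"] = df.apply(lambda row: word_at_pos(row["LOCATION"], row["TEXT"]), axis=1)
print(df["WORD"].tolist())
# ['velvet', 'a', 'bpr', 'lipoperoxidationderived']
```

Row-wise apply is convenient but runs Python code per row, so for large frames the list comprehension in the answer is usually at least as fast.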

Answer:

Looping over TEXT and LOCATION together is probably the simplest approach: splitting each string produces lists of different lengths (a jagged array), so NumPy advanced indexing can't be used to pick one word per row in a single vectorized step.
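The jaggedness can be checked directly. This small sketch uses the question's TEXT values and pandas' Series.str.split() and Series.str.len() to show that each row splits into a different number of words, so the result is not a rectangular 2-D array.

```python
import pandas as pd

texts = pd.Series([
    "velvet antlers vas are commonly used in tradit",
    "we have taken a basic biologic RPA to elucidat4",
    "ceftobiprole bpr is an investigational cephalo",
    "lipoperoxidationderived aldehydes for example",
])

# Each element becomes a Python list of words; the lists differ in length
tokens = texts.str.split()
lengths = tokens.str.len().tolist()
print(lengths)  # [8, 9, 6, 4]
```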

# LOCATION is 1-based, so subtract 1 to get a 0-based list index
df["WORDS"] = [txt.split()[loc] for txt, loc in zip(df["TEXT"], df["LOCATION"]-1)]
print(df)

   ABSTRACT_ID  ...                    WORDS
0     14145090  ...                   velvet
1      1900667  ...                        a
2      8157202  ...                      bpr
3      6784974  ...  lipoperoxidationderived

[4 rows x 4 columns]
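The one-liner raises IndexError when LOCATION is larger than the number of words in a row, and a LOCATION of 0 would silently return the last word (index -1). A sketch of a guarded variant, assuming None is an acceptable result for out-of-range positions; the sample rows and the safe_word name are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "TEXT": [
        "velvet antlers vas are commonly used in tradit",
        "lipoperoxidationderived aldehydes for example",
        "",
    ],
    # Hypothetical positions: row 1 asks for word 10, past the end of its text
    "LOCATION": [1, 10, 1],
})

def safe_word(text, loc):
    """Return the word at 1-based position `loc`, or None if it does not exist."""
    words = text.split()
    # Guard both ends: positions below 1 or beyond the last word give None
    # instead of raising IndexError or wrapping around via a negative index
    if 1 <= loc <= len(words):
        return words[loc - 1]
    return None

df["WORD"] = [safe_word(txt, loc) for txt, loc in zip(df["TEXT"], df["LOCATION"])]
print(df["WORD"].tolist())  # ['velvet', None, None]
```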