pandas: add a string to certain values in a comma separated column if values exist in a list

Time:09-06

I have a pandas dataframe as follows,

import pandas as pd
import numpy as np

df = pd.DataFrame({'text':['this is the good student','she wears a beautiful green dress','he is from a friendly family of four','the house is empty','the number four five is new'],
               'labels':['O,O,O,ADJ,O','O,O,O,ADJ,ADJ,O','O,O,O,O,ADJ,O,O,NUM','O,O,O,O','O,O,NUM,NUM,O,O']})

I would like to prefix each ADJ or NUM label with 'B-' when it starts a run (that is, when it does not directly follow the same label), and with 'I-' when it repeats the label immediately before it. Here is my desired output:

output:

                                   text                   labels
0              this is the good student            O,O,O,B-ADJ,O
1     she wears a beautiful green dress      O,O,O,B-ADJ,I-ADJ,O
2  he is from a friendly family of four  O,O,O,O,B-ADJ,O,O,B-NUM
3                    the house is empty                  O,O,O,O
4           the number four five is new      O,O,B-NUM,I-NUM,O,O

So far I have created a list of the unique labels like this:

unique_labels = (np.unique(sum(df["labels"].str.split(',').dropna().to_numpy(), []))).tolist()
unique_labels.remove('O') # no changes required for O label

Then I tried to add the 'B-' prefix first, but got an error (ValueError: Must have equal len keys and value when setting with an iterable):

for x in unique_labels:
    df.loc[df["labels"].str.contains(x), "labels"] = ['B-' + x for x in df["labels"]]

CodePudding user response:

Try:

from itertools import groupby


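def fn(x):
    out = []
    # groupby yields one (label, run) pair per run of consecutive equal labels
    for k, g in groupby(map(str.strip, x.split(","))):
        if k == "O":
            # "O" labels are copied through unchanged
            out.extend(g)
        else:
            # the first label of a run gets "B-", any repeats get "I-"
            out.append(f"B-{next(g)}")
            out.extend([f"I-{val}" for val in g])
    return ",".join(out)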


df["labels"] = df["labels"].apply(fn)
print(df)

Prints:

                                   text                   labels
0              this is the good student            O,O,O,B-ADJ,O
1     she wears a beautiful green dress      O,O,O,B-ADJ,I-ADJ,O
2  he is from a friendly family of four  O,O,O,O,B-ADJ,O,O,B-NUM
3                    the house is empty                  O,O,O,O
4           the number four five is new      O,O,B-NUM,I-NUM,O,O
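
If it is unclear why this works, here is a minimal sketch in plain Python (no pandas) of what itertools.groupby does with one label string. The sample string and the names runs, tagged and result are made up for illustration and are not part of the code above. groupby only merges labels that are directly adjacent, which is exactly the "repeated right after" rule from the question:

```python
from itertools import groupby

# Hypothetical sample row, chosen only to show the grouping behaviour.
labels = "O,O,NUM,NUM,O,ADJ".split(",")

# groupby yields one (key, iterator) pair per run of consecutive equal items.
runs = [(key, list(group)) for key, group in groupby(labels)]
print(runs)
# [('O', ['O', 'O']), ('NUM', ['NUM', 'NUM']), ('O', ['O']), ('ADJ', ['ADJ'])]

# Tag each non-"O" run: the first item gets "B-", the remaining items get "I-".
tagged = []
for key, items in runs:
    if key == "O":
        tagged.extend(items)
    else:
        tagged.append(f"B-{key}")
        tagged.extend(f"I-{key}" for _ in items[1:])

result = ",".join(tagged)
print(result)
# O,O,B-NUM,I-NUM,O,B-ADJ
```

Note that two different labels next to each other (for example ADJ,NUM) form two separate runs, so each one gets its own 'B-' prefix.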