I have a DataFrame in IOB format, as below:
Name | Label |
---|---|
Alan | B-PERSON |
Smith | I-PERSON |
is | O |
Alice's | B-PERSON |
uncle | O |
from | O |
New | B-LOCATION |
York | I-LOCATION |
city | I-LOCATION |
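To reproduce the example, you can build the input table as a DataFrame. This is a minimal sketch. The variable name `df` is assumed so that it matches the answer's code.

```python
import pandas as pd

# One row per token, in sentence order, with its IOB tag
df = pd.DataFrame({
    'Name': ['Alan', 'Smith', 'is', "Alice's", 'uncle', 'from',
             'New', 'York', 'city'],
    'Label': ['B-PERSON', 'I-PERSON', 'O', 'B-PERSON', 'O', 'O',
              'B-LOCATION', 'I-LOCATION', 'I-LOCATION'],
})
print(df)
```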
I would like to convert it into a new DataFrame, as below:
Name | Label |
---|---|
Alan Smith | PERSON |
Alice's | PERSON |
New York city | LOCATION |
Any help is much appreciated!
Answer:
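You can drop the rows whose `Label` equals `O`, remove the `B-`/`I-` prefixes from the remaining values in the `Label` column, and then aggregate the names with `' '.join` per helper group. The helper groups come from the cumulative sum of the `O` mask. That sum increases by one at every `O` row, so consecutive entity tokens between two `O` rows share a group number: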
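```python
import pandas as pd

# True for tokens outside any entity ('O' tag)
m = df['Label'].eq('O')
# Drop the 'O' rows, strip the B-/I- prefix, then join the names within each
# (helper group, entity type) pair; m.cumsum() increases by one at every 'O' row.
# regex=True is required: since pandas 2.0, str.replace treats the pattern literally by default
df = (df[~m].assign(Label=lambda x: x['Label'].str.replace('^[IB]-', '', regex=True))
        .groupby([m.cumsum(), 'Label'])['Name']
        .agg(' '.join)
        .droplevel(0)                  # discard the helper group level, keep Label
        .reset_index()                 # Label and Name become columns again
        .reindex(df.columns, axis=1))  # restore the original column order
print(df)
```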
```
            Name     Label
0     Alan Smith    PERSON
1        Alice's    PERSON
2  New York city  LOCATION
```
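One caveat with the answer above: the helper group number only changes at `O` rows. Two entities of the same type with no `O` token between them (for example `Bob`/`B-PERSON` directly followed by `Carol`/`B-PERSON`) therefore share a group and get joined into one name. If that can happen in your data, you could use a variant that starts a new span at every `B-` tag, sketched below. It is not part of the original answer. `merge_iob` and the `span` column are hypothetical names, and the sketch assumes `Label` only holds `O`, `B-TYPE` or `I-TYPE` strings.

```python
import pandas as pd


def merge_iob(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse token-level IOB tags into one row per entity span (hypothetical helper)."""
    tag = df['Label']
    # Entity type without the B-/I- prefix; 'O' rows become NaN
    etype = tag.str.extract(r'^[BI]-(.+)$', expand=False)
    # A span starts at every B- tag, or at an I- tag whose type differs from
    # the previous token's type (including an I- tag right after an 'O')
    starts = tag.str.startswith('B-') | (etype.notna() & (etype != etype.shift()))
    span = starts.cumsum()  # running span number, one per entity
    keep = etype.notna()    # drop the 'O' tokens
    return (df[keep]
            .assign(Label=etype[keep], span=span[keep])
            .groupby('span', sort=True)
            .agg(Name=('Name', ' '.join), Label=('Label', 'first'))
            .reset_index(drop=True))


df = pd.DataFrame({
    'Name': ['Alan', 'Smith', 'is', "Alice's", 'uncle', 'from',
             'New', 'York', 'city'],
    'Label': ['B-PERSON', 'I-PERSON', 'O', 'B-PERSON', 'O', 'O',
              'B-LOCATION', 'I-LOCATION', 'I-LOCATION'],
})
print(merge_iob(df))

# Two same-type entities with no 'O' between them stay separate here
adjacent = pd.DataFrame({'Name': ['Bob', 'Carol'], 'Label': ['B-PERSON', 'B-PERSON']})
print(merge_iob(adjacent))
```

Because the span number is driven by the tags themselves rather than by the `O` rows, a stray `I-` tag with no preceding `B-` also starts its own span instead of being attached to the previous entity.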