How can I do this split process in Python?


I'm trying to label data in a table. The label sequence should repeat for each row, but each column should use a different Enum class.

So far I've only managed to do this with the same Enum class for every column.

A solution that handles each column separately as a list would also be acceptable. What would be the best way to solve this?

import pandas as pd
from enum import Enum

df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second':['product and prices', 'price2', 'product3 and price']})

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos + 1).name}")


                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price

product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L

Desired Results:

        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second

# In this case the sequence is B_first, I_first, L_first, L_first, ... and when the column changes it becomes B_second, I_second, L_second, ...

Answer:

Instead of using Enum you can use a dict mapping. You can avoid loops if you flatten your dataframe:

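# Dict replacing the Enum (assumed definition, not shown here); keys are 0-based like cumcount()
Tipos = {0: 'B', 1: 'I', 2: 'L'}
# One word per row, indexed by (column, original row) and ordered by original row
out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
                   + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)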


>>> out
        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second
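
If you would rather keep the Enum and the explicit loops from the question, a minimal sketch along these lines should give the same table. The helper name `label_words` is made up for illustration, and it assumes no cell has more than three words, since `Tipos` only defines values 1 to 3.

```python
from enum import Enum

import pandas as pd


class Tipos(Enum):
    B = 1
    I = 2
    L = 3


def label_words(df):
    """Return one row per word, tagged with its position label and source column.

    Hypothetical helper: raises ValueError if a cell has more than three words,
    because Tipos has no member for a fourth position.
    """
    rows = []
    for _, row in df.iterrows():
        # row.items() yields (column name, cell text) pairs in column order
        for column, sentence in row.items():
            for pos, word in enumerate(sentence.split()):
                # Enum values start at 1, enumerate() starts at 0
                label = f"{Tipos(pos + 1).name}_{column}"
                rows.append({"Word": word, "Ent": label})
    return pd.DataFrame(rows, columns=["Word", "Ent"])


df = pd.DataFrame({
    "first": ["product and other", "product2 and other", "price"],
    "second": ["product and prices", "price2", "product3 and price"],
})
out = label_words(df)
print(out)
```

This is slower than the flattened version on large frames because of `iterrows()`, but it keeps the Enum lookup and makes the column suffix explicit.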