I'm trying to label data in a table. The label index should restart for each row, and each column should get a different Enum class.
What I've done so far produces this representation with the same Enum class for every column.
A solution that handles each column separately as a list would also be fine. What would be the best way to solve this?
import pandas as pd
from enum import Enum
df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second':['product and prices', 'price2', 'product3 and price']})
df
class Tipos(Enum):
    B = 1
    I = 2
    L = 3
for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        # Label each word by its 1-based position in the sentence
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos + 1).name}")
Results:
                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price
product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L
Desired Results:
        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second
# In that case, the sequence is (B_first, I_first, L_first, L_first...), and when the column changes it becomes B_second, I_second, L_second...
CodePudding user response:
Instead of using Enum, you can use a dict mapping. You can avoid loops if you flatten your dataframe:
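# Assumed dict mapping (its definition is not shown in the post): word position -> label
Tipos = {0: 'B', 1: 'I', 2: 'L'}
out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
             + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)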
Output:
>>> out
        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second
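If you would rather keep the Enum and the explicit loops from the question, a minimal sketch along these lines might also work. The helper name `label_words` is hypothetical, and clamping positions past the third word to `L` is an assumption based on the "L_first, L_first..." note in the question, not something the post confirms:

```python
from enum import Enum

import pandas as pd


class Tipos(Enum):
    B = 1
    I = 2
    L = 3


def label_words(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per word, labeled by position and column."""
    records = []
    for _, row in df.iterrows():
        # row.items() yields (column name, cell value) pairs for this row
        for column, sentenca in row.items():
            for pos, palavra in enumerate(sentenca.split()):
                # Assumption: words beyond the third reuse the last member (L)
                tipo = Tipos(min(pos + 1, len(Tipos)))
                records.append((palavra, f"{tipo.name}_{column}"))
    return pd.DataFrame(records, columns=["Word", "Ent"])


df = pd.DataFrame({
    "first": ["product and other", "product2 and other", "price"],
    "second": ["product and prices", "price2", "product3 and price"],
})
out = label_words(df)
print(out)
```

This is slower than the flattened version on large frames, but it keeps the Enum and makes the row-then-column order explicit.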