This dataframe is given to me.
My desired output using a dictionary is like this
**Given the following dictionary:**
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
How can I do this in Python? I tried the following, but it failed:
dataframe['new'] = data['documents'].apply(lambda x: dictionary[x])
Kindly help me out. Thanks in advance.
CodePudding user response:
You can use explode to get the individual words, then map them with your dict and reshape your dataframe:
MAPPING = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}
df['documents'] = (df['documents'].str.split().explode().map(MAPPING).astype(str)
.groupby(level=0).agg(list).str.join(' '))
print(df)
# Output
id documents
0 0 30 45 90 50
1 1 100 70 110
2 2 30 45 110
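If you want to run the snippet above end to end, here is a self-contained sketch. The input frame is an assumption: the question's dataframe was not included as text, so I reconstructed it from the outputs shown in this answer.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    'id': [0, 1, 2],
    'documents': ['I am good boy', 'We are going', 'I am going'],
})

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50,
           'We': 100, 'are': 70, 'going': 110}

df['documents'] = (
    df['documents'].str.split()    # sentence -> list of words
    .explode()                     # one word per row, original index repeated
    .map(MAPPING)                  # word -> number
    .astype(str)                   # numbers -> strings so they can be joined
    .groupby(level=0).agg(list)    # regroup the pieces by original row
    .str.join(' ')                 # list of strings -> single string
)
print(df)
```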
Step by step
Phase 1: Explode
# Split phrase into words
>>> out = df['documents'].str.split()
0 [I, am, good, boy]
1 [We, are, going]
2 [I, am, going]
Name: documents, dtype: object
# Explode lists into scalar values
>>> out = out.explode()
0 I
0 am
0 good
0 boy
1 We
1 are
1 going
2 I
2 am
2 going
Name: documents, dtype: object
Phase 2: Transform
# Map each word to its number with the dict, then cast the result to string
>>> out = out.map(MAPPING).astype(str)
0 30
0 45
0 90
0 50
1 100
1 70
1 110
2 30
2 45
2 110
Name: documents, dtype: object # <- .astype(str)
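One caveat with the map step: a word that is not a key of the dict is mapped to NaN, which also turns the whole result into floats (so a later cast to string would give '30.0' rather than '30'). The sketch below illustrates this with a hypothetical extra word, 'home', that is not in the question's data, and shows one possible fallback that keeps unknown words unchanged. Whether that fallback is what you want depends on your data.

```python
import pandas as pd

MAPPING = {'I': 30, 'am': 45, 'going': 110}

# 'home' is a hypothetical word that is absent from MAPPING.
words = pd.Series(['I', 'am', 'going', 'home'], index=[0, 0, 0, 0])

# A missing key yields NaN, and the Series becomes float64.
mapped = words.map(MAPPING)
print(mapped)

# Possible workaround: keep the original word when it has no mapping,
# and build the strings directly so known values stay integer-like.
kept = words.map(lambda w: str(MAPPING.get(w, w)))
print(kept)
```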
Phase 3: Reshape
# Group by index (level=0) then aggregate to a list
>>> out = out.groupby(level=0).agg(list)
0 [30, 45, 90, 50]
1 [100, 70, 110]
2 [30, 45, 110]
Name: documents, dtype: object
# Join each list of number strings with a space
>>> out = out.str.join(' ')
0 30 45 90 50
1 100 70 110
2 30 45 110
Name: documents, dtype: object
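As a side note on the attempt in the question: `lambda x: dictionary[x]` looks up the whole sentence as a single key, which is not in the dict. A minimal sketch of an alternative that stays close to that attempt is to split inside the function and look up each word separately. The input frame is again an assumption reconstructed from the outputs above, and `encode` is just an illustrative helper name.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    'id': [0, 1, 2],
    'documents': ['I am good boy', 'We are going', 'I am going'],
})

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50,
           'We': 100, 'are': 70, 'going': 110}

def encode(sentence):
    # Look up each word on its own; raises KeyError for a word not in MAPPING.
    return ' '.join(str(MAPPING[word]) for word in sentence.split())

df['new'] = df['documents'].apply(encode)
print(df)
```

This keeps the original column and writes the result to 'new', as in the question. For large frames the explode version may be the better fit, but I have not benchmarked either.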