How to convert each word in each row of a DataFrame to a numeric value

This dataframe is given to me.

My desired output using a dictionary is like this

**Given the following dictionary:**
d = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}

How can I do this in Python? I tried the following, but it failed:

dataframe['new'] = data['documents'].apply(lambda x: dictionary[x])

Kindly help me out. Thanks in advance.

CodePudding user response:

You can split each sentence into words, use explode to get one word per row, map each word through your dict, then regroup the rows into one string per original row:

MAPPING = {'I': 30,'am': 45,'good': 90,'boy': 50,'We':100,'are':70,'going':110}

df['documents'] = (df['documents'].str.split().explode().map(MAPPING).astype(str)
                                  .groupby(level=0).agg(list).str.join(' '))
print(df)

# Output
   id    documents
0   0  30 45 90 50
1   1   100 70 110
2   2    30 45 110
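
The input DataFrame from the question is not reproduced here, so the following is a self-contained sketch that assumes a frame with an `id` column and a `documents` column, reconstructed from the outputs shown in this answer. It runs the same chain end to end:

```python
import pandas as pd

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50,
           'We': 100, 'are': 70, 'going': 110}

# Assumed input: reconstructed from the outputs in this answer,
# not copied from the question.
df = pd.DataFrame({
    'id': [0, 1, 2],
    'documents': ['I am good boy', 'We are going', 'I am going'],
})

df['documents'] = (
    df['documents']
    .str.split()          # sentence -> list of words
    .explode()            # one word per row, original index repeated
    .map(MAPPING)         # word -> number
    .astype(str)          # number -> string so it can be joined
    .groupby(level=0)     # regroup by original row index
    .agg(list)            # collect the strings of each row into a list
    .str.join(' ')        # list -> space-separated string
)
print(df)
```

Because the result keeps the original index, assigning it back to `df['documents']` aligns each joined string with the row it came from.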

Step by step

Phase 1: Explode

# Split phrase into words
>>> out = df['documents'].str.split()
0    [I, am, good, boy]
1      [We, are, going]
2        [I, am, going]
Name: documents, dtype: object

# Explode lists into scalar values
>>> out = out.explode()
0        I
0       am
0     good
0      boy
1       We
1      are
1    going
2        I
2       am
2    going
Name: documents, dtype: object

Phase 2: Transform

# Map each word through the dict, then cast the numbers to strings
>>> out = out.map(MAPPING).astype(str)
0     30
0     45
0     90
0     50
1    100
1     70
1    110
2     30
2     45
2    110
Name: documents, dtype: object  # <- .astype(str)

Phase 3: Reshape

# Group by index (level=0) then aggregate to a list
>>> out = out.groupby(level=0).agg(list)
0    [30, 45, 90, 50]
1      [100, 70, 110]
2       [30, 45, 110]
Name: documents, dtype: object

# Join each list of number strings with a space
>>> out = out.str.join(' ')
0    30 45 90 50
1     100 70 110
2      30 45 110
Name: documents, dtype: object
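
The attempt in the question passes the whole sentence to the dict as a single key, which is why the lookup fails. As an alternative sketch (not the approach used in this answer), the lookup can be done word by word inside `apply`. The helper name `encode`, the `unknown` placeholder, and the sample rows are hypothetical and only for illustration; `dict.get` is used so that a word missing from the mapping does not raise a `KeyError`:

```python
import pandas as pd

MAPPING = {'I': 30, 'am': 45, 'good': 90, 'boy': 50,
           'We': 100, 'are': 70, 'going': 110}

def encode(sentence, mapping=MAPPING, unknown='?'):
    """Replace each whitespace-separated word with its mapped value.

    Words missing from `mapping` are replaced by `unknown`
    (a hypothetical placeholder chosen for this sketch).
    """
    return ' '.join(str(mapping.get(word, unknown)) for word in sentence.split())

# Hypothetical sample rows; 'swimming' is deliberately absent from MAPPING.
df = pd.DataFrame({
    'id': [0, 1],
    'documents': ['I am good boy', 'We are swimming'],
})

df['encoded'] = df['documents'].apply(encode)
print(df['encoded'].tolist())
```

This version is a plain Python loop per row, so it trades the vectorized chain for simpler handling of unknown words.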