Locating the columns that correspond to a value in a dataframe

Suppose I have a predefined dataframe:

+---+-----+------+-------+--------+
|ID |Fear |Happy |Angry  |Excited |
+---+-----+------+-------+--------+
|   |     |      |       |        |
+---+-----+------+-------+--------+

I ran an emotion analysis on a text using NRCLex. Let's say it returns:

text_emotion = ['Fear', 'Happy']

How do I locate the values in the list and put a 1 in the corresponding column if the value exists and a 0 if it doesn't?

+---+-----+------+-------+--------+
|ID |Fear |Happy |Angry  |Excited |
+---+-----+------+-------+--------+
| A |1    |1     |0      |0       |
+---+-----+------+-------+--------+

I tried using get_dummies, but it doesn't work in my situation because I want the result to match the columns of the predefined dataframe. It gives me this instead:

+---+-----+------+
|ID |Fear |Happy |
+---+-----+------+
| A | 1   | 1    |
+---+-----+------+

I would appreciate any help. Thank you.

CodePudding user response：

You could do the following:

import pandas as pd

frame = pd.DataFrame(columns=["Fear", "Angry", "Happy", "Excited"])
mylist = ["Fear", "Happy"]
# Build a regex such as "Fear|Happy" and flag every column name that matches it
pattern = '|'.join(mylist)
row = frame.columns.str.contains(pattern).astype(int)
frame.loc[0] = row

If you have a whole list of lists, you can loop through it and append each row to the dataframe using frame.loc[i], like so:

frame = pd.DataFrame(columns=["Fear", "Angry", "Happy", "Excited"])
mylists = [["Fear", "Happy"], ["Angry", "Excited"]]
# One row per list of emotions: 1 where the column name matches, otherwise 0
for i, the_list in enumerate(mylists):
    pattern = '|'.join(the_list)
    row = frame.columns.str.contains(pattern).astype(int)
    frame.loc[i] = row
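
One caveat with this approach: str.contains treats the joined string as a regular expression and matches substrings, so "Fear" would also flag a hypothetical column named "Fearful". Below is a minimal sketch of an exact-match variant using Index.isin, assuming the same column names as above. The helper name encode_emotions is made up for illustration.

```python
import pandas as pd

def encode_emotions(columns, emotion_lists):
    """Return a 0/1 frame with one row per list in emotion_lists (hypothetical helper)."""
    cols = pd.Index(columns)
    # Index.isin compares whole labels, so "Fear" does not match "Fearful"
    rows = [cols.isin(emotions).astype(int) for emotions in emotion_lists]
    return pd.DataFrame(rows, columns=cols)

frame = encode_emotions(
    ["Fear", "Angry", "Happy", "Excited"],
    [["Fear", "Happy"], ["Angry", "Excited"]],
)
print(frame)
```

Because the whole frame is built in one call, this also avoids growing the dataframe row by row inside the loop.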

CodePudding user response：

It depends on how you represent your data. Let's say you have the following dataframe constructed from your sentiment analysis results:

import pandas as pd

df = pd.DataFrame({
    'A': ['Fear', 'Happy', 'Emotional'],
    'B': ['Excited', 'Emotional', 'Angry'],
})

Then you could do:

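# One-hot encode each position of the transposed frame, dropping the column-name prefix
df_dummies = pd.get_dummies(df.T, prefix=['']*len(df.T.columns), prefix_sep='', dtype=int)
# Merge duplicate emotion columns; transposing avoids groupby(axis=1), which is deprecated in recent pandas
out = df_dummies.T.groupby(level=0).sum().T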

print(out) shows:

   Angry  Emotional  Excited  Fear  Happy
A      0          1        0     1      1
B      1          1        1     0      0

If you want the index as a separate ID column, then:

out = out.rename_axis('ID').reset_index()

print(out) shows:

  ID  Angry  Emotional  Excited  Fear  Happy
0  A      0          1        0     1      1
1  B      1          1        1     0      0
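
To get back to the fixed set of columns from the question, one option is to reindex the get_dummies result against the predefined column list. The following is only a sketch under assumptions: the names expected and text_emotion and the ID "A" are taken from the question, and a single NRCLex result is encoded as one row.

```python
import pandas as pd

expected = ["Fear", "Happy", "Angry", "Excited"]  # columns of the predefined frame
text_emotion = ["Fear", "Happy"]                  # example NRCLex result from the question

# One-hot encode the detected emotions and collapse them into a single 0/1 row
dummies = pd.get_dummies(pd.Series(text_emotion), dtype=int).max().to_frame().T
# reindex adds the missing emotion columns (filled with 0) and fixes the column order
out = dummies.reindex(columns=expected, fill_value=0)
out.insert(0, "ID", ["A"])  # "A" is the sample ID used in the question
print(out)
```

reindex also drops any emotion that is not listed in expected, so a label such as "Emotional" would not show up in the result.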

CodePudding user response：

I'd never heard of get_dummies() before, but here's what I came up with. It also uses loc. It's nice because it works whether you start from a predefined dataframe or an empty one.

Since the emotions in text_emotion are the same as the dataframe column names, you can just loop through text_emotion and make that dataframe row/column value equal to 1 with loc.

import pandas as pd

df = pd.DataFrame()

text_emotion_1 = ['Fear', 'Happy', 'Angry']
text_emotion_2 = ['Happy', 'Excited']

# for row 0, or you can do boolean indexing to assign it
# to the row where index = A
for em in text_emotion_1:
    df.loc[0, em] = 1

# for row 1
for em in text_emotion_2:
    df.loc[1, em] = 1

If you started with an empty dataframe, you'd have nulls:

   Fear  Happy  Angry  Excited
0   1.0    1.0    1.0      NaN
1   NaN    1.0    NaN      1.0

So you could use fillna() and astype() to replace the nulls with 0's and convert everything to integer, respectively.

df.fillna(0, inplace=True)
df = df.astype('int')

Then your dataframe will look like this (just missing the index column):

   Fear  Happy  Angry  Excited
0     1      1      1        0
1     0      1      0        1
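
If the column set is known up front, the NaN cleanup can be skipped by starting from a frame of zeros. Here is a small sketch of that variant, assuming a hypothetical dict named results that maps each ID to its detected emotions:

```python
import pandas as pd

columns = ["Fear", "Happy", "Angry", "Excited"]
# Hypothetical analysis results: ID -> list of detected emotions
results = {"A": ["Fear", "Happy", "Angry"], "B": ["Happy", "Excited"]}

# Start from all zeros so no fillna()/astype() pass is needed afterwards
df = pd.DataFrame(0, index=list(results), columns=columns)
df.index.name = "ID"
for row_id, emotions in results.items():
    # .loc with a list of column labels sets every listed emotion in one step
    df.loc[row_id, emotions] = 1

print(df.reset_index())
```

Using the IDs as the index also takes care of the missing ID column, since reset_index() turns it into a regular column.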
