Suppose I have a predefined dataframe:
+----+------+-------+-------+---------+
| ID | Fear | Happy | Angry | Excited |
+----+------+-------+-------+---------+
|    |      |       |       |         |
+----+------+-------+-------+---------+
I ran emotion analysis on a text using NRCLex. Let's say it returns:
text_emotion = ['Fear', 'Happy']
How do I look up the values in the list and put a 1 in the corresponding column if the emotion is present and a 0 if it isn't? The result should look like this:
+----+------+-------+-------+---------+
| ID | Fear | Happy | Angry | Excited |
+----+------+-------+-------+---------+
| A  | 1    | 1     | 0     | 0       |
+----+------+-------+-------+---------+
I tried using get_dummies, but it doesn't work in my situation because I want the result to match the columns of the predefined dataframe. It gives me this instead:
+----+------+-------+
| ID | Fear | Happy |
+----+------+-------+
| A  | 1    | 1     |
+----+------+-------+
I would appreciate any help. Thank you.
CodePudding user response:
You could do the following:
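import pandas as pd

frame = pd.DataFrame(columns=["Fear", "Angry", "Happy", "Excited"])
mylist = ["Fear", "Happy"]
# Build a regex that matches any of the emotions in the list
pattern = '|'.join(mylist)
# 1 where the column name matches the pattern, 0 otherwise
row = frame.columns.str.contains(pattern).astype(int)
frame.loc[0] = row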
If you have a whole list of lists, you can loop through it and append each row to the dataframe using frame.loc[i], like so:
frame = pd.DataFrame(columns=["Fear", "Angry", "Happy", "Excited"])
mylists = [["Fear", "Happy"], ["Angry", "Excited"]]
for i, the_list in enumerate(mylists):
    pattern = '|'.join(the_list)
    row = frame.columns.str.contains(pattern).astype(int)
    # Assigning to a label that does not exist yet appends a new row
    frame.loc[i] = row
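One caveat with the pattern approach: str.contains does a regex substring match, so a column whose name merely contains one of the emotions would also be flagged. A minimal sketch of an exact-match variant using Index.isin follows. The "Fearless" column is hypothetical and is there only to show the difference.

```python
import pandas as pd

# "Fearless" is a hypothetical column, added only to expose substring matching
columns = pd.Index(["Fear", "Fearless", "Happy", "Angry"])
mylist = ["Fear", "Happy"]

# Substring/regex match: "Fear" also matches inside "Fearless"
substring_row = columns.str.contains("|".join(mylist)).astype(int)

# Exact match: only column names that appear in the list are flagged
exact_row = columns.isin(mylist).astype(int)

frame = pd.DataFrame(columns=columns)
frame.loc[0] = exact_row
```

isin also avoids having to escape regex special characters in the labels.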
CodePudding user response:
It depends on how you represent your data. Let's say you have the following dataframe constructed from your sentiment analysis results:
df = pd.DataFrame({
'A':['Fear', 'Happy', 'Emotional'],
'B':['Excited', 'Emotional', 'Angry'],
})
Then you could do:
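# One-hot encode each position, then merge columns that share an emotion name
df_dummies = pd.get_dummies(df.T, prefix=['']*len(df.T.columns), prefix_sep='')
# groupby(..., axis=1) is deprecated in recent pandas, so group on the transpose
out = df_dummies.T.groupby(level=0).sum().T
print(out)

which prints:

   Angry  Emotional  Excited  Fear  Happy
A      0          1        0     1      1
B      1          1        1     0      0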
If you want the index as a separate ID column, then:
out = out.rename_axis('ID').reset_index()
print(out)

which prints:

  ID  Angry  Emotional  Excited  Fear  Happy
0  A      0          1        0     1      1
1  B      1          1        1     0      0
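To get back to the question's requirement that the result match a predefined set of columns, one option is to reindex the dummies against that column list so that missing emotions show up as 0. This is only a sketch: the results dict and its contents are made up for illustration.

```python
import pandas as pd

# Columns of the predefined dataframe from the question
expected_columns = ["Fear", "Happy", "Angry", "Excited"]

# Hypothetical analysis results: one list of emotions per ID
results = {"A": ["Fear", "Happy"], "B": ["Excited"]}

# One row per (ID, emotion) pair, with the ID as the index
exploded = pd.Series(results).explode()

# One-hot encode, then collapse back to one row per ID
dummies = pd.get_dummies(exploded).astype(int).groupby(level=0).max()

# Add the columns that never occurred (here "Angry") filled with 0,
# and put the columns in the predefined order
out = dummies.reindex(columns=expected_columns, fill_value=0)
out = out.rename_axis("ID").reset_index()
```

Note that reindex also drops any emotion that is not in expected_columns, which may or may not be what you want.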
CodePudding user response:
I'd never heard of get_dummies() before, but here's what I came up with. It also uses loc. It's nice because it works whether you start with a predefined dataframe or an undefined/empty one.
Since the emotions in text_emotion are the same as the dataframe column names, you can just loop through text_emotion and set that row/column value to 1 with loc.
import pandas as pd

df = pd.DataFrame()
text_emotion_1 = ['Fear', 'Happy', 'Angry']
text_emotion_2 = ['Happy', 'Excited']

# for row 0, or you can do boolean indexing to assign it
# to the row where index = A
for em in text_emotion_1:
    df.loc[0, em] = 1

# for row 1
for em in text_emotion_2:
    df.loc[1, em] = 1
If you started with an empty dataframe, you'd have nulls:
   Fear  Happy  Angry  Excited
0   1.0    1.0    1.0      NaN
1   NaN    1.0    NaN      1.0
So you could use fillna() and astype() to replace the nulls with 0s and convert everything to integer, respectively.
df.fillna(0, inplace=True)
df = df.astype('int')
Then your dataframe will look like this (just missing the index column):
   Fear  Happy  Angry  Excited
0     1      1      1        0
1     0      1      0        1
Edit: Removed a stray comma
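For the predefined case, where the ID is a regular column, the boolean indexing mentioned in the code comment above could look roughly like this. It is a sketch: the two IDs and the zero-filled starting frame are assumptions made for illustration.

```python
import pandas as pd

emotions = ["Fear", "Happy", "Angry", "Excited"]

# Hypothetical predefined dataframe: one row per ID, every emotion starts at 0
df = pd.DataFrame({"ID": ["A", "B"], **{em: 0 for em in emotions}})

text_emotion = ["Fear", "Happy"]

# Keep only labels that are actual columns, so an unexpected emotion is ignored
known = [em for em in text_emotion if em in emotions]

# Boolean mask selects the row(s) for ID "A"; the listed columns are set to 1
df.loc[df["ID"] == "A", known] = 1
```

Because the frame starts out filled with 0, there are no nulls to replace afterwards.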