I have the following DataFrame:
# Import pandas library
import pandas as pd
import numpy as np
# initialize list elements
data = ['george',
'instagram',
'nick',
'basketball',
'tennis']
# Create the pandas DataFrame with an explicit column name
df = pd.DataFrame(data, columns=['Unique Words'])
# print dataframe.
df
and I want to create a new column, based on the following two lists, that looks like this:
key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]
Label
-----
0
2
0
1
1
So words in the list key_words get the label 1, words in the list usernames get the label 2, and all other words get the label 0.
Thank you so much for your time and help!
CodePudding user response:
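One way to do this is to build a label map (a dictionary) that assigns 1 to every element of the first list and 2 to every element of the second. You can then use .map in pandas to look up each word. Words missing from the map come back as NaN, so fillna(0) gives them the label 0 and astype(int) converts the column back to integers.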
# Import pandas library
import pandas as pd
import numpy as np
# initialize list elements
data = ['george',
'instagram',
'nick',
'basketball',
'tennis']
# Create the pandas DataFrame with an explicit column name
df = pd.DataFrame(data, columns=['Unique Words'])
key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]
# Number the lists starting at 1: key_words -> 1, usernames -> 2
label_map = {e: i + 1 for i, l in enumerate([key_words, usernames]) for e in l}
print(label_map)
df['Label'] = df['Unique Words'].map(label_map).fillna(0).astype(int)
print(df)
Output:
{'football': 1, 'basketball': 1, 'tennis': 1, 'instagram': 2, 'facebook': 2, 'snapchat': 2}
  Unique Words  Label
0       george      0
1    instagram      2
2         nick      0
3   basketball      1
4       tennis      1
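If you would rather not build a dictionary, a possible alternative is numpy.select combined with Series.isin. This is only a sketch of a different technique, not the approach used above; the variable name conditions is my own, and the data and lists are copied from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    ["george", "instagram", "nick", "basketball", "tennis"],
    columns=["Unique Words"],
)
key_words = ["football", "basketball", "tennis"]
usernames = ["instagram", "facebook", "snapchat"]

# One boolean mask per label; np.select checks them in order
# and uses the choice of the first mask that is True.
conditions = [
    df["Unique Words"].isin(key_words),   # -> label 1
    df["Unique Words"].isin(usernames),   # -> label 2
]

# Rows that match neither mask get the default label 0.
df["Label"] = np.select(conditions, [1, 2], default=0)
print(df["Label"].tolist())
```

One difference to keep in mind: if a word appeared in both lists, np.select would keep the label of the first matching condition, whereas the dictionary comprehension would keep the label of the last list that contains the word.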