I have currently a dataframe which looks like this and I have done a df["Keywords"] = df["Keywords"].str.split(" ")
ID | Keywords
1 | [agile]
2 | [python, python, python]
3 | [agile, agile]
What I would like to achieve is something like this in which I tranpose the values into column headers and do a count. I tried to expand=true but that didnt work. Is there anyway I can achieve this?
ID | agile | python
1 | 1 | 0
2 | 0 | 1
3 | 1 | 0
CodePudding user response:
Don't split
.. you can directly encode the values using get_dummies
df[['ID']].join(df['Keywords'].str.get_dummies(sep=' '))
CodePudding user response:
A long but different way is here. First, let's find the keyword distribution in the keyword column for each row.
def list_to_dict(row):
freq = {}
for item in row:
if (item in freq):
freq[item] = 1
else:
freq[item] = 1
for key, value in freq.items():
return freq
df['dicts']=df['keywords'].apply(lambda x: list_to_dict(x))
print(df)
'''
id keywords dicts
1 ['agile'] {'agile': 1}
2 ['python', 'python', 'python'] {'python': 3}
3 ['agile', 'agile'] {'agile': 2}
'''
then split dictionaries to new columns:
df = df.join(pd.json_normalize(df.pop('dicts'))).fillna(0)
float_cols=list(df.select_dtypes(include=['float']).columns)
for i in float_cols:
df[i]=df[i].astype(int)
print(df)
'''
id keywords agile python
1 [agile] 1 0
2 [python, python, python] 0 3
3 [agile, agile] 2 0
'''