Home > Software engineering >  Convert python list into column headers and perform count
Convert python list into column headers and perform count

Time:10-04

I have currently a dataframe which looks like this and I have done a df["Keywords"] = df["Keywords"].str.split(" ")

ID | Keywords
1 | [agile]
2 | [python, python, python] 
3 | [agile, agile]

What I would like to achieve is something like this in which I tranpose the values into column headers and do a count. I tried to expand=true but that didnt work. Is there anyway I can achieve this?

ID | agile | python
1 | 1 | 0
2 | 0 | 1
3 | 1 | 0

CodePudding user response:

Don't split.. you can directly encode the values using get_dummies

df[['ID']].join(df['Keywords'].str.get_dummies(sep=' '))

CodePudding user response:

A long but different way is here. First, let's find the keyword distribution in the keyword column for each row.

def list_to_dict(row):
    freq = {}
    for item in row:
        if (item in freq):
            freq[item]  = 1
        else:
            freq[item] = 1
 
    for key, value in freq.items():
        return freq

df['dicts']=df['keywords'].apply(lambda x: list_to_dict(x))
print(df)
'''
id  keywords                         dicts
1   ['agile']                        {'agile': 1}
2   ['python', 'python', 'python']   {'python': 3}
3   ['agile', 'agile']               {'agile': 2}
'''

then split dictionaries to new columns:

df = df.join(pd.json_normalize(df.pop('dicts'))).fillna(0)
float_cols=list(df.select_dtypes(include=['float']).columns)
for i in float_cols:
    df[i]=df[i].astype(int)
print(df)
'''
id                     keywords  agile  python
1                      [agile]   1      0
2     [python, python, python]   0      3
3               [agile, agile]   2      0
'''
  • Related