pandas replace values of a list column

Time:12-23

I have a DataFrame like this:

ID Feedback
T223 [Good, Bad, Bad]
T334 [Average, Good, Good]

feedback_dict = {'Good':1, 'Average':2, 'Bad':3}

Using this dictionary, I need to replace the values in the Feedback column:

ID Feedback
T223 [1, 3, 3]
T334 [2, 1, 1]

I tried two approaches, but neither worked. Any help would be appreciated.

method1:    
df = df.assign(Feedback=[feedback_dict.get(i,i)  for i in list(df['Feedback'])])

method2:
df['Feedback'] = df['Feedback'].apply(lambda x : [feedback_dict.get(i,i)  for i in list(x)])

CodePudding user response:

For me, the second solution works, but the strings must first be converted to lists:

import ast

df['Feedback'] = df['Feedback'].apply(ast.literal_eval)
#df['Feedback'] = df['Feedback'].str.strip('[]').str.split(',')
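Which of the two conversions applies depends on how the strings are written: ast.literal_eval needs quoted items, while the strip/split variant handles unquoted ones. A minimal sketch that tries both, assuming the column holds strings; to_list is a hypothetical helper name and the sample data is made up for illustration:

```python
import ast

import pandas as pd


def to_list(text):
    """Convert a string such as "['Good', 'Bad']" or "[Good, Bad]" to a list."""
    try:
        # Works when the string is a valid Python literal (quoted items).
        value = ast.literal_eval(text)
        if isinstance(value, list):
            return value
    except (ValueError, SyntaxError):
        pass
    # Fallback for unquoted items: strip brackets, split on commas, trim spaces.
    return [part.strip() for part in text.strip('[]').split(',')]


# Illustrative data: one row with quoted items, one with unquoted items.
df = pd.DataFrame({
    'ID': ['T223', 'T334'],
    'Feedback': ["['Good', 'Bad', 'Bad']", '[Average,Good,Good]'],
})
df['Feedback'] = df['Feedback'].apply(to_list)
```

Trimming each item matters for input like [Good, Bad, Bad], where a plain split(',') would leave a leading space on 'Bad' and the dictionary lookup would then miss.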

The first solution works with a nested list comprehension:

df = df.assign(Feedback=[[feedback_dict.get(i,i) for i in x] for x in df['Feedback']])


df['Feedback'] = df['Feedback'].apply(lambda x : [feedback_dict.get(i,i)  for i in list(x)])
print (df)
     ID    Feedback
0  T223  [1, 3, 3]
1  T334  [2, 1, 1]

EDIT: If some rows contain missing values instead of lists, use an if-else expression; non-list values are replaced with empty lists:

print (df)
     ID             Feedback
0  T223       [Good,Bad,Bad]
1  T334  [Average,Good,Good]
2   NaN                  NaN


feedback_dict = {'Good':1, 'Average':2, 'Bad':3}
df = df.assign(Feedback=[[feedback_dict.get(i,i) for i in x] if isinstance(x, list) else [] 
                            for x in df['Feedback']])

print (df)
     ID   Feedback
0  T223  [1, 3, 3]
1  T334  [2, 1, 1]
2   NaN         []
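If the missing rows should stay missing rather than become empty lists, the same isinstance check can return the value unchanged. A small sketch under that assumption; map_feedback is a hypothetical helper name, and the DataFrame is rebuilt here only to make the example self-contained:

```python
import numpy as np
import pandas as pd

feedback_dict = {'Good': 1, 'Average': 2, 'Bad': 3}


def map_feedback(value):
    """Map list items through feedback_dict; return non-list values unchanged."""
    if isinstance(value, list):
        return [feedback_dict.get(item, item) for item in value]
    return value


df = pd.DataFrame({
    'ID': ['T223', 'T334', np.nan],
    'Feedback': [['Good', 'Bad', 'Bad'], ['Average', 'Good', 'Good'], np.nan],
})
df['Feedback'] = df['Feedback'].apply(map_feedback)
```

With this variant the third row keeps NaN in Feedback, so it can still be found later with isna().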

CodePudding user response:

If your use case is about as simple as this example, I wouldn't recommend this method. However, here's another option in case it makes other parts of your project easier.

  1. df.explode() your column (assuming it is a list and not text; otherwise convert it to a list first)
  2. Perform the replacements with df.replace()
  3. Group the rows back together again with df.groupby() and df.agg()

For the example, it would look like this (assuming the variables have been declared like in your question):

df = df.explode('Feedback')
df['Feedback'] = df['Feedback'].replace(feedback_dict)
df = df.groupby('ID').agg(list)
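A self-contained sketch of the same three steps, with two small adjustments that are my assumptions rather than part of the answer: sort=False keeps the IDs in their original order, and reset_index() turns ID back into a regular column. The sample data is illustrative.

```python
import pandas as pd

feedback_dict = {'Good': 1, 'Average': 2, 'Bad': 3}
df = pd.DataFrame({
    'ID': ['T334', 'T223'],
    'Feedback': [['Average', 'Good', 'Good'], ['Good', 'Bad', 'Bad']],
})

# 1. One row per list element; the ID is repeated for each element.
exploded = df.explode('Feedback')

# 2. Replace each label with its numeric code.
exploded['Feedback'] = exploded['Feedback'].replace(feedback_dict)

# 3. Collect the elements back into one list per ID.
result = (
    exploded.groupby('ID', sort=False)['Feedback']
    .agg(list)
    .reset_index()
)
```

Because the regrouping is keyed on ID, rows that share an ID are merged into a single list, and rows whose ID is NaN are dropped by groupby by default.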

CodePudding user response:

result = []  # one converted list per row

for lst in df['Feedback']:  # iterate over the lists in the Feedback column
    converted = []  # converted values for the current row
    for i in lst:  # iterate over each element of the current list
        if i == 'Good':  # if the element is Good:
            converted.append(feedback_dict['Good'])  # append the value 1
        elif i == 'Average':  # if the element is Average:
            converted.append(feedback_dict['Average'])  # append the value 2
        elif i == 'Bad':  # if the element is Bad:
            converted.append(feedback_dict['Bad'])  # append the value 3
    result.append(converted)  # store this row's list, whatever its length

df['Feedback'] = result  # finally, assign the converted lists back to the Feedback column