Home > Software engineering >  Create new column based on values in LIST in other column
Create new column based on values in LIST in other column

Time:12-15

This question seems so simple yet I'm having so much trouble, and haven't seen it asked anywhere. I have a column that contains a different list in each row, and all I want to do is create a new column based on if a specific value is in that list. Data looks like this:

Col1
[5,6,23,7,20,21]    
[0,7,20,21]
[3,4,5,23,7,20,21]
[2,3,23,7,20,21]
[3,4,5,23,7,20,21]

Each number corresponds to a specific value, so 0 = 'apple', 2 = 'grape', etc...

While there are multiple values in each list, I'm really only looking for certain values, specifically 0, 2, 4, 6, 16, 17

So what I want to do is add a new column, with the value that corresponds to the number that's found within Col1.

This is what the solution should be:

Col1               Col2
[5,6,23,7,20,21]   Pear
[0,7,20,21]        Apple
[3,4,5,23,7,20,21] Watermelon
[2,3,23,7,20,21]   Grape
[16,20,21]         Pineapple

I have tried:

df['Col2'] = np.where(0 in df['Col1'], 'Apple',
                np.where(2 in df['Col1'], 'Grape', 
                   np.where(4 in df['Col1'], 'Watermelon', )

And so on... But this defaults all values to Apple

Col1               Col2
[5,6,23,7,20,21]   Apple
[0,7,20,21]        Apple
[3,4,5,23,7,20,21] Apple
[2,3,23,7,20,21]   Apple
[16,20,21]         Apple

I was able to successfully do it by putting the above in a for loop, but I am getting issues. Code:

df['Col2'] = ''
for i in range(0,df.shape[0]):
   df['Col2'] = np.where(0 in df['Col1'], 'Apple',
                   np.where(2 in df['Col1'], 'Grape', 
                      np.where(4 in df['Col1'], 'Watermelon', )

I get the result I am looking for, but I am being met with a warning:

<ipython-input-638-5dfd74b69688>:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

I assume the warning is because I have already created the blank column, but the only reason I did this is because I would get an error if I didn't create it. Furthermore, when I attempt to perform a simple df['Col2'].value_counts(), I get an error: TypeError: unhashable type: 'numpy.ndarray'. The result from value_counts() still shows up even though I get this error, which is odd.

I am not entirely sure how else to proceed, I've tried a bunch of other things to create this column but none have been able to work. Any advice appreciated!

CodePudding user response:

Use explode:

d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df['Col2'] = df['Col1'].explode().map(d).dropna().groupby(level=0).apply(', '.join)
print(df)

# Output:
                       Col1        Col2
0     [5, 6, 23, 7, 20, 21]      Banana
1            [0, 7, 20, 21]       Apple
2  [3, 4, 5, 23, 7, 20, 21]  Watermelon
3     [2, 3, 23, 7, 20, 21]       Grape
4  [3, 4, 5, 23, 7, 20, 21]  Watermelon
  • Related