This question seems so simple yet I'm having so much trouble, and haven't seen it asked anywhere. I have a column that contains a different list in each row, and all I want to do is create a new column based on if a specific value is in that list. Data looks like this:
Col1
[5,6,23,7,20,21]
[0,7,20,21]
[3,4,5,23,7,20,21]
[2,3,23,7,20,21]
[3,4,5,23,7,20,21]
Each number corresponds to a specific value, so 0 = 'apple'
, 2 = 'grape'
, etc...
While there are multiple values in each list, I'm really only looking for certain values, specifically 0, 2, 4, 6, 16, 17
So what I want to do is add a new column, with the value that corresponds to the number that's found within Col1
.
This is what the solution should be:
Col1 Col2
[5,6,23,7,20,21] Pear
[0,7,20,21] Apple
[3,4,5,23,7,20,21] Watermelon
[2,3,23,7,20,21] Grape
[16,20,21] Pineapple
I have tried:
df['Col2'] = np.where(0 in df['Col1'], 'Apple',
np.where(2 in df['Col1'], 'Grape',
np.where(4 in df['Col1'], 'Watermelon', )
And so on... But this defaults all values to Apple
Col1 Col2
[5,6,23,7,20,21] Apple
[0,7,20,21] Apple
[3,4,5,23,7,20,21] Apple
[2,3,23,7,20,21] Apple
[16,20,21] Apple
I was able to successfully do it by putting the above in a for
loop, but I am getting issues. Code:
df['Col2'] = ''
for i in range(0,df.shape[0]):
df['Col2'] = np.where(0 in df['Col1'], 'Apple',
np.where(2 in df['Col1'], 'Grape',
np.where(4 in df['Col1'], 'Watermelon', )
I get the result I am looking for, but I am being met with a warning:
<ipython-input-638-5dfd74b69688>:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
I assume the warning is because I have already created the blank column, but the only reason I did this is because I would get an error if I didn't create it. Furthermore, when I attempt to perform a simple df['Col2'].value_counts()
, I get an error: TypeError: unhashable type: 'numpy.ndarray'
. The result from value_counts()
still shows up even though I get this error, which is odd.
I am not entirely sure how else to proceed, I've tried a bunch of other things to create this column but none have been able to work. Any advice appreciated!
CodePudding user response:
Use explode
:
d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df['Col2'] = df['Col1'].explode().map(d).dropna().groupby(level=0).apply(', '.join)
print(df)
# Output:
Col1 Col2
0 [5, 6, 23, 7, 20, 21] Banana
1 [0, 7, 20, 21] Apple
2 [3, 4, 5, 23, 7, 20, 21] Watermelon
3 [2, 3, 23, 7, 20, 21] Grape
4 [3, 4, 5, 23, 7, 20, 21] Watermelon