Suppose I have a data frame df with the following elements:
  Element
0     a_1
1     a_2
2     b_1
3     a_3
4     b_2
.....
and so on.
Now suppose I have two categories, A and B. Every element falls into one of these categories, and let's say I have the lists As = [a_1, a_2, ...] and Bs = [b_1, b_2, ...].
What I want to do is add a column Category to df:
  Element Category
0     a_1        A
1     a_2        A
2     b_1        B
3     a_3        A
4     b_2        B
.....
That is, for each row of df, check which of these lists the element is in, and set the new column to the name of that list. Each element will be in one of these lists.
How would I go about doing this?
I've considered building the new column with a for loop that checks each row, but I feel like there should be a sleeker, more Pythonic way to do this.
CodePudding user response:
Rather than lists, use a dictionary and invert it to use with map:
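# each category mapped to the elements it contains
d = {'A': ['a_1', 'a_2', 'a_3'],
     'B': ['b_1', 'b_2'],
     }
# invert to element -> category so it can be passed to map
d2 = {elem: cat for cat, elems in d.items() for elem in elems}
df['Category'] = df['Element'].map(d2)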
output:
  Element Category
0     a_1        A
1     a_2        A
2     b_1        B
3     a_3        A
4     b_2        B
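For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame mirrors the question, except for 'c_1', which is a made-up element added only to show what happens when a value is in neither list; the names categories and element_to_category are just illustrative.

```python
import pandas as pd

# Sample data from the question, plus 'c_1' (hypothetical, in neither list)
df = pd.DataFrame({'Element': ['a_1', 'a_2', 'b_1', 'a_3', 'b_2', 'c_1']})

# Category -> list of elements
categories = {'A': ['a_1', 'a_2', 'a_3'],
              'B': ['b_1', 'b_2']}

# Invert to element -> category
element_to_category = {element: category
                       for category, elements in categories.items()
                       for element in elements}

# map looks up each element; elements missing from the dict become NaN
df['Category'] = df['Element'].map(element_to_category)
print(df)
```

The inverted dictionary scales to any number of categories, and an element that is missing from every list shows up as NaN instead of being silently assigned to a category.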
CodePudding user response:
Using np.where and np.in1d:
- np.where: return elements chosen from x or y depending on condition.
- np.in1d: test whether each element of a 1-D array is also present in a second array.
Code:
import numpy as np
A = ['a_1', 'a_2', 'a_3']
# B is not needed since "Every element falls into one of these categories" (i.e. in B if not in A)
# Add column Category: assign 'A' if the element is in list A, otherwise 'B'
df['Category'] = np.where(np.in1d(df['Element'], A), 'A', 'B')
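A runnable sketch of the same idea, under a couple of assumptions: it uses np.isin in place of np.in1d (recent NumPy releases deprecate np.in1d in favor of np.isin), and it adds a made-up element 'c_1' that is in neither list to show the caveat of the two-way split.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus 'c_1' (hypothetical, in neither list)
df = pd.DataFrame({'Element': ['a_1', 'a_2', 'b_1', 'a_3', 'b_2', 'c_1']})
A = ['a_1', 'a_2', 'a_3']

# Boolean mask: True where the element is in A
in_a = np.isin(df['Element'], A)

# 'A' where the mask is True, 'B' everywhere else
df['Category'] = np.where(in_a, 'A', 'B')
print(df)
```

Note that 'c_1' ends up labeled 'B' here, so this version is only safe when every element really is in one of the two lists, as the question states.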