I want to create a union of 3 columns from a dataframe. The 3 columns are of type object.
A | B | C |
---|---|---|
Cat | Dog | Monkey |
Dog | Horse | Cat |
I want the union of columns A, B and C, and I expect this result:
List A = [Cat, Dog, Horse, Monkey]
My naive approach:
df['union'] = df.apply(lambda x: x['A'].union(x['B']), axis=1)
This is the error I get:
AttributeError: 'str' object has no attribute 'union'
Please tell me how to get this result.
CodePudding user response:
What you probably want is a set, which you can get by using:
set(df.to_numpy().ravel()) # {'Cat', 'Dog', 'Horse', 'Monkey'}
Python sets support operations like union with another set.
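As a minimal sketch of the above, assuming the frame looks exactly like the two rows in the question (the variable names all_values, per_column and result are only for illustration):

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "A": ["Cat", "Dog"],
    "B": ["Dog", "Horse"],
    "C": ["Monkey", "Cat"],
})

# Each cell is a plain str, and str has no .union method; that is why the
# row-wise lambda in the question raises AttributeError.

# Flatten all cells into one 1-D array and deduplicate with a set.
all_values = set(df.to_numpy().ravel())

# Equivalent: build a set from one column and union it with the others.
per_column = set(df["A"]).union(df["B"], df["C"])

# Sets are unordered, so sort to get a stable list.
result = sorted(all_values)
print(result)  # ['Cat', 'Dog', 'Horse', 'Monkey']
```

The sorted() call is only needed if you want a list in a predictable order; the set by itself already holds the union.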
To compare the suggested solutions, here are the timings using timeit on my machine, fastest first:
set(df.to_numpy().flatten()) # 5.78 µs ± 682 ns per loop
set(df.to_numpy().ravel()) # 5.93 µs ± 620 ns per loop
np.unique(df.values.ravel()).tolist() # 14.3 µs ± 1.79 µs per loop
df.stack().unique().tolist() # 517 µs ± 124 µs per loop
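If you want to reproduce the comparison yourself, something like the following should work. This is only a sketch: the sample frame, the candidates dict and the run count are my own choices, and the numbers will differ by machine and by pandas/NumPy version.

```python
import timeit

import numpy as np
import pandas as pd

# Same tiny frame as in the question; use your real data for meaningful numbers.
df = pd.DataFrame({
    "A": ["Cat", "Dog"],
    "B": ["Dog", "Horse"],
    "C": ["Monkey", "Cat"],
})

# The four expressions from the timing list, wrapped so timeit can call them.
candidates = {
    "set + flatten": lambda: set(df.to_numpy().flatten()),
    "set + ravel": lambda: set(df.to_numpy().ravel()),
    "np.unique": lambda: np.unique(df.values.ravel()).tolist(),
    "stack + unique": lambda: df.stack().unique().tolist(),
}

n_runs = 1000  # arbitrary; increase for more stable averages
timings = {}
for name, func in candidates.items():
    total_seconds = timeit.timeit(func, number=n_runs)
    timings[name] = total_seconds / n_runs * 1e6  # microseconds per call

# Print fastest first.
for name, usec in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {usec:10.2f} microseconds per call")
```

On a frame this small the differences are mostly fixed overhead, so the ranking may change on larger data.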
CodePudding user response:
Or you can use NumPy (after import numpy as np), which also returns the values sorted:
np.unique(df.values.ravel()).tolist()
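A short sketch of how that behaves on the question's data. The pd.unique line is not part of the answer above; I am adding it as an assumption about what you might want if the order of first appearance matters more than sorted order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["Cat", "Dog"],
    "B": ["Dog", "Horse"],
    "C": ["Monkey", "Cat"],
})

# Row-major flatten: Cat, Dog, Monkey, Dog, Horse, Cat
flat = df.to_numpy().ravel()

# np.unique deduplicates and sorts.
sorted_unique = np.unique(flat).tolist()
print(sorted_unique)  # ['Cat', 'Dog', 'Horse', 'Monkey']

# pd.unique deduplicates but keeps the order of first appearance.
first_seen = pd.unique(flat).tolist()
print(first_seen)  # ['Cat', 'Dog', 'Monkey', 'Horse']
```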