Pandas union of 3 columns of text

Time:06-14

I want to create a union of 3 columns from a dataframe. The 3 columns are of type object.

A    B      C
Cat  Dog    Monkey
Dog  Horse  Cat

I want the union of columns A, B, and C, and I expect this result:

List A = [Cat,Dog,Horse,Monkey]

My naive approach:

df['union'] = df.apply(lambda x: x['A'].union(x['B']), axis=1)

This is the error I get:

AttributeError: 'str' object has no attribute 'union'

Please tell me how to get this result.

CodePudding user response:

What you probably want is a set, which you can get by using:

set(df.to_numpy().ravel())  # {'Cat', 'Dog', 'Horse', 'Monkey'}

Python sets support operations like union with another set.
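To make that concrete, here is a minimal sketch that rebuilds the two-row frame from the question (the variable names all_values, by_column and result are just illustrative) and shows both the flatten-then-set approach and an explicit set union over the columns:

```python
import pandas as pd

# Sample frame matching the data shown in the question.
df = pd.DataFrame(
    {"A": ["Cat", "Dog"], "B": ["Dog", "Horse"], "C": ["Monkey", "Cat"]}
)

# Flatten every cell into one 1-D array, then deduplicate with set().
all_values = set(df.to_numpy().ravel())

# Same result column by column: set.union accepts any number of iterables.
by_column = set(df["A"]).union(df["B"], df["C"])

# Sets are unordered, so sort to get the list the question asks for.
result = sorted(all_values)
print(result)  # ['Cat', 'Dog', 'Horse', 'Monkey']
```

The original attempt failed because x['A'] inside apply(..., axis=1) is a single string, not a set, so it has no union method.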

To compare the suggested solutions, here are the timings using timeit on my machine, ordered by speed:

set(df.to_numpy().flatten())           # 5.78 µs ± 682 ns per loop

set(df.to_numpy().ravel())             # 5.93 µs ± 620 ns per loop

np.unique(df.values.ravel()).tolist()  # 14.3 µs ± 1.79 µs per loop

df.stack().unique().tolist()           # 517 µs ± 124 µs per loop
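If you want to reproduce the comparison yourself, something like the following sketch should work. It assumes the same two-row frame as in the question; the candidates dict and the repeat counts are my own choices, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Sample frame matching the data shown in the question.
df = pd.DataFrame(
    {"A": ["Cat", "Dog"], "B": ["Dog", "Horse"], "C": ["Monkey", "Cat"]}
)

# The four expressions compared above, wrapped so timeit can call them.
candidates = {
    "set(flatten)": lambda: set(df.to_numpy().flatten()),
    "set(ravel)": lambda: set(df.to_numpy().ravel()),
    "np.unique": lambda: np.unique(df.values.ravel()).tolist(),
    "stack().unique()": lambda: df.stack().unique().tolist(),
}

calls = 200  # calls per measurement (arbitrary choice)
timings = {}
for name, fn in candidates.items():
    # Take the best of 3 measurements and convert to microseconds per call.
    best = min(timeit.repeat(fn, number=calls, repeat=3))
    timings[name] = best / calls * 1e6

# Print fastest first.
for name, us in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} {us:10.2f} µs per call")
```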

CodePudding user response:

Or you can use np.unique(df.values.ravel()).tolist() (with numpy imported as np), which returns the unique values as a sorted list.
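A short, self-contained sketch of that approach, assuming the frame from the question (unique_sorted is just an illustrative name):

```python
import numpy as np
import pandas as pd

# Sample frame matching the data shown in the question.
df = pd.DataFrame(
    {"A": ["Cat", "Dog"], "B": ["Dog", "Horse"], "C": ["Monkey", "Cat"]}
)

# ravel() flattens the 2-D values to 1-D; np.unique deduplicates and sorts.
unique_sorted = np.unique(df.values.ravel()).tolist()
print(unique_sorted)  # ['Cat', 'Dog', 'Horse', 'Monkey']
```

Because np.unique sorts its input, this may raise a TypeError if the columns mix strings with missing values (NaN); the set-based version does not need to compare elements.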
