Hello_world!
I have a DataFrame like this:
from pandas import DataFrame
df = DataFrame({"A": ['sd', 'df', 'gh', 'rv'],
"B": ['hj', '4r', 'tg', '2s'],
"C": ['hf', 'qw', 'e4', '7u'],
"D": ['1q', 'nc', 'xf', '7y'],
"E": ['9i', 'g7', 'ce', 'x3']})
or
A B C D E
0 sd hj hf 1q 9i
1 df 4r qw nc g7
2 gh tg e4 xf ce
3 rv 2s 7u 7y x3
I need to create a new column that will contain values of the set type, consisting of the values of the first five columns.
Expected result is:
A B C D E F
0 sd hj hf 1q 9i {'sd','hj', 'hf', '1q', '9i'}
1 df 4r qw nc g7 {'df','4r', 'qw', 'nc', 'g7'}
2 gh tg e4 xf ce {'gh','tg', 'e4', 'xf', 'ce'}
3 rv 2s 7u 7y x3 {'rv','2s', '7u', '7y', 'x3'}
print(type(df.loc[0, 'F'])) # <class 'set'>
print(type(df.loc[0, 'A'])) # <class 'str'>
My code:
from pandas import DataFrame
df = DataFrame({"A": ['sd', 'df', 'gh', 'rv'],
"B": ['hj', '4r', 'tg', '2s'],
"C": ['hf', 'qw', 'e4', '7u'],
"D": ['1q', 'nc', 'xf', '7y'],
"E": ['9i', 'g7', 'ce', 'x3']})
f = {df.loc[0, 'A'], df.loc[0, 'B'], df.loc[0, 'C'], df.loc[0, 'D'], df.loc[0, 'E']}
df = df.assign(F = f)
print(df)
...I get ValueError: Length of values (5) does not match length of index (4).
If I rewrite the code so that the length of the values matches the length of the index:
from pandas import DataFrame
df = DataFrame({"A": ['sd', 'df', 'gh', 'rv'],
"B": ['hj', '4r', 'tg', '2s'],
"C": ['hf', 'qw', 'e4', '7u'],
"D": ['1q', 'nc', 'xf', '7y'],
"E": ['9i', 'g7', 'ce', 'x3']})
f = {df.loc[0, 'A'], df.loc[0, 'B'], df.loc[0, 'C'], df.loc[0, 'D']}
df = df.assign(F = f)
print(df)
...I get TypeError: 'set' type is unordered.
How can I create this column so that each row holds a set of that row's values?
CodePudding user response:
Simply use:
df['F'] = df.apply(set, axis=1)
Note, however, that you have no control over the order in which the set elements are displayed, since sets are unordered containers.
Output:
A B C D E F
0 sd hj hf 1q 9i {hj, 1q, 9i, sd, hf}
1 df 4r qw nc g7 {nc, df, qw, g7, 4r}
2 gh tg e4 xf ce {e4, xf, gh, ce, tg}
3 rv 2s 7u 7y x3 {7u, x3, rv, 2s, 7y}
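Because the printed order is arbitrary, it may be easier to confirm the result by type and by set equality than by reading the output. A minimal sketch, assuming the same sample frame as in the question (the variable name `first` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": ['sd', 'df', 'gh', 'rv'],
                   "B": ['hj', '4r', 'tg', '2s'],
                   "C": ['hf', 'qw', 'e4', '7u'],
                   "D": ['1q', 'nc', 'xf', '7y'],
                   "E": ['9i', 'g7', 'ce', 'x3']})

# Build one set per row from the row's values.
df["F"] = df.apply(set, axis=1)

first = df.loc[0, "F"]
print(type(first))  # <class 'set'>

# Set equality ignores element order, so the literal can be written in any order.
print(first == {"9i", "1q", "hf", "hj", "sd"})  # True
```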
CodePudding user response:
Use a list comprehension for better performance:
In [1069]: df['F'] = [set(i) for i in df.values]
In [1070]: df
Out[1070]:
A B C D E F
0 sd hj hf 1q 9i {sd, 1q, hf, 9i, hj}
1 df 4r qw nc g7 {g7, 4r, nc, df, qw}
2 gh tg e4 xf ce {xf, gh, ce, e4, tg}
3 rv 2s 7u 7y x3 {2s, x3, 7y, rv, 7u}
OR as suggested by @jezrael:
df['F'] = [set(i) for i in df.to_numpy()]
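One caveat worth sketching: both one-liners read every column of the frame, so once F exists, running them again would try to put a set inside a set and fail. Selecting the source columns explicitly avoids that. The list `cols` below is an illustrative name, assuming the five columns from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": ['sd', 'df', 'gh', 'rv'],
                   "B": ['hj', '4r', 'tg', '2s'],
                   "C": ['hf', 'qw', 'e4', '7u'],
                   "D": ['1q', 'nc', 'xf', '7y'],
                   "E": ['9i', 'g7', 'ce', 'x3']})

cols = ["A", "B", "C", "D", "E"]  # only these columns feed the sets

df["F"] = [set(row) for row in df[cols].to_numpy()]
# Running the same line again is safe because F is not among cols.
df["F"] = [set(row) for row in df[cols].to_numpy()]

# Without the column selection, the existing F values (sets) are unhashable.
rerun_failed = False
try:
    [set(row) for row in df.to_numpy()]
except TypeError as exc:
    rerun_failed = True
    print("whole-frame rerun fails:", exc)
```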
Performance timings:
@mozway's solution:
In [1078]: %timeit df.apply(set, axis=1)
395 µs ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
My solution:
In [1079]: %timeit [set(i) for i in df.values]
7.3 µs ± 31.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
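The timings above are for the 4-row sample. To check how the two approaches compare on your own data outside IPython, something like the following sketch could be used. The frame size (4,000 rows, built by repeating the sample) and the repeat count are arbitrary assumptions, and absolute numbers will vary by machine and pandas version:

```python
import timeit
import pandas as pd

df = pd.DataFrame({"A": ['sd', 'df', 'gh', 'rv'],
                   "B": ['hj', '4r', 'tg', '2s'],
                   "C": ['hf', 'qw', 'e4', '7u'],
                   "D": ['1q', 'nc', 'xf', '7y'],
                   "E": ['9i', 'g7', 'ce', 'x3']})

# Repeat the sample to get a larger frame (size chosen arbitrarily).
big = pd.concat([df] * 1000, ignore_index=True)

# Both approaches should produce the same sets, row for row.
via_apply = list(big.apply(set, axis=1))
via_list = [set(row) for row in big.to_numpy()]
same = via_apply == via_list
print("same result:", same)

# Total seconds for 3 runs of each approach.
t_apply = timeit.timeit(lambda: big.apply(set, axis=1), number=3)
t_list = timeit.timeit(lambda: [set(row) for row in big.to_numpy()], number=3)
print(f"apply: {t_apply:.4f} s, list comprehension: {t_list:.4f} s")
```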