Create a pandas DataFrame where each cell is a set of strings

I am trying to create a DataFrame like so:

col_a	col_b
{'soln_a'}	{'soln_b'}

In case it helps, here are some of my failed attempts:

import pandas as pd

my_dict_a = {"col_a": set(["soln_a"]), "col_b": set("soln_b")}
df_0 = pd.DataFrame.from_dict(my_dict_a) # ValueError: All arrays must be of the same length

df_1 = pd.DataFrame.from_dict(my_dict_a, orient="index").T # splits 'soln_b' into individual letters

my_dict_b = {"col_a": ["soln_a"], "col_b": ["soln_b"]}

df_2 = pd.DataFrame(my_dict_b).apply(set) # TypeError: 'set' type is unordered

df_3 = pd.DataFrame.from_dict(my_dict_b, orient="index").T # creates DataFrame of lists

df_3.apply(set, axis=1) # combines into single set of {soln_a, soln_b}

What's the best way to do this?

CodePudding user response：

You just need to ensure your input data structure is formatted correctly.

The default dictionary -> DataFrame constructor expects each value in the dictionary to be a collection of some type, with one entry per row. Each key should therefore map to a collection of set objects (for example, a list of sets) rather than directly to a set.

If I change the input dictionary so that each value is a list of sets, it works as expected:

import pandas as pd

my_dict = {
    "col_a": [{"soln_a"}, {"soln_c"}], 
    "col_b": [{"soln_b", "soln_d"}, {"soln_c"}]
}
df = pd.DataFrame.from_dict(my_dict)

print(df)
      col_a             col_b
0  {soln_a}  {soln_d, soln_b}
1  {soln_c}          {soln_c}
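
As a minimal sketch of why the question's first attempts misbehave, assuming the intended cells are {'soln_a'} and {'soln_b'}: set() iterates over its argument, so set("soln_b") yields individual characters, whereas a set literal keeps the string whole. Wrapping each set in a one-element list then gives the constructor one cell per column. The names letters, single and wrapped are only illustrative.

```python
import pandas as pd

# set() iterates over its argument, so a bare string is split into characters.
letters = set("soln_b")   # six single-character strings
single = {"soln_b"}       # one element: the whole string

# Illustrative fix-up of the question's dictionary: wrap each set in a
# one-element list so that every column holds exactly one cell.
my_dict_a = {"col_a": {"soln_a"}, "col_b": {"soln_b"}}
wrapped = {key: [value] for key, value in my_dict_a.items()}
df = pd.DataFrame(wrapped)

print(df)
```

The dictionary comprehension is just one way to do the wrapping; writing the lists by hand works equally well for a small, fixed set of columns.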

CodePudding user response：

You could apply a list comprehension to each column:

my_dict_b = {"col_a": ["soln_a"], "col_b": ["soln_b"]}
df_2 = pd.DataFrame(my_dict_b)
df_2 = df_2.apply(lambda col: [set([x]) for x in col])

Output:

      col_a     col_b
0  {soln_a}  {soln_b}
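
The same conversion can also be sketched element-wise with Series.map, which may read more clearly when a column has several rows. The two-row data and the names df_strings, df_sets and all_sets are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

# Illustrative two-row input (the second row is an assumption).
df_strings = pd.DataFrame({
    "col_a": ["soln_a", "soln_c"],
    "col_b": ["soln_b", "soln_d"],
})

# Wrap every string in a one-element set, one column at a time.
df_sets = df_strings.apply(lambda col: col.map(lambda x: {x}))

# Check that each cell is now a set.
all_sets = all(
    isinstance(cell, set)
    for name in df_sets.columns
    for cell in df_sets[name]
)

print(df_sets)
print(all_sets)
```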

CodePudding user response：

Why not something like this?

df = pd.DataFrame({
    'col_a': [{'soln_a'}],
    'col_b': [{'soln_b'}],
})

Output:

>>> df
      col_a     col_b
0  {soln_a}  {soln_b}