Home > Enterprise >  creating dictionaries from values in pandas columns with repeating values
creating dictionaries from values in pandas columns with repeating values

Time:05-07

Considering this sample dataframe:

   location emp
0   fac_1   emp1
1   fac_2   emp2
2   fac_2   emp3
3   fac_3   emp4
4   fac_4   emp5

It can be recreated by this code:

 sample_dict = {'location':['fac_1', 'fac_2', 'fac_2', 'fac_3', 'fac_4'], 'emp':['emp1', 'emp2', 'emp3', 'emp4', 'emp5']}
data = pd.DataFrame(sample_dict)

I am trying to create a dictionary with values of every unique "fac" and all associated emp values. Some of the rows there is only one fac and one emp, so no big deal there. In many rows, as in the example, there are two different emp values for one fac value. (See fac_2 in the sample dataframe).

The desired output is a single dictionary like this:

{fac_1:emp1, fac_2:[emp2, emp3], fac_3:emp4, fac_4:emp5}

I was able to achieve this but via a convoluted route that I am certain, someone here has a more efficient, less error prone way to achieve. In short, I grouped my df by fac and then used iloc to grab the fac name and the .to_list() function to get all the emp values in a list and then combined that into a dictionary.

Something like this:

location = data.iloc[0,1]
emp_lst= data['emp'].to_list()
fac_dict = {facility:np_lst}

Someone with some skill at making dictionaries from pandas dataframes I am sure has a more efficient way to achieve the desired output. Thanks for taking a look!

CodePudding user response:

data.groupby('location')['emp'].agg(list).to_dict()

Output:

{'fac_1': ['emp1'], 'fac_2': ['emp2', 'emp3'], 'fac_3': ['emp4'], 'fac_4': ['emp5']}

CodePudding user response:

You can try

out = data.groupby('location')['emp'].agg(lambda col: list(col) if len(col) >= 2 else col).to_dict()
print(out)

{'fac_1': 'emp1', 'fac_2': ['emp2', 'emp3'], 'fac_3': 'emp4', 'fac_4': 'emp5'}
  • Related