Home > database >  Convert pandas dataframe to custom dictionary
Convert pandas dataframe to custom dictionary

Time:11-17

I have a dataframe that I need to convert to a dictionary in a specific format. An example of the dataframe data (not real data for ease of presentation) is shown below:

enter image description here

Here is csv of the input:

col1,col2,col3,col3    
abc,1,2,3    
abc,4,5,6    
bcd,10,11,12    
bcd,33,22,11    
cde,3,43,23    
cde,43,23,56

The resulting dictionary should be in the following format:

{ abc: [[abc, 1, 2, 3], [abc, 4, 5, 6]],  bcd: [[bcd, 10, 11, 12], [bcd, 33, 22, 11]],  cde: [[cde, 3, 43, 23], [cde, 43, 23, 56]] }

Is there an efficient (time to process) way to do such conversion?

CodePudding user response:

Sounds like

{k: v.to_numpy() for k, v in df.groupby(df.col1)}

does what you want; outputs

>>> {k: v.to_numpy() for k, v in df.groupby(df.col1)}
{'abc': array([['abc', 1, 2, 3],    ['abc', 4, 5, 6]], dtype=object),
 'bcd': array([['bcd', 10, 11, 12], ['bcd', 33, 22, 11]], dtype=object), 
 'cde': array([['cde', 3, 43, 23],  ['cde', 43, 23, 56]], dtype=object)
}

This has the advantage of being quite fast, so long as the numpy ndarrays are suitable for further use in your case.

CodePudding user response:

You could do it like this:

from collections import defaultdict

df = pd.read_csv('test.csv')
dct = defaultdict(list)
for idx, row in df.iterrows():
  values = list(row.values)
  col1 = values[0]
  dct[col1].append(values)

Output:

defaultdict(list,
            {'abc': [['abc', 1, 2, 3], ['abc', 4, 5, 6]],
             'bcd': [['bcd', 10, 11, 12], ['bcd', 33, 22, 11]],
             'cde': [['cde', 3, 43, 23], ['cde', 43, 23, 56]]})
  • Related