I have a dataframe like this
import pandas as pd
df = pd.DataFrame(
{
'C1': list('aabbab'),
'C2': list('abbbaa'),
'value': range(11, 17)
}
)
C1 C2 value
0 a a 11
1 a b 12
2 b b 13
3 b b 14
4 a a 15
5 b a 16
and I would like to generate a dictionary like this:
{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}
Logic is as follows:
In df
I go to the column C1
and the first a
I find in the column corresponds to value 11
, the second one to value 12
and the third one to 15
. The position of the a
and the corresponding value should be stored in the dictionary for the keys C1
and a
.
I could do something like this
df_ss = df.loc[df['C1'] == 'a', 'value']
d = {ind: val for ind, val in enumerate(df_ss.values, 1)}
which yields for d
:
{1: 11, 2: 12, 3: 15}
which is indeed the desired output. I could then put this into a loop and generate all required dictionaries.
Does anyone sees something more efficient than this?
CodePudding user response:
You could use a groupby and a nested dict comprehension:
import pandas as pd
df = pd.DataFrame(
{
'C1': list('aabbab'),
'C2': list('abbbaa'),
'value': range(11, 17)
}
)
d = {
c: {k: dict(enumerate(g["value"], 1)) for k, g in df.groupby(c)}
for c in ["C1", "C2"]
}
Which outputs:
{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}
CodePudding user response:
Use a dictionary comprehension on the groupby
for each C-value with enumerate
:
cols = ['C1', 'C2']
# or, programmatically
# cols = df.filter(regex='^C').columns
out = {c: {k: dict(enumerate(g, start=1)) for k,g in df.groupby(c)['value']}
for c in cols}
output:
{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}
CodePudding user response:
This is probably a bit slower than @mozway and @Alex methods:
{
c: df.set_index([df.groupby(c).cumcount() 1, c])["value"]
.unstack(0)
.to_dict("index")
for c in ["C1", "C2"]
}
Output:
{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}