Some weird transformation to pandas dataframe

My dataframe:

df = pd.DataFrame({'a':['A', 'B'], 'b':[{5:1, 11:2}, {5:3}]})

Expected output (each key is expanded into n consecutive keys. For example, in row 1, key = 5 (with value = 2) is transformed to 5, 6. This change also needs to be reflected in the 'a' column):

df_expected = pd.DataFrame({'a':['A1', 'A2', 'A1', 'A2', 'B1', 'B2', 'B3'], 'key':[5, 6, 11, 12, 5, 6, 7]})

My present state:

df['key']=df.apply(lambda x: x['b'].keys(), axis=1)
df['value']=df.apply(lambda x: max(x['b'].values()), axis=1)
df = df.loc[df.index.repeat(df.value)]

I'm stuck here. What should the next step be?

CodePudding user response：

This will do your transform, outside of pandas.

d = {'a':['A', 'B'], 'b':[{5:1, 11:2}, {5:3}]}

out = { 'a':[], 'b':[] }
for a,b in zip(d['a'],d['b']):
    n = max(b.values())
    for k in b:
        for i in range(n):
            out['a'].append(f'{a}{i+1}')
            out['b'].append(k+i)
print(out)

Output:

{'a': ['A1', 'A2', 'A1', 'A2', 'B1', 'B2', 'B3'], 'b': [5, 6, 11, 12, 5, 6, 7]}

CodePudding user response：

First you need to preprocess your input dictionary like this:

import pandas as pd
d = {'a':['A', 'B'], 'b':[{5:2, 11:2}, {5:3}]} # Assuming 5:2 instead of 5:1.
res = {"a": [], "keys": []}
for idx, i in enumerate(d['b']):
    res['a'].extend([f"{d['a'][idx]}{k}" for j in i for k in range(1, i[j]+1)])
    res['keys'].extend([k for j in i for k in range(j, j+i[j])])
df = pd.DataFrame(res)

Output:

{'a': ['A1', 'A2', 'A1', 'A2', 'B1', 'B2', 'B3'], 'keys': [5, 6, 11, 12, 5, 6, 7]}
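
Note that this loop repeats each key by its own value, which is why the input was changed to 5:2. For comparison, here is a rough sketch of how the two readings of the question (repeat by the row's largest value vs. repeat by each key's own value) differ on the original data with 5:1. The helper name `expand` and the `use_row_max` flag are made up for illustration and are not part of any answer here.

```python
def expand(labels, dicts, use_row_max):
    """Expand each key into consecutive keys and suffix the label with a counter."""
    out = {'a': [], 'key': []}
    for label, mapping in zip(labels, dicts):
        row_max = max(mapping.values())
        for key, count in mapping.items():
            # Either repeat every key by the row's largest value,
            # or repeat each key by its own value.
            n = row_max if use_row_max else count
            for i in range(n):
                out['a'].append(f'{label}{i + 1}')
                out['key'].append(key + i)
    return out

labels = ['A', 'B']
dicts = [{5: 1, 11: 2}, {5: 3}]  # original data from the question

by_row_max = expand(labels, dicts, use_row_max=True)
by_own_value = expand(labels, dicts, use_row_max=False)
print(by_row_max)
print(by_own_value)
```

Only the row-max reading reproduces df_expected from the question on the unmodified input. The per-key reading yields a single row for key 5 in row 'A'.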

CodePudding user response：

For a pandas solution:

df2 = (df.drop(columns='b')
         .join(pd.json_normalize(df['b'])
                 .rename_axis(columns='key')
                 .stack().reset_index(-1, name='repeat')
               )
         .loc[lambda d: d.index.repeat(d.pop('repeat'))]
       )

g = df2.groupby(['a', 'key']).cumcount()

df2['a'] += g.add(1).astype(str)
df2['key'] += g

print(df2)

Output: