My dataframe:
df = pd.DataFrame({'a':['A', 'B'], 'b':[{5:1, 11:2}, {5:3}]})
Expected output (each key is expanded into n consecutive keys. For example, in row 1, key 5 (with n = 2) becomes 5, 6. The same expansion must also be reflected in column 'a'):
df_expected = pd.DataFrame({'a':['A1', 'A2', 'A1', 'A2', 'B1', 'B2', 'B3'], 'key':[5, 6, 11, 12, 5, 6, 7]})
My present state:
df['key']=df.apply(lambda x: x['b'].keys(), axis=1)
df['value']=df.apply(lambda x: max(x['b'].values()), axis=1)
df = df.loc[df.index.repeat(df.value)]
I'm stuck here. What should the next step be?
CodePudding user response:
This will do your transform, outside of pandas.
d = {'a':['A', 'B'], 'b':[{5:1, 11:2}, {5:3}]}
out = { 'a':[], 'b':[] }
for a, b in zip(d['a'], d['b']):
    # every key in this row is expanded n times, n = largest value in the dict
    n = max(b.values())
    for k in b:
        for i in range(n):
            out['a'].append(f'{a}{i+1}')
            out['b'].append(k+i)
print(out)
Output:
{'a': ['A1', 'A2', 'A1', 'A2', 'B1', 'B2', 'B3'], 'b': [5, 6, 11, 12, 5, 6, 7]}
CodePudding user response:
First, preprocess your input dictionary like this:
import pandas as pd
d = {'a':['A', 'B'], 'b':[{5:2, 11:2}, {5:3}]} # Assuming 5:2 instead of 5:1.
res = {"a": [], "keys": []}
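for idx, i in enumerate(d['b']):
    # one label per repetition of each key: A1, A2, ...
    res['a'].extend([f"{d['a'][idx]}{k}" for j in i for k in range(1, i[j]+1)])
    # consecutive keys starting at j: j, j+1, ..., j+i[j]-1
    res['keys'].extend([k for j in i for k in range(j, j+i[j])])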
df = pd.DataFrame(res)
Output:
{'a': ['A1', 'A2', 'A1', 'A2', 'B1', 'B2', 'B3'], 'keys': [5, 6, 11, 12, 5, 6, 7]}
CodePudding user response:
For a pandas solution:
df2 = (df.drop(columns='b')
         .join(pd.json_normalize(df['b'])
                 .rename_axis(columns='key')
                 .stack().reset_index(-1, name='repeat')
               )
         .loc[lambda d: d.index.repeat(d.pop('repeat'))]
      )
# position of each row within its (a, key) group: 0, 1, 2, ...
g = df2.groupby(['a', 'key']).cumcount()
df2['a'] += g.add(1).astype(str)
df2['key'] += g
print(df2)
Output:
    a  key
0  A1    5
0  A1   11
0  A2    6
0  A2   12
0  A3    7
0  A3   13
1  B1    5
1  B2    6
1  B3    7
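If the goal is the exact frame from the question (no A3 rows), here is a minimal pandas sketch. It assumes each key in a row is expanded n times, where n is the largest value in that row's dict, as in the asker's own max(...) attempt. The helper column n and the names out and offset are just illustrative. The index is reset before repeating so that .loc never sees duplicate labels.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B'], 'b': [{5: 1, 11: 2}, {5: 3}]})

# Assumption: n = largest value in the row's dict (matches the expected output).
out = (
    df.assign(n=df['b'].map(lambda m: max(m.values())),  # repeats per key
              key=df['b'].map(list))                     # list of dict keys
      .drop(columns='b')
      .explode('key')             # one row per (a, key) pair
      .reset_index(drop=True)     # unique index, so repeat + .loc is safe
)

# Repeat every (a, key) row n times, then number the copies 0..n-1.
out = out.loc[out.index.repeat(out['n'])]
offset = out.groupby(level=0).cumcount()

out['a'] = out['a'] + (offset + 1).astype(str)   # A -> A1, A2, ...
out['key'] = out['key'].astype(int) + offset     # 5 -> 5, 6, ...
out = out.drop(columns='n').reset_index(drop=True)
print(out)
```

This is only one way to do it; the plain-Python loop from the first answer followed by pd.DataFrame(...) gives the same rows.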