Df1
A B C1 C2 D E
a1 b1 2 4 d1 e1
a2 b2 1 2 d2 e2
Df2
A B C D E
a1 b1 2 d1 e1
a1 b1 3 d1 e1
a1 b1 4 d1 e1
a2 b2 1 d2 e2
a2 b2 2 d2 e2
How to make Df2 from Df1 in the fastest possible way? I tried using groupby and then within for loop used np.arange to fill Df2.C and then used pd.concat to make the final Df2. But this approach is very slow and doesn't seem very elegant and pythonic as well. Can somebody please help with this problem.
CodePudding user response:
One way is to melt
df1
, use groupby.apply
to add ranges; then explode
for the final output:
cols = ['A','B','D','E']
out = (df1.melt(cols, value_name='C').groupby(cols)['C']
.apply(lambda x: range(x.min(), x.max() 1))
.explode().reset_index(name='C'))
Output:
A B D E C
0 a1 b1 d1 e1 2
1 a1 b1 d1 e1 3
2 a1 b1 d1 e1 4
3 a2 b2 d2 e2 1
4 a2 b2 d2 e2 2
CodePudding user response:
Try this:
df1.assign(C = [np.arange(s, e 1) for s, e in zip(df1['C1'], df1['C2'])])\
.explode('C')
Output:
A B C1 C2 D E C
0 a1 b1 2 4 d1 e1 2
0 a1 b1 2 4 d1 e1 3
0 a1 b1 2 4 d1 e1 4
1 a2 b2 1 2 d2 e2 1
1 a2 b2 1 2 d2 e2 2