For example, create 3 DataFrames and concatenate them:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'1': np.random.rand(10),
                    '2': np.random.rand(10),
                    '3': np.random.rand(10),
                    '4': np.random.rand(10)}).assign(Location='A')
df2 = pd.DataFrame({'1': np.random.rand(10),
                    '2': np.random.rand(10),
                    '3': np.random.rand(10),
                    '4': np.random.rand(10)}).assign(Location='B')
df3 = pd.DataFrame({'1': np.random.rand(10),
                    '2': np.random.rand(10),
                    '3': np.random.rand(10),
                    '4': np.random.rand(10)}).assign(Location='C')
cdf = pd.concat([df1, df2, df3])
It gives us:
1 2 3 4 Location
0 0.243073 0.946684 0.197635 0.941140 A
1 0.564369 0.747247 0.784011 0.990973 A
2 0.211258 0.976495 0.039176 0.892454 A
3 0.112643 0.674075 0.158149 0.764700 A
4 0.603871 0.793998 0.460644 0.191124 A
5 0.050415 0.460084 0.791560 0.991015 A
6 0.215605 0.103512 0.362365 0.877847 A
7 0.508424 0.429722 0.148857 0.733401 A
8 0.073434 0.575820 0.382577 0.441545 A
9 0.242787 0.366286 0.283623 0.213178 A
0 0.839207 0.154315 0.083025 0.085282 B
1 0.387318 0.693141 0.211551 0.714996 B
2 0.257663 0.413717 0.548559 0.283256 B
3 0.523662 0.861542 0.765760 0.983974 B
4 0.303765 0.774284 0.875710 0.473910 B
5 0.021572 0.937372 0.652112 0.924507 B
6 0.261787 0.004706 0.120351 0.863288 B
7 0.713536 0.741964 0.367541 0.097494 B
8 0.195457 0.845070 0.202422 0.393003 B
9 0.985934 0.167777 0.799820 0.578928 B
0 0.204089 0.108707 0.022954 0.233555 C
1 0.620620 0.900607 0.882538 0.560776 C
2 0.077108 0.397158 0.592832 0.740506 C
3 0.422404 0.040622 0.180017 0.168166 C
4 0.893277 0.740311 0.128431 0.982510 C
5 0.242766 0.030185 0.074385 0.543565 C
6 0.740050 0.161741 0.182423 0.917792 C
7 0.360404 0.374892 0.773396 0.631226 C
8 0.495498 0.589751 0.619968 0.331093 C
9 0.552977 0.040000 0.657023 0.093788 C
I am trying to exchange the columns [1, 2, 3, 4] for the elements of Location, which are [A, B, C]. The result is expected to be:
A B C original_axis
0 0.088333 0.009939 0.117679 1
1 0.340463 0.537707 0.261598 1
2 0.243001 0.497268 0.098827 1
3 0.514245 0.133166 0.435097 1
4 0.715491 0.597498 0.635434 1
5 0.578538 0.017931 0.255325 1
6 0.854928 0.485360 0.954285 1
7 0.460750 0.959085 0.657368 1
8 0.573592 0.990451 0.605646 1
9 0.086094 0.206977 0.801752 1
0 0.549036 0.370826 0.387348 2
1 0.323326 0.891108 0.546695 2
2 0.695849 0.329717 0.089323 2
3 0.815171 0.125956 0.973977 2
4 0.123839 0.093963 0.094081 2
5 0.679839 0.093035 0.378317 2
6 0.144703 0.467749 0.146022 2
7 0.011454 0.226450 0.833976 2
8 0.527904 0.848700 0.457423 2
9 0.492924 0.027877 0.608478 2
0 0.723353 0.225212 0.411559 3
1 0.271650 0.042965 0.102511 3
2 0.089035 0.443239 0.222270 3
3 0.848120 0.325830 0.025566 3
4 0.991819 0.820941 0.824764 3
5 0.458587 0.998833 0.519268 3
6 0.615102 0.518176 0.878235 3
7 0.459854 0.920891 0.295850 3
8 0.913309 0.023715 0.871933 3
9 0.847045 0.635648 0.976897 3
0 0.017517 0.744492 0.626891 4
1 0.217091 0.480721 0.488773 4
2 0.261332 0.727520 0.851186 4
3 0.636652 0.198315 0.274114 4
4 0.674848 0.121301 0.979112 4
5 0.179912 0.923860 0.626627 4
6 0.838932 0.708422 0.007303 4
7 0.960088 0.221822 0.717819 4
8 0.388269 0.451810 0.379369 4
9 0.639050 0.323490 0.078356 4
To do it, I melt it with mdf = pd.melt(cdf, id_vars='Location', var_name='time'), which gives:
Location time value
0 A 1 0.046463
1 A 1 0.282879
2 A 1 0.570865
3 A 1 0.719256
4 A 1 0.771334
.. ... ... ...
115 C 4 0.189733
116 C 4 0.244067
117 C 4 0.763828
118 C 4 0.268741
119 C 4 0.586709
Then I use mdf.pivot(index=None, columns='Location', values='value'), but get:
Location A B C
0 0.046463 NaN NaN
1 0.282879 NaN NaN
2 0.570865 NaN NaN
3 0.719256 NaN NaN
4 0.771334 NaN NaN
.. ... .. ...
115 NaN NaN 0.189733
116 NaN NaN 0.244067
117 NaN NaN 0.763828
118 NaN NaN 0.268741
119 NaN NaN 0.586709
Next, I try mdf.pivot_table(columns='Location', values='value', index=None), but get:
Location A B C
value 0.497467 0.421472 0.474544
I don't know how to fix this. How can I perform the exchange described above?
CodePudding user response:
Since the data is randomised, the outputs will likely differ, but the solution should still be applicable. The idea is a combination of reshaping long to wide (unstack Location into the columns) and back to long (stack the original 1-4 columns into the index). If you encounter an index error due to duplicates, you can add a unique counter with a groupby (see the sketch after the output below):
(cdf
 .set_index('Location', append=True)   # move Location into the index
 .unstack('Location')                  # Location values become a column level
 .stack(level=0)                       # original columns 1-4 move into the index
 .rename_axis(index=[None, 'original_axis'], columns=None)
 # depending on your end goal you may skip
 # the code below
 .reset_index('original_axis')
 .loc(axis=1)[['A', 'B', 'C', 'original_axis']]
 .reset_index(drop=True)
)
A B C original_axis
0 0.243073 0.839207 0.204089 1
1 0.946684 0.154315 0.108707 2
2 0.197635 0.083025 0.022954 3
3 0.941140 0.085282 0.233555 4
4 0.564369 0.387318 0.620620 1
5 0.747247 0.693141 0.900607 2
6 0.784011 0.211551 0.882538 3
7 0.990973 0.714996 0.560776 4
8 0.211258 0.257663 0.077108 1
9 0.976495 0.413717 0.397158 2
10 0.039176 0.548559 0.592832 3
11 0.892454 0.283256 0.740506 4
12 0.112643 0.523662 0.422404 1
13 0.674075 0.861542 0.040622 2
14 0.158149 0.765760 0.180017 3
15 0.764700 0.983974 0.168166 4
16 0.603871 0.303765 0.893277 1
17 0.793998 0.774284 0.740311 2
18 0.460644 0.875710 0.128431 3
19 0.191124 0.473910 0.982510 4
20 0.050415 0.021572 0.242766 1
21 0.460084 0.937372 0.030185 2
22 0.791560 0.652112 0.074385 3
23 0.991015 0.924507 0.543565 4
24 0.215605 0.261787 0.740050 1
25 0.103512 0.004706 0.161741 2
26 0.362365 0.120351 0.182423 3
27 0.877847 0.863288 0.917792 4
28 0.508424 0.713536 0.360404 1
29 0.429722 0.741964 0.374892 2
30 0.148857 0.367541 0.773396 3
31 0.733401 0.097494 0.631226 4
32 0.073434 0.195457 0.495498 1
33 0.575820 0.845070 0.589751 2
34 0.382577 0.202422 0.619968 3
35 0.441545 0.393003 0.331093 4
36 0.242787 0.985934 0.552977 1
37 0.366286 0.167777 0.040000 2
38 0.283623 0.799820 0.657023 3
39 0.213178 0.578928 0.093788 4
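If the unstack step ever raises an error about duplicate index entries (for example when the original row index is not unique within each Location), a minimal sketch of the counter idea mentioned above could look like the following; the names out and n are just placeholders:
out = (cdf
       .assign(n=cdf.groupby('Location').cumcount())  # 0..9 counter within each Location
       .set_index(['n', 'Location'])                  # (n, Location) is now a unique index
       .unstack('Location')
       .stack(level=0)
       .rename_axis(index=['n', 'original_axis'], columns=None)
       .reset_index('original_axis')
       .reset_index(drop=True)
       [['A', 'B', 'C', 'original_axis']]
       )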
CodePudding user response:
If possible, build a counter from the index values of the original DataFrame cdf: add ignore_index=False to melt, convert the index to a column, and pass it to pivot_table's index parameter; finally move the time level of the resulting MultiIndex back to a column and do some data cleaning:
cdf = pd.concat([df1, df2, df3])
mdf = (pd.melt(cdf, id_vars='Location', var_name='time', ignore_index=False)  # keep the original index
       .reset_index()                                                         # index -> 'index' column
       .pivot_table(index=['index', 'time'], columns='Location', values='value')
       .reset_index(level=1)                                                  # 'time' back to a column
       .sort_values('time')
       .reset_index(drop=True)
       .rename_axis(None, axis=1)
       .assign(original_axis=lambda x: x.pop('time'))                         # rename 'time' -> 'original_axis'
       )
print(mdf.head(20))
A B C original_axis
0 0.321988 0.261495 0.834801 1
1 0.727275 0.721664 0.189777 1
2 0.890422 0.405724 0.549623 1
3 0.141341 0.078760 0.550035 1
4 0.524387 0.663287 0.742467 1
5 0.544935 0.218225 0.629431 1
6 0.588052 0.553420 0.021791 1
7 0.467896 0.972283 0.013206 1
8 0.022090 0.411311 0.715893 1
9 0.126596 0.625526 0.347060 1
10 0.390991 0.183550 0.115243 2
11 0.360875 0.110048 0.518665 2
12 0.180323 0.412962 0.709591 2
13 0.162077 0.912792 0.607409 2
14 0.337959 0.357006 0.474816 2
15 0.394469 0.863313 0.514885 2
16 0.501382 0.729779 0.627029 2
17 0.035648 0.585990 0.732794 2
18 0.456373 0.187173 0.362184 2
19 0.151172 0.391720 0.894907 2
Or:
cdf = pd.concat([df1, df2, df3])
mdf = (pd.melt(cdf.assign(g=cdf.groupby('Location').cumcount()),
               id_vars=['Location', 'g'], var_name='time', ignore_index=False)
       .pivot_table(index=['g', 'time'], columns='Location', values='value')
       .reset_index(level=1)
       .sort_values('time')
       .reset_index(drop=True)
       .rename_axis(None, axis=1)
       .assign(original_axis=lambda x: x.pop('time'))
       )
print(mdf.head(20))
A B C original_axis
0 0.321988 0.261495 0.834801 1
1 0.727275 0.721664 0.189777 1
2 0.890422 0.405724 0.549623 1
3 0.141341 0.078760 0.550035 1
4 0.524387 0.663287 0.742467 1
5 0.544935 0.218225 0.629431 1
6 0.588052 0.553420 0.021791 1
7 0.467896 0.972283 0.013206 1
8 0.022090 0.411311 0.715893 1
9 0.126596 0.625526 0.347060 1
10 0.390991 0.183550 0.115243 2
11 0.360875 0.110048 0.518665 2
12 0.180323 0.412962 0.709591 2
13 0.162077 0.912792 0.607409 2
14 0.337959 0.357006 0.474816 2
15 0.394469 0.863313 0.514885 2
16 0.501382 0.729779 0.627029 2
17 0.035648 0.585990 0.732794 2
18 0.456373 0.187173 0.362184 2
19 0.151172 0.391720 0.894907 2
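As a quick sanity check (my own sketch, not part of the answer above, assuming the mdf and cdf built earlier), the reshape only rearranges values, so the sorted, flattened A/B/C values should match the original 1-4 columns:
import numpy as np

# same numbers, just rearranged into a different layout
reshaped = np.sort(mdf[['A', 'B', 'C']].to_numpy().ravel())
original = np.sort(cdf[['1', '2', '3', '4']].to_numpy().ravel())
assert np.allclose(reshaped, original)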