I have a DataFrame that looks something like this:
df = pd.DataFrame({
'A': [0, 1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd', 'e'],
'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})
A B V1 V2 V3 X
0 0 a 0.0 1.0 2.0 alpha
1 1 b 0.1 1.1 2.1 beta
2 2 c 0.2 1.2 2.2 gamma
3 3 d 0.3 1.3 2.3 delta
4 4 e 0.4 1.4 2.4 epsilon
I'd like to use the number in the V columns to spread this out into a long-form table. The number in the column label (1 for V1, 2 for V2 etc) would become a new column value in a column named "V Number" or whatever and the value would be the only "V" in that row. Something like this (I've hidden the index here as I don't care about that):
A B V Number V X
0 a 1 0.0 alpha # Old first row, V1 value
0 a 2 1.0 alpha # Old first row, V2 value
0 a 3 2.0 alpha # Old first row, V3 value
1 b 1 0.1 beta # Old second row, V1 value
1 b 2 1.1 beta # etc...
1 b 3 2.1 beta
2 c 1 0.2 gamma
2 c 2 1.2 gamma
2 c 3 2.2 gamma
3 d 1 0.3 delta
3 d 2 1.3 delta
3 d 3 2.3 delta
4 e 1 0.4 epsilon
4 e 2 1.4 epsilon
4 e 3 2.4 epsilon
In the real DataFrame there are over 40 "V" columns, over 100 other columns and several thousand rows, so a reasonable simple and fast method would be nice! In case it helps, the column names are easy to isolate (they're actually called e.g. Test Voltage (3)
', but I shortened them for the purposes of the example) with something like [i for i in df.columns if 'Test Voltage' in i]
.
Has anyone got any ideas of a straightforward way to do this? I've tried searching for lots of methods, but keep just finding ways to split columns with lists in the cells.
CodePudding user response:
Try with wide_to_long
out = pd.wide_to_long(df,['V'],i=['A','B','X'],j='number').reset_index()
Out[23]:
A B X number V
0 0 a alpha 1 0.0
1 0 a alpha 2 1.0
2 0 a alpha 3 2.0
3 1 b beta 1 0.1
4 1 b beta 2 1.1
5 1 b beta 3 2.1
6 2 c gamma 1 0.2
7 2 c gamma 2 1.2
8 2 c gamma 3 2.2
9 3 d delta 1 0.3
10 3 d delta 2 1.3
11 3 d delta 3 2.3
12 4 e epsilon 1 0.4
13 4 e epsilon 2 1.4
14 4 e epsilon 3 2.4
CodePudding user response:
Use melt
:
>>> df.melt(id_vars=['A', 'B', 'X'], var_name='V Number', value_name='V')
A B X V Number V
0 0 a alpha V1 0.0
1 1 b beta V1 0.1
2 2 c gamma V1 0.2
3 3 d delta V1 0.3
4 4 e epsilon V1 0.4
5 0 a alpha V2 1.0
6 1 b beta V2 1.1
7 2 c gamma V2 1.2
8 3 d delta V2 1.3
9 4 e epsilon V2 1.4
10 0 a alpha V3 2.0
11 1 b beta V3 2.1
12 2 c gamma V3 2.2
13 3 d delta V3 2.3
14 4 e epsilon V3 2.4
CodePudding user response:
You can also use .stack()
, as follows:
(df.set_index(['A', 'B', 'X'])
.rename_axis(columns='V Number')
.stack()
.reset_index(name='V')
)
Result:
A B X V Number V
0 0 a alpha V1 0.0
1 0 a alpha V2 1.0
2 0 a alpha V3 2.0
3 1 b beta V1 0.1
4 1 b beta V2 1.1
5 1 b beta V3 2.1
6 2 c gamma V1 0.2
7 2 c gamma V2 1.2
8 2 c gamma V3 2.2
9 3 d delta V1 0.3
10 3 d delta V2 1.3
11 3 d delta V3 2.3
12 4 e epsilon V1 0.4
13 4 e epsilon V2 1.4
14 4 e epsilon V3 2.4
If you want the V Number
column to have only the number, you can use:
df2 = (df.set_index(['A', 'B', 'X'])
.rename_axis(columns='V Number')
.stack()
.reset_index(name='V')
)
df2['V Number'] = df2['V Number'].str[1:]
Result:
print(df2)
A B X V Number V
0 0 a alpha 1 0.0
1 0 a alpha 2 1.0
2 0 a alpha 3 2.0
3 1 b beta 1 0.1
4 1 b beta 2 1.1
5 1 b beta 3 2.1
6 2 c gamma 1 0.2
7 2 c gamma 2 1.2
8 2 c gamma 3 2.2
9 3 d delta 1 0.3
10 3 d delta 2 1.3
11 3 d delta 3 2.3
12 4 e epsilon 1 0.4
13 4 e epsilon 2 1.4
14 4 e epsilon 3 2.4