Pandas DataFrame split numbered columns into rows by column title-CodePudding

I have a DataFrame that looks something like this:

df = pd.DataFrame({
    'A': [0, 1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'V1': [0.0, 0.1, 0.2, 0.3, 0.4],
    'V2': [1.0, 1.1, 1.2, 1.3, 1.4],
    'V3': [2.0, 2.1, 2.2, 2.3, 2.4],
    'X': ['alpha', 'beta', 'gamma', 'delta', 'epsilon'],
})

   A  B   V1   V2   V3        X
0  0  a  0.0  1.0  2.0    alpha
1  1  b  0.1  1.1  2.1     beta
2  2  c  0.2  1.2  2.2    gamma
3  3  d  0.3  1.3  2.3    delta
4  4  e  0.4  1.4  2.4  epsilon

I'd like to use the number in the V columns to spread this out into a long-form table. The number in the column label (1 for V1, 2 for V2 etc) would become a new column value in a column named "V Number" or whatever and the value would be the only "V" in that row. Something like this (I've hidden the index here as I don't care about that):

A  B  V Number    V        X
0  a      1     0.0    alpha     # Old first row, V1 value
0  a      2     1.0    alpha     # Old first row, V2 value
0  a      3     2.0    alpha     # Old first row, V3 value
1  b      1     0.1     beta     # Old second row, V1 value
1  b      2     1.1     beta     # etc...
1  b      3     2.1     beta
2  c      1     0.2    gamma
2  c      2     1.2    gamma
2  c      3     2.2    gamma
3  d      1     0.3    delta
3  d      2     1.3    delta
3  d      3     2.3    delta
4  e      1     0.4  epsilon
4  e      2     1.4  epsilon
4  e      3     2.4  epsilon

In the real DataFrame there are over 40 "V" columns, over 100 other columns and several thousand rows, so a reasonable simple and fast method would be nice! In case it helps, the column names are easy to isolate (they're actually called e.g. Test Voltage (3)', but I shortened them for the purposes of the example) with something like [i for i in df.columns if 'Test Voltage' in i].

Has anyone got any ideas of a straightforward way to do this? I've tried searching for lots of methods, but keep just finding ways to split columns with lists in the cells.

CodePudding user response：

Try with wide_to_long

out = pd.wide_to_long(df,['V'],i=['A','B','X'],j='number').reset_index()
Out[23]: 
    A  B        X  number    V
0   0  a    alpha       1  0.0
1   0  a    alpha       2  1.0
2   0  a    alpha       3  2.0
3   1  b     beta       1  0.1
4   1  b     beta       2  1.1
5   1  b     beta       3  2.1
6   2  c    gamma       1  0.2
7   2  c    gamma       2  1.2
8   2  c    gamma       3  2.2
9   3  d    delta       1  0.3
10  3  d    delta       2  1.3
11  3  d    delta       3  2.3
12  4  e  epsilon       1  0.4
13  4  e  epsilon       2  1.4
14  4  e  epsilon       3  2.4

CodePudding user response：

Use melt:

>>> df.melt(id_vars=['A', 'B', 'X'], var_name='V Number', value_name='V')
    A  B        X V Number    V
0   0  a    alpha       V1  0.0
1   1  b     beta       V1  0.1
2   2  c    gamma       V1  0.2
3   3  d    delta       V1  0.3
4   4  e  epsilon       V1  0.4
5   0  a    alpha       V2  1.0
6   1  b     beta       V2  1.1
7   2  c    gamma       V2  1.2
8   3  d    delta       V2  1.3
9   4  e  epsilon       V2  1.4
10  0  a    alpha       V3  2.0
11  1  b     beta       V3  2.1
12  2  c    gamma       V3  2.2
13  3  d    delta       V3  2.3
14  4  e  epsilon       V3  2.4

CodePudding user response：

You can also use .stack(), as follows:

(df.set_index(['A', 'B', 'X'])
   .rename_axis(columns='V Number')
   .stack()
   .reset_index(name='V')
)

Result:

    A  B        X V Number    V
0   0  a    alpha       V1  0.0
1   0  a    alpha       V2  1.0
2   0  a    alpha       V3  2.0
3   1  b     beta       V1  0.1
4   1  b     beta       V2  1.1
5   1  b     beta       V3  2.1
6   2  c    gamma       V1  0.2
7   2  c    gamma       V2  1.2
8   2  c    gamma       V3  2.2
9   3  d    delta       V1  0.3
10  3  d    delta       V2  1.3
11  3  d    delta       V3  2.3
12  4  e  epsilon       V1  0.4
13  4  e  epsilon       V2  1.4
14  4  e  epsilon       V3  2.4

If you want the V Number column to have only the number, you can use:

df2 = (df.set_index(['A', 'B', 'X'])
         .rename_axis(columns='V Number')
         .stack()
         .reset_index(name='V')
      )

df2['V Number'] = df2['V Number'].str[1:]

Result:

print(df2)

    A  B        X V Number    V
0   0  a    alpha        1  0.0
1   0  a    alpha        2  1.0
2   0  a    alpha        3  2.0
3   1  b     beta        1  0.1
4   1  b     beta        2  1.1
5   1  b     beta        3  2.1
6   2  c    gamma        1  0.2
7   2  c    gamma        2  1.2
8   2  c    gamma        3  2.2
9   3  d    delta        1  0.3
10  3  d    delta        2  1.3
11  3  d    delta        3  2.3
12  4  e  epsilon        1  0.4
13  4  e  epsilon        2  1.4
14  4  e  epsilon        3  2.4