How to combine multiple set of columns in a dataframe to single one?


I have a dataframe as follows:

Cycle  A_0  A_1  A_2  A_3  B_0  B_1  B_2  B_3
    1    3    4    5    6  NaN    1    4    5
    1    8    5    3    1    0    8    6    4
    2    7    9    1    6    1    0    2    3
    3    5  NaN    9    1    0    3    8    3

This dataframe has to be combined into two columns, A and B.

Expected output:

Cycle    A    B
    1    3  NaN
    1    4    1
    1    5    4
    1    6    5
    1    8    0
    1    5    8
    1    3    6
    1    1    4
    2    7    1
    2    9    0
    2    1    2
    2    6    3
    3    5    0
    3  NaN    3
    3    9    8
    3    1    3

What I did:

A = [f"A_{i}" for i in range(20)]
B = [f"B_{i}" for i in range(20)]

df2['A'] = df[A].bfill(axis=1).iloc[:, 0]
df2['B'] = df[B].bfill(axis=1).iloc[:, 0]

This code gives me an output dataframe, but it only keeps the first non-null value in each row (bfill just skips over the NaNs). How can I get the expected output?

ADDON

Added a new column (Cycle) to the initial data and to the expected output above.

CodePudding user response:

You could use a dictionary comprehension after splitting the column names on the _:

In [1]: import pandas as pd
   ...: 
   ...: d = {
   ...:     'A_0': ['3', '8', '7', '5'],
   ...:     'A_1': ['4', '5', '9', ''],
   ...:     'A_2': ['5', '3', '1', '9'],
   ...:     'A_3': ['6', '1', '6', '1'],
   ...:     'B_0': ['', '0', '1', '0'],
   ...:     'B_1': ['1', '8', '0', '3'],
   ...:     'B_2': ['4', '6', '2', '8'],
   ...:     'B_3': ['5', '4', '3', '3'],
   ...: }
   ...: 
   ...: df = pd.DataFrame(data=d)
   ...: df
Out[1]: 
  A_0 A_1 A_2 A_3 B_0 B_1 B_2 B_3
0   3   4   5   6       1   4   5
1   8   5   3   1   0   8   6   4
2   7   9   1   6   1   0   2   3
3   5       9   1   0   3   8   3
In [2]: df = pd.DataFrame({
   ...:     key: pd.Series(vals.to_numpy().ravel())
   ...:     for key, vals in df.groupby(lambda col: col.split('_')[0], axis=1)
   ...: })
   ...: df
Out[2]: 
    A  B
0   3   
1   4  1
2   5  4
3   6  5
4   8  0
5   5  8
6   3  6
7   1  4
8   7  1
9   9  0
10  1  2
11  6  3
12  5  0
13     3
14  9  8
15  1  3
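
The example above drops the question's Cycle column. A minimal sketch of how it could be carried along, assuming the frame also contains Cycle and uses NaN for the blanks (newer pandas versions warn that groupby(axis=1) is deprecated, but it is kept here to match the code above): since every original row expands into four output rows, Cycle only needs to be repeated four times.

import numpy as np
import pandas as pd

# Sketch only: the question's data with the Cycle column and NaN for the blanks.
df = pd.DataFrame({
    'Cycle': [1, 1, 2, 3],
    'A_0': [3, 8, 7, 5], 'A_1': [4, 5, 9, np.nan], 'A_2': [5, 3, 1, 9], 'A_3': [6, 1, 6, 1],
    'B_0': [np.nan, 0, 1, 0], 'B_1': [1, 8, 0, 3], 'B_2': [4, 6, 2, 8], 'B_3': [5, 4, 3, 3],
})

value_cols = df.columns.drop('Cycle')   # only the A_*/B_* columns take part in the reshape
n_per_row = 4                           # each original row becomes 4 output rows (suffixes 0..3)

out = pd.DataFrame({
    key: pd.Series(vals.to_numpy().ravel())
    for key, vals in df[value_cols].groupby(lambda col: col.split('_')[0], axis=1)
})
out.insert(0, 'Cycle', np.repeat(df['Cycle'].to_numpy(), n_per_row))
print(out)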

CodePudding user response:

Code

import numpy as np
import pandas as pd

columns = pd.Index(['A_0', 'A_1', 'A_2', 'A_3', 'B_0', 'B_1', 'B_2', 'B_3'], dtype='string')
values = np.array([[ 3.,  4.,  5.,  6., np.nan,  1.,  4.,  5.],
                   [ 8.,  5.,  3.,  1.,  0.,  8.,  6.,  4.],
                   [ 7.,  9.,  1.,  6.,  1.,  0.,  2.,  3.],
                   [ 5., np.nan,  9.,  1.,  0.,  3.,  8.,  3.]],
                  dtype=float)
## Or retrieve from the raw DataFrame if it already exists
# columns = df_raw.columns
# values = df_raw.values

## Construct MultiIndex
mi = pd.MultiIndex.from_tuples((s.split("_") for s in columns))

## Construct DataFrame
df = pd.DataFrame(values, columns=mi)

## Reshape: stack level=1 (the 2nd column level, i.e. the 0..3 suffix) into the index
df_result = df.stack(level=1)

>>> df_result
       A    B
0 0  3.0  NaN
  1  4.0  1.0
  2  5.0  4.0
  3  6.0  5.0
1 0  8.0  0.0
  1  5.0  8.0
  2  3.0  6.0
  3  1.0  4.0
2 0  7.0  1.0
  1  9.0  0.0
  2  1.0  2.0
  3  6.0  3.0
3 0  5.0  0.0
  1  NaN  3.0
  2  9.0  8.0
  3  1.0  3.0

Explanation

Steps:

  1. Construct MultiIndex from flat Index

    Pandas provides 4 built-in methods to construct a MultiIndex; from_tuples is used here. Doc: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_tuples.html

    • from_arrays :: input [[x1, x2, ...], [y1, y2, ...]] output [[x1, y1], [x2, y2], ...]

    • from_tuples :: input [[x1, y1], [x2, y2], ...] output same

    • from_frame :: converts a DataFrame to a MultiIndex, with each column becoming one level (column names become level names)

    • from_product :: input like from_arrays, but takes the Cartesian product of the iterables rather than pairing them up. e.g. input [[x1, x2], [y1, y2, y3]] output

    MultiIndex([('x1', 'y1'), ('x1', 'y2'), ('x1', 'y3'), ('x2', 'y1'), ('x2', 'y2'), ('x2', 'y3')], )

  2. Construct new DataFrame and reshape by stack

    See the User Guide on reshaping and pivoting: https://pandas.pydata.org/docs/user_guide/reshaping.html
    A sketch that also keeps the question's Cycle column is shown below.
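
Building on the steps above, a sketch of the same reshape that also keeps the question's Cycle column (again assuming NaN marks the blanks; names such as df_raw and df_result only mirror the code above): park Cycle in the index before building the MultiIndex columns, stack, then bring it back as a column.

import numpy as np
import pandas as pd

# Sketch only: the question's data with a Cycle column and NaN for the blanks.
df_raw = pd.DataFrame({
    'Cycle': [1, 1, 2, 3],
    'A_0': [3, 8, 7, 5], 'A_1': [4, 5, 9, np.nan], 'A_2': [5, 3, 1, 9], 'A_3': [6, 1, 6, 1],
    'B_0': [np.nan, 0, 1, 0], 'B_1': [1, 8, 0, 3], 'B_2': [4, 6, 2, 8], 'B_3': [5, 4, 3, 3],
})

wide = df_raw.set_index('Cycle')   # Cycle moves into the index, so the reshape leaves it alone
wide.columns = pd.MultiIndex.from_tuples(c.split('_') for c in wide.columns)

df_result = (wide.stack(level=1)                   # suffix level (0..3) moves into the row index
                 .reset_index(level=1, drop=True)  # discard the 0..3 suffix level
                 .reset_index())                   # bring Cycle back as an ordinary column
print(df_result)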

CodePudding user response:

If you do

as_ = df[[c for c in df.columns if c.startswith('A_')]]
bs_ = df[[c for c in df.columns if c.startswith('B_')]]

then you get two dataframes holding only the A_* and B_* columns.

Now you can use

pd.DataFrame({
    'A': as_.values.flatten(),                  # all A_* values, row by row
    'B': bs_.values.flatten(),                  # all B_* values, row by row
    'Cycle': df.Cycle.repeat(len(as_.columns))  # each Cycle value once per A_* column
})

which uses the flattened values, as well as repeat.

There is nothing special about NaN values here; they are carried through the flatten unchanged.
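
One small note on this construction (a sketch of a cosmetic tweak, if I read the DataFrame constructor right): because df.Cycle.repeat(...) is a Series, its repeated row labels become the index of the result, so the output is indexed 0, 0, 0, 0, 1, 1, 1, 1, ... If a plain 0..15 index is preferred, renumber it:

result = pd.DataFrame({
    'A': as_.values.flatten(),
    'B': bs_.values.flatten(),
    'Cycle': df.Cycle.repeat(len(as_.columns)),
}).reset_index(drop=True)   # replace the repeated row labels with 0..15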
