How to combine multiple set of columns in a dataframe to single one?


I have a dataframe as follows:

Cycle  A_0  A_1  A_2  A_3  B_0  B_1  B_2  B_3
    1    3    4    5    6  NaN    1    4    5
    1    8    5    3    1    0    8    6    4
    2    7    9    1    6    1    0    2    3
    3    5  NaN    9    1    0    3    8    3

This dataframe has to be combined into two columns, A and B.

Expected output:

Cycle    A    B
    1    3  NaN
    1    4    1
    1    5    4
    1    6    5
    1    8    0
    1    5    8
    1    3    6
    1    1    4
    2    7    1
    2    9    0
    2    1    2
    2    6    3
    3    5    0
    3  NaN    3
    3    9    8
    3    1    3

What I did:

A = [f"A_{i}" for i in range(20)]
B = [f"B_{i}" for i in range(20)]

df2['A'] = df[A].bfill(axis=1).iloc[:, 0]
df2['B'] = df[B].bfill(axis=1).iloc[:, 0]

This code gives me an output dataframe, but it only keeps the first non-null value in each row (bfill just skips over the NaNs). How can I get the expected output?

ADDON

Added a new column (Cycle) to the initial data and to the expected output above.

CodePudding user response:

You could use a dictionary comprehension after splitting the column names on the _:

In [1]: import pandas as pd
   ...: 
   ...: d = {
   ...:     'A_0': ['3', '8', '7', '5'],
   ...:     'A_1': ['4', '5', '9', ''],
   ...:     'A_2': ['5', '3', '1', '9'],
   ...:     'A_3': ['6', '1', '6', '1'],
   ...:     'B_0': ['', '0', '1', '0'],
   ...:     'B_1': ['1', '8', '0', '3'],
   ...:     'B_2': ['4', '6', '2', '8'],
   ...:     'B_3': ['5', '4', '3', '3'],
   ...: }
   ...: 
   ...: df = pd.DataFrame(data=d)
   ...: df
Out[1]: 
  A_0 A_1 A_2 A_3 B_0 B_1 B_2 B_3
0   3   4   5   6       1   4   5
1   8   5   3   1   0   8   6   4
2   7   9   1   6   1   0   2   3
3   5       9   1   0   3   8   3
In [2]: df = pd.DataFrame({
   ...:     key: pd.Series(vals.to_numpy().ravel())
   ...:     for key, vals in df.groupby(lambda col: col.split('_')[0], axis=1)
   ...: })
   ...: df
Out[2]: 
    A  B
0   3   
1   4  1
2   5  4
3   6  5
4   8  0
5   5  8
6   3  6
7   1  4
8   7  1
9   9  0
10  1  2
11  6  3
12  5  0
13     3
14  9  8
15  1  3
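
The example above drops the question's Cycle column. A minimal sketch of how it could be carried along, assuming the frame also contains Cycle and uses NaN for the blanks (newer pandas versions warn that groupby(axis=1) is deprecated, but it is kept here to match the code above): since every original row expands into four output rows, Cycle only needs to be repeated four times.

import numpy as np
import pandas as pd

# Sketch only: the question's data with the Cycle column and NaN for the blanks.
df = pd.DataFrame({
    'Cycle': [1, 1, 2, 3],
    'A_0': [3, 8, 7, 5], 'A_1': [4, 5, 9, np.nan], 'A_2': [5, 3, 1, 9], 'A_3': [6, 1, 6, 1],
    'B_0': [np.nan, 0, 1, 0], 'B_1': [1, 8, 0, 3], 'B_2': [4, 6, 2, 8], 'B_3': [5, 4, 3, 3],
})

value_cols = df.columns.drop('Cycle')   # only the A_*/B_* columns take part in the reshape
n_per_row = 4                           # each original row becomes 4 output rows (suffixes 0..3)

out = pd.DataFrame({
    key: pd.Series(vals.to_numpy().ravel())
    for key, vals in df[value_cols].groupby(lambda col: col.split('_')[0], axis=1)
})
out.insert(0, 'Cycle', np.repeat(df['Cycle'].to_numpy(), n_per_row))
print(out)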

CodePudding user response:

Code

import numpy as np
import pandas as pd

columns = pd.Index(['A_0', 'A_1', 'A_2', 'A_3', 'B_0', 'B_1', 'B_2', 'B_3'], dtype='string')
values = np.array([[ 3.,  4.,  5.,  6., np.nan,  1.,  4.,  5.],
                   [ 8.,  5.,  3.,  1.,  0.,  8.,  6.,  4.],
                   [ 7.,  9.,  1.,  6.,  1.,  0.,  2.,  3.],
                   [ 5., np.nan,  9.,  1.,  0.,  3.,  8.,  3.]],
                  dtype=float)
## Or retrieve from the raw DataFrame if it already exists
# columns = df_raw.columns
# values = df_raw.values

## Construct MultiIndex
mi = pd.MultiIndex.from_tuples((s.split("_") for s in columns))

## Construct DataFrame
df = pd.DataFrame(values, columns=mi)

## Reshape: stack level=1 (the 2nd column level, i.e. the 0..3 suffix) into the index
df_result = df.stack(level=1)

>>> df_result
       A    B
0 0  3.0  NaN
  1  4.0  1.0
  2  5.0  4.0
  3  6.0  5.0
1 0  8.0  0.0
  1  5.0  8.0
  2  3.0  6.0
  3  1.0  4.0
2 0  7.0  1.0
  1  9.0  0.0
  2  1.0  2.0
  3  6.0  3.0
3 0  5.0  0.0
  1  NaN  3.0
  2  9.0  8.0
  3  1.0  3.0

Explanation

Steps:

  1. Construct MultiIndex from flat Index

    Pandas provides 4 built-in methods to construct a MultiIndex; from_tuples is used here. Doc: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_tuples.html

    • from_arrays :: input [[x1, x2, ...], [y1, y2, ...]] output [[x1, y1], [x2, y2], ...]

    • from_tuples :: input [[x1, y1], [x2, y2], ...] output same

    • from_frame :: converts a DataFrame to a MultiIndex, with each column becoming one level (column names become level names)

    • from_product :: input like from_arrays, but takes the Cartesian product of the iterables rather than pairing them up. e.g. input [[x1, x2], [y1, y2, y3]] output

    MultiIndex([('x1', 'y1'), ('x1', 'y2'), ('x1', 'y3'), ('x2', 'y1'), ('x2', 'y2'), ('x2', 'y3')], )

  2. Construct new DataFrame and reshape by stack

    See the User Guide on reshaping and pivoting: https://pandas.pydata.org/docs/user_guide/reshaping.html
    A sketch that also keeps the question's Cycle column is shown below.
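
Building on the steps above, a sketch of the same reshape that also keeps the question's Cycle column (again assuming NaN marks the blanks; names such as df_raw and df_result only mirror the code above): park Cycle in the index before building the MultiIndex columns, stack, then bring it back as a column.

import numpy as np
import pandas as pd

# Sketch only: the question's data with a Cycle column and NaN for the blanks.
df_raw = pd.DataFrame({
    'Cycle': [1, 1, 2, 3],
    'A_0': [3, 8, 7, 5], 'A_1': [4, 5, 9, np.nan], 'A_2': [5, 3, 1, 9], 'A_3': [6, 1, 6, 1],
    'B_0': [np.nan, 0, 1, 0], 'B_1': [1, 8, 0, 3], 'B_2': [4, 6, 2, 8], 'B_3': [5, 4, 3, 3],
})

wide = df_raw.set_index('Cycle')   # Cycle moves into the index, so the reshape leaves it alone
wide.columns = pd.MultiIndex.from_tuples(c.split('_') for c in wide.columns)

df_result = (wide.stack(level=1)                   # suffix level (0..3) moves into the row index
                 .reset_index(level=1, drop=True)  # discard the 0..3 suffix level
                 .reset_index())                   # bring Cycle back as an ordinary column
print(df_result)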

CodePudding user response:

If you do

as_ = df[[c for c in df.columns if c.startswith('A_')]]
bs_ = df[[c for c in df.columns if c.startswith('B_')]]

then you get two dataframes holding only the A_* and B_* columns.

Now you can use

pd.DataFrame({
    'A': as_.values.flatten(),                  # all A_* values, row by row
    'B': bs_.values.flatten(),                  # all B_* values, row by row
    'Cycle': df.Cycle.repeat(len(as_.columns))  # each Cycle value once per A_* column
})

which uses the flattened values, as well as repeat.

There is nothing special about NaN values here; they are carried through the flatten unchanged.
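
One small note on this construction (a sketch of a cosmetic tweak, if I read the DataFrame constructor right): because df.Cycle.repeat(...) is a Series, its repeated row labels become the index of the result, so the output is indexed 0, 0, 0, 0, 1, 1, 1, 1, ... If a plain 0..15 index is preferred, renumber it:

result = pd.DataFrame({
    'A': as_.values.flatten(),
    'B': bs_.values.flatten(),
    'Cycle': df.Cycle.repeat(len(as_.columns)),
}).reset_index(drop=True)   # replace the repeated row labels with 0..15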
