Home > Back-end >  Pandas split and append
Pandas split and append

Time:12-11

I'm new to working with pandas, I don't know how to solve the following problem.

I have the following dataframe:

    0   1   2   3   4   5
0   a   1   d   4   g   7
1   b   2   e   5   h   8
2   c   3   f   6   i   9

and I have to turn into the following:

a   1
b   2
c   3
d   4
e   5
f   6
g   7
h   8
i   9

CodePudding user response:

Try this:

data = {
    0: pd.concat(df[c] for c in df.columns[0::2]).reset_index(drop=True),
    1: pd.concat(df[c] for c in df.columns[1::2]).reset_index(drop=True),
}
df = pd.DataFrame(data)

Output:

>>> df
   0  1
0  a  1
1  b  2
2  c  3
3  d  4
4  e  5
5  f  6
6  g  7
7  h  8
8  i  9

Explanation

First, we select every even column and group them together:

>>> df
   0  1  2  3  4  5
0  a  1  d  4  g  7
1  b  2  e  5  h  8
2  c  3  f  6  i  9

>>> df.columns
Index(['0', '1', '2', '3', '4', '5'], dtype='object')

>>> even_col_names = df.columns[0::2] # slice syntax: start:stop:step (start with the 0th item, end with the <unspecified> (last) item, select every 2 items)
Index(['0', '2', '4'], dtype='object')

>>> even_cols = df[even_col_names]
>>> even_cols
0  2  4
0  a  d  g
1  b  e  h
2  c  f  i

Then, we select every odd column and group them together:

>>> odd_col_names = df.columns[1::2] # start with the 1st item, select every 2 items
>>> odd_col_names
Index(['1', '3', '5'], dtype='object')

>>> odd_cols = df[odd_col_names]
>>> odd_cols
   1  3  5
0  1  4  7
1  2  5  8
2  3  6  9

Then, we concatenate the even columns into a single column:

>>> even_cols_list = [df[c] for c in even_col_names]
>>> even_cols_list
[0    a
 1    b
 2    c
 Name: 0, dtype: object,
 0    d
 1    e
 2    f
 Name: 2, dtype: object,
 0    g
 1    h
 2    i
 Name: 4, dtype: object]
 
>>> even_col = pd.concat(even_cols_list).reset_index(drop=True)
>>> even_col
0    a
1    b
2    c
3    d
4    e
5    f
6    g
7    h
8    i
dtype: object

Then we concatenate the odd columns into a single column:

>>> odd_cols_list = [df[c] for c in odd_col_names]
>>> odd_cols_list
[0    1
 1    2
 2    3
 Name: 1, dtype: int64,
 0    4
 1    5
 2    6
 Name: 3, dtype: int64,
 0    7
 1    8
 2    9
 Name: 5, dtype: int64]
 
>>> odd_col = pd.concat(odd_cols_list).reset_index(drop=True)
>>> odd_col
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
dtype: int64

Finally, we create a new dataframe with these two columns:

>>> df = pd.DataFrame({0: even_col, 1: odd_col})
>>> df
   0  1
0  a  1
1  b  2
2  c  3
3  d  4
4  e  5
5  f  6
6  g  7
7  h  8
8  i  9

CodePudding user response:

Convert data to numpy, reshape within numpy (two columns), and create a new pandas dataframe (convert relevant column to integer):

df = df.to_numpy()
df = np.reshape(df, (-1, 2)) # have a look at the docs for np.reshape
df = pd.DataFrame(df).transform(pd.to_numeric, errors='ignore')
df.sort_values(1, ignore_index = True)

   0  1
0  a  1
1  b  2
2  c  3
3  d  4
4  e  5
5  f  6
6  g  7
7  h  8
8  i  9

Another option would be to individually stack the numbers and strings, before recombining into a single dataframe:

numbers = df.select_dtypes('number').stack().array
strings = df.select_dtypes('object').stack().array

out = pd.concat([pd.Series(strings), pd.Series(numbers)], axis = 1)

out.sort_values(1, ignore_index = True)

   0  1
0  a  1
1  b  2
2  c  3
3  d  4
4  e  5
5  f  6
6  g  7
7  h  8
8  i  9

One more option, which takes advantage of patterns here is pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index=None, 
                names_to=['0','1'], 
                names_pattern= ['0|2|4', '1|3|5'])

   0  1
0  a  1
1  b  2
2  c  3
3  d  4
4  e  5
5  f  6
6  g  7
7  h  8
8  i  9
  • Related