I'm new to working with pandas, I don't know how to solve the following problem.
I have the following dataframe:
0 1 2 3 4 5
0 a 1 d 4 g 7
1 b 2 e 5 h 8
2 c 3 f 6 i 9
and I have to turn into the following:
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
CodePudding user response:
Try this:
data = {
0: pd.concat(df[c] for c in df.columns[0::2]).reset_index(drop=True),
1: pd.concat(df[c] for c in df.columns[1::2]).reset_index(drop=True),
}
df = pd.DataFrame(data)
Output:
>>> df
0 1
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
Explanation
First, we select every even column and group them together:
>>> df
0 1 2 3 4 5
0 a 1 d 4 g 7
1 b 2 e 5 h 8
2 c 3 f 6 i 9
>>> df.columns
Index(['0', '1', '2', '3', '4', '5'], dtype='object')
>>> even_col_names = df.columns[0::2] # slice syntax: start:stop:step (start with the 0th item, end with the <unspecified> (last) item, select every 2 items)
Index(['0', '2', '4'], dtype='object')
>>> even_cols = df[even_col_names]
>>> even_cols
0 2 4
0 a d g
1 b e h
2 c f i
Then, we select every odd column and group them together:
>>> odd_col_names = df.columns[1::2] # start with the 1st item, select every 2 items
>>> odd_col_names
Index(['1', '3', '5'], dtype='object')
>>> odd_cols = df[odd_col_names]
>>> odd_cols
1 3 5
0 1 4 7
1 2 5 8
2 3 6 9
Then, we concatenate the even columns into a single column:
>>> even_cols_list = [df[c] for c in even_col_names]
>>> even_cols_list
[0 a
1 b
2 c
Name: 0, dtype: object,
0 d
1 e
2 f
Name: 2, dtype: object,
0 g
1 h
2 i
Name: 4, dtype: object]
>>> even_col = pd.concat(even_cols_list).reset_index(drop=True)
>>> even_col
0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
dtype: object
Then we concatenate the odd columns into a single column:
>>> odd_cols_list = [df[c] for c in odd_col_names]
>>> odd_cols_list
[0 1
1 2
2 3
Name: 1, dtype: int64,
0 4
1 5
2 6
Name: 3, dtype: int64,
0 7
1 8
2 9
Name: 5, dtype: int64]
>>> odd_col = pd.concat(odd_cols_list).reset_index(drop=True)
>>> odd_col
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
Finally, we create a new dataframe with these two columns:
>>> df = pd.DataFrame({0: even_col, 1: odd_col})
>>> df
0 1
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
CodePudding user response:
Convert data to numpy, reshape within numpy (two columns), and create a new pandas dataframe (convert relevant column to integer):
df = df.to_numpy()
df = np.reshape(df, (-1, 2)) # have a look at the docs for np.reshape
df = pd.DataFrame(df).transform(pd.to_numeric, errors='ignore')
df.sort_values(1, ignore_index = True)
0 1
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
Another option would be to individually stack the numbers and strings, before recombining into a single dataframe:
numbers = df.select_dtypes('number').stack().array
strings = df.select_dtypes('object').stack().array
out = pd.concat([pd.Series(strings), pd.Series(numbers)], axis = 1)
out.sort_values(1, ignore_index = True)
0 1
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
One more option, which takes advantage of patterns here is pivot_longer
from pyjanitor
:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index=None,
names_to=['0','1'],
names_pattern= ['0|2|4', '1|3|5'])
0 1
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9