Converting dataframe of size (20, 500000) to another dataframe of different size

Time:02-03

I have a dataframe with 20 rows and 500000 columns. Each row is a unique model consisting of 500000 numbers (columns), so there are 20 unique models. I want to convert this dataframe into one with a single column named "values" and 20 * 500000 rows, stacked so that the first 500000 rows hold the 500000 numbers of the first model, the next 500000 rows hold those of the second model, and so on. I tried pd.melt(), but it is not what I am looking for, because it does not keep the numbers in model order.

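import pandas as pd
import numpy as np

# 20 models (rows), each made of 500000 numbers (columns)
my_df = pd.DataFrame(np.random.randint(0,100,size=(20, 500000)))
# pd.melt() goes column by column, so the models end up interleaved
#reshaped_my_df = pd.melt(my_df)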
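To see why melt() mixes the models, here is a small sketch on a made-up frame (the names small, melted and desired are hypothetical) with 2 models of 3 numbers each. melt() reads the values column by column, while the order asked for here reads them row by row:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real data: 2 models (rows) x 3 numbers (columns).
small = pd.DataFrame([[1, 2, 3],
                      [10, 20, 30]])

# melt() walks the frame column by column, so the two models are interleaved.
melted = pd.melt(small)
print(melted["value"].tolist())  # [1, 10, 2, 20, 3, 30]

# The desired order keeps each model's numbers together, row after row.
desired = small.to_numpy().ravel().tolist()
print(desired)  # [1, 2, 3, 10, 20, 30]
```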

Update: I used df.stack() and it worked:

df_stacked = my_df.stack().reset_index()
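stack() does keep model order, but reset_index() as written leaves three columns (level_0, level_1 and an unnamed 0 column holding the numbers). If only the single "values" column is wanted, a minimal sketch (scaled down to 5 numbers per model so it runs quickly) drops the index instead and names the column:

```python
import numpy as np
import pandas as pd

# Same layout as the question, scaled down: 20 models (rows) x 5 numbers (columns).
my_df = pd.DataFrame(np.random.randint(0, 100, size=(20, 5)))

# stack() moves the columns into an inner index level, row by row, so each
# model's numbers stay together. reset_index(drop=True) discards the
# (model, column) MultiIndex and to_frame() names the single column.
df_stacked = my_df.stack().reset_index(drop=True).to_frame("values")

print(df_stacked.shape)  # (100, 1)
```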

Answer:

All you need to do is reshape the underlying numpy array and recreate the dataframe:

import pandas as pd
import numpy as np
my_df = pd.DataFrame(np.random.randint(0,100,size=(20, 500000)))

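reshaped_my_df = pd.DataFrame(my_df.values.reshape((-1, 1)), columns=['values'])

The -1 in the reshape arguments means "make this axis whatever size is needed for the reshape to work", which here is 20 * 500000 = 10000000 rows. NumPy's reshape reads the array in row-major (C) order, so all 500000 numbers of the first row (model) come out before those of the second row, which is the order you asked for. If your models were stored as columns instead, you would transpose first with .T. Since your data is too large to inspect by eye, here is a smaller, more readable example of that column-wise case, with one model per column. It should clear up any confusion over what reshape is doing: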

>>> df = pd.DataFrame({'A':['a']*5, 'B':['b']*5,'C':['c']*5,})
>>> df.shape
(5, 3)
>>> df
   A  B  C
0  a  b  c
1  a  b  c
2  a  b  c
3  a  b  c
4  a  b  c

>>> reshaped = pd.DataFrame(df.values.T.reshape((-1, 1)), columns=['values'])
>>> reshaped.shape
(15, 1)
>>> reshaped
   values
0       a
1       a
2       a
3       a
4       a
5       b
6       b
7       b
8       b
9       b
10      c
11      c
12      c
13      c
14      c
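The example above has one model per column. For the question's own layout, where each row is a model, here is a small sketch comparing reshape with and without .T (the frame models and its m1/m2 labels are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical small frame in the question's layout: each ROW is a model.
models = pd.DataFrame([["m1", "m1", "m1"],
                       ["m2", "m2", "m2"]])

# No transpose: reshape reads row-major (C order), so model 1 comes out first.
by_model = pd.DataFrame(models.to_numpy().reshape((-1, 1)), columns=["values"])

# With .T the original columns are read one after another, interleaving the models.
interleaved = pd.DataFrame(models.to_numpy().T.reshape((-1, 1)), columns=["values"])

print(by_model["values"].tolist())     # ['m1', 'm1', 'm1', 'm2', 'm2', 'm2']
print(interleaved["values"].tolist())  # ['m1', 'm2', 'm1', 'm2', 'm1', 'm2']
```

So with rows as models, dropping .T gives the model-by-model order, while keeping it reproduces the interleaved order that pd.melt() produced.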