I have a big dataframe with 1 million rows of time series data. I want to slice it into smaller chunks of 1000 rows each, which would give me 1000 chunks, and I need every chunk to be copied into a column of a new dataframe.
CodePudding user response:
I am now doing this, which does the job but might be inefficient. I would still be happy if people could help:
import numpy as np
import pandas as pd

df_all = pd.DataFrame()
for i, df_split in enumerate(np.array_split(df, len(df) // chunk_size)):
    df_split = df_split.reset_index(drop=True)
    df_split = df_split.rename({'random_nos': 'String' + str(i)}, axis=1)
    df_all = pd.concat([df_all, df_split], axis=1)
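If every chunk has exactly chunk_size rows (as with 10**6 rows and chunks of 1000), a single numpy reshape should avoid the per-chunk concat entirely. A minimal sketch, assuming the column is named random_nos and len(df) is divisible by chunk_size:

import numpy as np
import pandas as pd

# Toy stand-ins for the real data (assumed column name and sizes)
chunk_size = 1000
df = pd.DataFrame({'random_nos': np.random.rand(1_000_000)})

n_chunks = len(df) // chunk_size
# Row i of the reshaped array is chunk i; transposing turns each chunk into a column
wide = pd.DataFrame(
    df['random_nos'].to_numpy().reshape(n_chunks, chunk_size).T,
    columns=[f'String{i}' for i in range(n_chunks)],
)

Because this builds the result in one reshape plus one DataFrame construction, it sidesteps the cost of repeatedly concatenating onto df_all inside the loop.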
CodePudding user response:
You could use numpy.array_split to achieve this:
import pandas as pd
import numpy as np
def slice_df_into_chunks(df_size, n_chunks):
    df = pd.DataFrame(np.random.rand(df_size), columns=['random_nos'])
    df_list = []
    # np.array_split(df, n) splits df into n roughly equal parts,
    # i.e. the second argument is the number of chunks, not the chunk size
    for i, df_split in enumerate(np.array_split(df, n_chunks)):
        df_split = df_split.rename(columns={'random_nos': f'String{i}'})
        df_split.reset_index(drop=True, inplace=True)
        df_list.append(df_split)
    return pd.concat(df_list, axis=1)
slice_df_into_chunks(10**6, 10**3) # Give whatever sizes you want
Note that if df_size is not exactly divisible by n_chunks (e.g. 10 and 3), the earliest chunks get one extra row each:
slice_df_into_chunks(10, 3)
    String0   String1   String2
0  0.955620  0.543234  0.509360
1  0.755157  0.174576  0.267600
2  0.816509  0.776549  0.455464
3  0.990282       NaN       NaN
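You can check how numpy.array_split distributes the remainder directly; a quick sketch:

import numpy as np

# 10 rows into 3 chunks: the single extra row goes to the first chunk
print([len(c) for c in np.array_split(np.arange(10), 3)])  # [4, 3, 3]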