Reshape and transform a dataframe/array from 32 * 32 columns by 16 rows to (32 * 16) by 32

I have grayscale image data for 16 32x32px images stored in a pandas dataframe. Each data row represents the serialised pixel data for one image, so the dataframe has 1024 columns.

I want to reshape the data not only to restore the original image size, but also to concatenate all reshaped images side by side (horizontally).

So the first row will look like this: first 32 columns: image 1 - 1st row of pixels, second 32 columns: image 2 - 1st row of pixels, ...

The second row will look like this: first 32 columns: image 1 - 2nd row of pixels, second 32 columns: image 2 - 2nd row of pixels, ...

So basically, I want to reshape my dataframe from (32 * 32) columns by 16 rows to (32 * 16) columns by 32 rows. I want to use this data to create an image with PIL afterwards.

Is there an elegant way to do this? I'm a little bit lost at the moment, as I'm still new to pandas and Python altogether. I do not expect a complete answer, but it would be nice if you could at least push me in the right direction.

CodePudding user response：

Here are three different functions. The first uses pandas methods (stacking), the second uses regular Python lists, building the result row by row, and the third uses NumPy reshaping.
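As a reference for what all three functions are meant to produce, the target layout can be written as a plain horizontal concatenation of the individual images. The sketch below is only an illustration: the random `image_df` and the helper name `expected_layout` are made up here, and it assumes the pixels of each image are stored row by row, as described in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 16 flattened 32x32 grayscale images
rng = np.random.default_rng(0)
image_df = pd.DataFrame(rng.integers(0, 256, size=(16, 1024), dtype=np.uint8))


def expected_layout(image_df):
    """Reference result: the 16 images side by side (32 rows x 512 columns)."""
    # restore each image to 32x32, assuming row-major pixel order
    images = [row.reshape(32, 32) for row in image_df.to_numpy()]
    # concatenate the images horizontally
    return pd.DataFrame(np.hstack(images))


expected = expected_layout(image_df)
print(expected.shape)  # (32, 512)
```

Comparing a function's output against this reference, for example with `np.array_equal(result.to_numpy(), expected.to_numpy())`, is a quick way to confirm the pixel order.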

The NumPy reshaping method is about twice as fast as the other two (roughly 9.6 ms versus 19 ms per loop), and almost all of its run time is spent converting the DataFrame to a NumPy array and back to pandas rather than on the reshape itself.
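A rough way to see where the time goes is to time the conversion steps and the array reshaping separately. This is a sketch with made-up random data and a hypothetical helper name (`side_by_side`), and the reshape shown is one possible NumPy formulation; absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 16 flattened 32x32 grayscale images
rng = np.random.default_rng(0)
image_df = pd.DataFrame(rng.integers(0, 256, size=(16, 1024), dtype=np.uint8))
pixels = image_df.to_numpy()


def side_by_side(pixels):
    """Reshape a (16, 1024) array into 32 rows x 512 columns."""
    # (image, row, col) -> (row, image, col) -> (row, image * 32 + col)
    return pixels.reshape(16, 32, 32).transpose(1, 0, 2).reshape(32, 512)


runs = 200
timings = {
    "DataFrame -> array": timeit.timeit(lambda: image_df.to_numpy(), number=runs),
    "array reshaping": timeit.timeit(lambda: side_by_side(pixels), number=runs),
    "reshaping + array -> DataFrame": timeit.timeit(
        lambda: pd.DataFrame(side_by_side(pixels)), number=runs
    ),
}
for step, seconds in timings.items():
    print(f"{step}: {seconds / runs * 1e3:.4f} ms per call")
```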

Here's a link to the notebook I used for this if you want to play around with the code.

import pandas as pd


def stack_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 19 ms per loop
    """
    # note: this relabels the columns and index of the DataFrame passed in
    # create a MultiIndex indicating Row and Column information for each image
    row_col_index = pd.MultiIndex.from_tuples(
        [(i // 32, i % 32) for i in range(0, 1024)], names=["row", "col"]
    )
    image_df.columns = row_col_index

    image_df.index = range(1, 17)
    image_df.index.name = "Image"

    # Use MultiIndex to reshape data: move the "col" level into the row index,
    # then transpose so rows are pixel rows and columns are (Image, col)
    return image_df.stack(level=1).T


def build_image_df(image_df):
    """
    Performance: 10 loops, best of 5: 19.2 ms per loop
    """
    image_data = image_df.values.tolist()
    reshaped = []
    for r_num in range(0, 32):
        row = []
        for image_num in range(0, 16):
            # for each image
            for c_num in range(0, 32):
                # get the corresponding index in the raw data
                # and add the pixel data to the row we're building
                raw_index = r_num * 32 + c_num
                pixel = image_data[image_num][raw_index]
                row.append(pixel)
        reshaped.append(row)
    reshaped_df = pd.DataFrame(reshaped)
    return reshaped_df


def reshape_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 9.56 ms per loop
    Note: numpy methods only account for 0.82 ms of this

    """
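    # split each row into a (32, 32) image, bring the pixel-row axis to the
    # front, then merge the (image, column) axes so images sit side by side
    # (reshape(512, 32).transpose() would transpose every individual image)
    images = image_df.to_numpy().reshape(16, 32, 32)
    return pd.DataFrame(images.transpose(1, 0, 2).reshape(32, 512))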
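Since the goal is a PIL image, the reshaped frame can be handed to Pillow once it is back in array form. A minimal sketch, assuming Pillow is installed and that pixel values are 0-255 grayscale intensities; the random `reshaped_df` stands in for the output of any of the functions above, and the file name is hypothetical.

```python
import numpy as np
import pandas as pd
from PIL import Image

# Hypothetical stand-in for a reshaped result: 32 rows x 512 columns of pixels
rng = np.random.default_rng(0)
reshaped_df = pd.DataFrame(rng.integers(0, 256, size=(32, 512)))

# Pillow expects 8-bit unsigned values for a grayscale ("L") image
pixels = reshaped_df.to_numpy().astype(np.uint8)
strip = Image.fromarray(pixels)

# Pillow reports size as (width, height)
print(strip.mode, strip.size)  # L (512, 32)
# strip.save("image_strip.png")  # hypothetical file name
```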