Convert x same size 2D numpy arrays to a 2 x column data frame-CodePudding

I have two ndarrays of size (m x n), and two lists of length m and n respectively. I want to convert the two matrices to a dataframe with four columns. The first two columns correspond to the m and n dimensions, and contain the values from the lists. The next two columns should contain the values from the two matrices. In total, the resulting dataframe should have m times n rows.

Example: If these are the two matrices and two lists,

a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]

then the resulting dataframe should look like this:

"l1" "l2" "a1" "a2"
5    2    1    10
7    2    3    30
99   2    5    50
5    3    2    20
7    3    4    40
99   3    6    60

The order of the rows does not matter.

Although I only have two matrices in this specific case, I am curious about a solution which is easily applicable to any number of same size matrices.

CodePudding user response：

Use np.vstack for join arrays created by numpy.tile, numpy.repeat and numpy.ravel and pass to DataFrame cosntructor:

a = np.vstack((np.tile(l1, len(l2)),
               np.repeat(l2, len(l1)),
               np.ravel(a1, 'F'), 
               np.ravel(a2, 'F'))).T
print (a)
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]


df = pd.DataFrame(a, columns=['l1','l2','a1','a2'])
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

For multiple arrays:

arrays =  [a1, a2]

arr = [np.ravel(a, 'F') for a in arrays]
a = np.vstack((np.tile(l1, len(l2)), 
               np.repeat(l2, len(l1)),
               arr)).T
print (a)
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]


df = pd.DataFrame(a, columns=['l1','l2']   [f'a{x 1}' for x in range(len(arrays))])
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

Pandas only solution with concat and DataFrame.unstack:

df = (pd.concat([pd.DataFrame(a1, columns=l2, index=l1).unstack(),
                pd.DataFrame(a2, columns=l2, index=l1).unstack()],
               axis=1, keys=['a1','a2'])
        .rename_axis(['l2','l1']).swaplevel(1,0).reset_index())
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

For multiple arrays:

arrays =  [a1, a2]
df = (pd.concat([pd.DataFrame(a, columns=l2, index=l1).unstack() for a in arrays],
               axis=1)
        .rename_axis(['l2','l1'])
        .swaplevel(1,0)
        .rename(columns=lambda x: f'a{x 1}')
        .reset_index())
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

CodePudding user response：

Expanding the excellent jezrael's answer for multiple arrays:

arrays = ... # e.g [a1, a2]
a = np.vstack([np.tile(l1, len(l2), 
               np.repeat(l2, len(l1))]   
              [np.ravel(a, 'F') for a in arrays]).T

CodePudding user response：

This should solve your problem:

import pandas as pd
import numpy as np
df=pd.DataFrame()
a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]
m,n=len(l1),len(l2)
temp_l1=[]
for i in l1:
    temp_l1.extend([i]*n)
df["l1"]=temp_l1
df["l2"]=l2*m
df["a1"]=a1.flatten()
df["a2"]=a2.flatten()
print(df)

The output:

   l1  l2  a1  a2
0   5   2   1  10
1   5   3   2  20
2   7   2   3  30
3   7   3   4  40
4  99   2   5  50
5  99   3   6  60