Home > Software design >  Convert x same size 2D numpy arrays to a 2 x column data frame
Convert x same size 2D numpy arrays to a 2 x column data frame

Time:04-30

I have two ndarrays of size (m x n), and two lists of length m and n respectively. I want to convert the two matrices to a dataframe with four columns. The first two columns correspond to the m and n dimensions, and contain the values from the lists. The next two columns should contain the values from the two matrices. In total, the resulting dataframe should have m times n rows.

Example: If these are the two matrices and two lists,

a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]

then the resulting dataframe should look like this:

"l1" "l2" "a1" "a2"
5    2    1    10
7    2    3    30
99   2    5    50
5    3    2    20
7    3    4    40
99   3    6    60

The order of the rows does not matter.

Although I only have two matrices in this specific case, I am curious about a solution which is easily applicable to any number of same size matrices.

CodePudding user response:

Use np.vstack for join arrays created by numpy.tile, numpy.repeat and numpy.ravel and pass to DataFrame cosntructor:

a = np.vstack((np.tile(l1, len(l2)),
               np.repeat(l2, len(l1)),
               np.ravel(a1, 'F'), 
               np.ravel(a2, 'F'))).T
print (a)
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]


df = pd.DataFrame(a, columns=['l1','l2','a1','a2'])
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

For multiple arrays:

arrays =  [a1, a2]

arr = [np.ravel(a, 'F') for a in arrays]
a = np.vstack((np.tile(l1, len(l2)), 
               np.repeat(l2, len(l1)),
               arr)).T
print (a)
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]


df = pd.DataFrame(a, columns=['l1','l2']   [f'a{x 1}' for x in range(len(arrays))])
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

Pandas only solution with concat and DataFrame.unstack:

df = (pd.concat([pd.DataFrame(a1, columns=l2, index=l1).unstack(),
                pd.DataFrame(a2, columns=l2, index=l1).unstack()],
               axis=1, keys=['a1','a2'])
        .rename_axis(['l2','l1']).swaplevel(1,0).reset_index())
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

For multiple arrays:

arrays =  [a1, a2]
df = (pd.concat([pd.DataFrame(a, columns=l2, index=l1).unstack() for a in arrays],
               axis=1)
        .rename_axis(['l2','l1'])
        .swaplevel(1,0)
        .rename(columns=lambda x: f'a{x 1}')
        .reset_index())
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

CodePudding user response:

Expanding the excellent jezrael's answer for multiple arrays:

arrays = ... # e.g [a1, a2]
a = np.vstack([np.tile(l1, len(l2), 
               np.repeat(l2, len(l1))]   
              [np.ravel(a, 'F') for a in arrays]).T

CodePudding user response:

This should solve your problem:

import pandas as pd
import numpy as np
df=pd.DataFrame()
a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]
m,n=len(l1),len(l2)
temp_l1=[]
for i in l1:
    temp_l1.extend([i]*n)
df["l1"]=temp_l1
df["l2"]=l2*m
df["a1"]=a1.flatten()
df["a2"]=a2.flatten()
print(df)

The output:

   l1  l2  a1  a2
0   5   2   1  10
1   5   3   2  20
2   7   2   3  30
3   7   3   4  40
4  99   2   5  50
5  99   3   6  60
  • Related