I have two ndarrays of size (m x n), and two lists of length m and n respectively. I want to convert the two matrices to a dataframe with four columns. The first two columns correspond to the m and n dimensions, and contain the values from the lists. The next two columns should contain the values from the two matrices. In total, the resulting dataframe should have m times n rows.
Example: If these are the two matrices and two lists,
a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]
then the resulting dataframe should look like this:
"l1" "l2" "a1" "a2"
5 2 1 10
7 2 3 30
99 2 5 50
5 3 2 20
7 3 4 40
99 3 6 60
The order of the rows does not matter.
Although I only have two matrices in this specific case, I am curious about a solution which is easily applicable to any number of same size matrices.
CodePudding user response:
Use np.vstack
for join arrays created by numpy.tile
, numpy.repeat
and numpy.ravel
and pass to DataFrame
cosntructor:
a = np.vstack((np.tile(l1, len(l2)),
np.repeat(l2, len(l1)),
np.ravel(a1, 'F'),
np.ravel(a2, 'F'))).T
print (a)
[[ 5 2 1 10]
[ 7 2 3 30]
[99 2 5 50]
[ 5 3 2 20]
[ 7 3 4 40]
[99 3 6 60]]
df = pd.DataFrame(a, columns=['l1','l2','a1','a2'])
print (df)
l1 l2 a1 a2
0 5 2 1 10
1 7 2 3 30
2 99 2 5 50
3 5 3 2 20
4 7 3 4 40
5 99 3 6 60
For multiple arrays:
arrays = [a1, a2]
arr = [np.ravel(a, 'F') for a in arrays]
a = np.vstack((np.tile(l1, len(l2)),
np.repeat(l2, len(l1)),
arr)).T
print (a)
[[ 5 2 1 10]
[ 7 2 3 30]
[99 2 5 50]
[ 5 3 2 20]
[ 7 3 4 40]
[99 3 6 60]]
df = pd.DataFrame(a, columns=['l1','l2'] [f'a{x 1}' for x in range(len(arrays))])
print (df)
l1 l2 a1 a2
0 5 2 1 10
1 7 2 3 30
2 99 2 5 50
3 5 3 2 20
4 7 3 4 40
5 99 3 6 60
Pandas only solution with concat
and DataFrame.unstack
:
df = (pd.concat([pd.DataFrame(a1, columns=l2, index=l1).unstack(),
pd.DataFrame(a2, columns=l2, index=l1).unstack()],
axis=1, keys=['a1','a2'])
.rename_axis(['l2','l1']).swaplevel(1,0).reset_index())
print (df)
l1 l2 a1 a2
0 5 2 1 10
1 7 2 3 30
2 99 2 5 50
3 5 3 2 20
4 7 3 4 40
5 99 3 6 60
For multiple arrays:
arrays = [a1, a2]
df = (pd.concat([pd.DataFrame(a, columns=l2, index=l1).unstack() for a in arrays],
axis=1)
.rename_axis(['l2','l1'])
.swaplevel(1,0)
.rename(columns=lambda x: f'a{x 1}')
.reset_index())
print (df)
l1 l2 a1 a2
0 5 2 1 10
1 7 2 3 30
2 99 2 5 50
3 5 3 2 20
4 7 3 4 40
5 99 3 6 60
CodePudding user response:
Expanding the excellent jezrael's answer for multiple arrays:
arrays = ... # e.g [a1, a2]
a = np.vstack([np.tile(l1, len(l2),
np.repeat(l2, len(l1))]
[np.ravel(a, 'F') for a in arrays]).T
CodePudding user response:
This should solve your problem:
import pandas as pd
import numpy as np
df=pd.DataFrame()
a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]
m,n=len(l1),len(l2)
temp_l1=[]
for i in l1:
temp_l1.extend([i]*n)
df["l1"]=temp_l1
df["l2"]=l2*m
df["a1"]=a1.flatten()
df["a2"]=a2.flatten()
print(df)
The output:
l1 l2 a1 a2
0 5 2 1 10
1 5 3 2 20
2 7 2 3 30
3 7 3 4 40
4 99 2 5 50
5 99 3 6 60