How to save multiple multidimensional arrays into one CSV file?

Time:10-20

I can save multiple one-dimensional arrays to a CSV file using the code below:

my_df = pd.DataFrame({"name1" : X, "name2" : y})

Also, I can save one multidimensional array to a CSV file.

X, y = make_regression(n_samples=10, n_features=2, n_informative=2, n_targets=1, random_state=1, noise=0.5)
my_df = pd.DataFrame(X)
my_df.to_csv('test_data.csv', index=False, header=True)

Here, X is a multidimensional array and I am getting a CSV file that contains the value of X in 2 separate columns (as expected).

Now, if I want to save both X and y in separate columns of the same CSV file, with the column names X1, X2, y1, and y2, what do I need to change in the code?

The expected CSV holds the values of X and y generated by the make_regression function. With n_targets=2, the function returns a two-dimensional X and a two-dimensional y, so the CSV should contain 4 columns (say, X1, X2, y1, y2).

The value of X (shape: (10, 2)) that I am getting from the make_regression function

[[ 1.62434536 -0.61175641]
 [ 0.04221375  0.58281521]
 [-0.52817175 -1.07296862]
 [ 1.74481176 -0.7612069 ]
 [ 1.13376944 -1.09989127]
 [ 0.86540763 -2.3015387 ]
 [ 1.46210794 -2.06014071]
 [ 0.3190391  -0.24937038]
 [-0.3224172  -0.38405435]
 [-0.17242821 -0.87785842]]

The value of y (shape: (10,2))

[[ 7.08380317e+01 -1.49989469e-01]
 [ 4.25574119e+01  5.08213909e+01]
 [-1.10263835e+02 -1.06685245e+02]
 [ 6.81167780e+01 -8.67912040e+00]
 [ 3.76517652e+00 -5.56565286e+01]
 [-9.82592158e+01 -1.64522187e+02]
 [-4.06045719e+01 -1.25819174e+02]
 [ 4.61069914e+00 -1.11695124e+01]
 [-4.92313307e+01 -4.21097213e+01]
 [-7.22908927e+01 -7.91525111e+01]]

The expected output


X1             X2                    y1     y2
 1.62434536 second column of X
 0.04221375  
-0.52817175 
 1.74481176 
 1.13376944
 0.86540763 
 1.46210794 
 0.3190391  
-0.3224172 
-0.17242821

CodePudding user response:

You can use pd.concat. I am not sure about the column names you wanted (X1, X2, and so on).

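    import pandas as pd
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=10, n_features=2, n_informative=2, n_targets=2, random_state=1, noise=0.5)
    print(X, y)
    # Join the two frames side by side; with axis=1, ignore_index=True relabels the columns 0..3
    full_df = pd.concat([pd.DataFrame(X), pd.DataFrame(y)], axis=1, ignore_index=True)
    full_df.to_csv('test_data.csv', index=False, header=True)
    print(full_df)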

X and y from the function

[[ 1.62434536 -0.61175641]
 [ 0.04221375  0.58281521]
 [-0.52817175 -1.07296862]
 [ 1.74481176 -0.7612069 ]
 [ 1.13376944 -1.09989127]
 [ 0.86540763 -2.3015387 ]
 [ 1.46210794 -2.06014071]
 [ 0.3190391  -0.24937038]
 [-0.3224172  -0.38405435]
 [-0.17242821 -0.87785842]] [[ 7.08380317e+01 -1.49989469e-01]
 [ 4.25574119e+01  5.08213909e+01]
 [-1.10263835e+02 -1.06685245e+02]
 [ 6.81167780e+01 -8.67912040e+00]
 [ 3.76517652e+00 -5.56565286e+01]
 [-9.82592158e+01 -1.64522187e+02]
 [-4.06045719e+01 -1.25819174e+02]
 [ 4.61069914e+00 -1.11695124e+01]
 [-4.92313307e+01 -4.21097213e+01]
 [-7.22908927e+01 -7.91525111e+01]]

Output after concat

          0         1           2           3
0  1.624345 -0.611756   70.838032   -0.149989
1  0.042214  0.582815   42.557412   50.821391
2 -0.528172 -1.072969 -110.263835 -106.685245
3  1.744812 -0.761207   68.116778   -8.679120
4  1.133769 -1.099891    3.765177  -55.656529
5  0.865408 -2.301539  -98.259216 -164.522187
6  1.462108 -2.060141  -40.604572 -125.819174
7  0.319039 -0.249370    4.610699  -11.169512
8 -0.322417 -0.384054  -49.231331  -42.109721
9 -0.172428 -0.877858  -72.290893  -79.152511
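
To get the column names from the question, one option is to name each frame's columns before concatenating and to leave out ignore_index=True, since with axis=1 that argument relabels the columns 0 to 3. The following is a minimal sketch, not a definitive solution: the arrays are random stand-ins with the same shapes as X and y (an assumption, so it runs without scikit-learn), and in practice they would come from make_regression with n_targets=2.

```python
import numpy as np
import pandas as pd

# Stand-in arrays with the same shapes as X and y in the question
# (assumption: the real ones come from make_regression with n_targets=2).
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
y = rng.normal(size=(10, 2))

# Name the columns of each frame, then join the frames side by side.
# ignore_index is left at its default so the names survive the concat.
full_df = pd.concat(
    [pd.DataFrame(X, columns=["X1", "X2"]),
     pd.DataFrame(y, columns=["y1", "y2"])],
    axis=1,
)
full_df.to_csv("test_data.csv", index=False)
```

The resulting file has a header row with X1, X2, y1, y2 followed by ten data rows.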

CodePudding user response:

Based on your update:

You can slice a NumPy array, just like a pandas object.

import numpy as np
import pandas as pd

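# x has shape (2, 10), so each row becomes one column below.
# (x.reshape returns a new array and leaves x unchanged, so it is not needed here.)
x = np.array([np.arange(0, 10), np.arange(10, 20)])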

pd.DataFrame({
    'x0': x[0, :],
    'x1': x[1, :]
})
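
The same slicing idea applies to arrays shaped like the ones in the question, where the columns sit along the second axis. This is a rough sketch under stated assumptions: X and y here are made-up (10, 2) arrays built with np.arange rather than the output of make_regression, and the column names are the ones requested in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical (10, 2) arrays standing in for X and y from the question.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(20, 40, dtype=float).reshape(10, 2)

# X[:, 0] selects the first column of X, X[:, 1] the second, and so on.
my_df = pd.DataFrame({
    "X1": X[:, 0],
    "X2": X[:, 1],
    "y1": y[:, 0],
    "y2": y[:, 1],
})
my_df.to_csv("test_data.csv", index=False)
```

Each dictionary key becomes a column header in the CSV, which mirrors the one-dimensional case from the start of the question.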