I can save multiple one-dimensional arrays to a CSV file using the code below:
my_df = pd.DataFrame({"name1" : X, "name2" : y})
Also, I can save one multidimensional array to a CSV file.
import pandas as pd
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=10, n_features=2, n_informative=2, n_targets=1, random_state=1, noise=0.5)
my_df = pd.DataFrame(X)
my_df.to_csv('test_data.csv', index=False, header=True)
Here, X is a multidimensional array, and I get a CSV file that contains the values of X in 2 separate columns (as expected).
Now, if I want to save both X and y in separate columns of the same CSV file, with the column names X1, X2, y1, and y2, what do I need to change in the code?
My expected CSV contains the values of X and y generated by the make_regression function. From the function (with n_targets=2), we get a two-dimensional X and a two-dimensional y. So, the CSV should contain 4 columns (say, X1, X2, y1, y2).
The value of X (shape: (10, 2)) that I am getting from the make_regression function:
[[ 1.62434536 -0.61175641]
[ 0.04221375 0.58281521]
[-0.52817175 -1.07296862]
[ 1.74481176 -0.7612069 ]
[ 1.13376944 -1.09989127]
[ 0.86540763 -2.3015387 ]
[ 1.46210794 -2.06014071]
[ 0.3190391 -0.24937038]
[-0.3224172 -0.38405435]
[-0.17242821 -0.87785842]]
The value of y (shape: (10, 2)):
[[ 7.08380317e+01 -1.49989469e-01]
[ 4.25574119e+01 5.08213909e+01]
[-1.10263835e+02 -1.06685245e+02]
[ 6.81167780e+01 -8.67912040e+00]
[ 3.76517652e+00 -5.56565286e+01]
[-9.82592158e+01 -1.64522187e+02]
[-4.06045719e+01 -1.25819174e+02]
[ 4.61069914e+00 -1.11695124e+01]
[-4.92313307e+01 -4.21097213e+01]
[-7.22908927e+01 -7.91525111e+01]]
The expected output:
X1 X2 y1 y2
1.62434536 second column of X
0.04221375
-0.52817175
1.74481176
1.13376944
0.86540763
1.46210794
0.3190391
-0.3224172
-0.17242821
CodePudding user response:
You can use pandas concat (pd.concat). I am not sure about the names you wanted, like X1, X2, and so on.
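import pandas as pd
from sklearn.datasets import make_regression
# n_targets=2 makes y two-dimensional, with shape (10, 2)
X, y = make_regression(n_samples=10, n_features=2, n_informative=2, n_targets=2, random_state=1, noise=0.5)
print(X, y)
# Place the columns of X and y side by side; ignore_index=True relabels them 0..3
full_df = pd.concat([pd.DataFrame(X), pd.DataFrame(y)], axis=1, ignore_index=True)
full_df.to_csv('test_data.csv', index=False, header=True)
print(full_df)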
X and y from the function
[[ 1.62434536 -0.61175641]
[ 0.04221375 0.58281521]
[-0.52817175 -1.07296862]
[ 1.74481176 -0.7612069 ]
[ 1.13376944 -1.09989127]
[ 0.86540763 -2.3015387 ]
[ 1.46210794 -2.06014071]
[ 0.3190391 -0.24937038]
[-0.3224172 -0.38405435]
[-0.17242821 -0.87785842]] [[ 7.08380317e+01 -1.49989469e-01]
[ 4.25574119e+01 5.08213909e+01]
[-1.10263835e+02 -1.06685245e+02]
[ 6.81167780e+01 -8.67912040e+00]
[ 3.76517652e+00 -5.56565286e+01]
[-9.82592158e+01 -1.64522187e+02]
[-4.06045719e+01 -1.25819174e+02]
[ 4.61069914e+00 -1.11695124e+01]
[-4.92313307e+01 -4.21097213e+01]
[-7.22908927e+01 -7.91525111e+01]]
Output after concat
0 1 2 3
0 1.624345 -0.611756 70.838032 -0.149989
1 0.042214 0.582815 42.557412 50.821391
2 -0.528172 -1.072969 -110.263835 -106.685245
3 1.744812 -0.761207 68.116778 -8.679120
4 1.133769 -1.099891 3.765177 -55.656529
5 0.865408 -2.301539 -98.259216 -164.522187
6 1.462108 -2.060141 -40.604572 -125.819174
7 0.319039 -0.249370 4.610699 -11.169512
8 -0.322417 -0.384054 -49.231331 -42.109721
9 -0.172428 -0.877858 -72.290893 -79.152511
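If the column names matter, one option is to pass columns= to each DataFrame before concatenating and to leave out ignore_index=True, since that flag relabels the columns as 0..3. A minimal sketch, assuming small stand-in arrays in place of the make_regression output and a hypothetical file name named_data.csv:

```python
import numpy as np
import pandas as pd

# Stand-in arrays (assumption): same shapes as X and y from make_regression
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(20, 40, dtype=float).reshape(10, 2)

# Name the columns on each frame; without ignore_index=True the names survive concat
named_df = pd.concat(
    [
        pd.DataFrame(X, columns=["X1", "X2"]),
        pd.DataFrame(y, columns=["y1", "y2"]),
    ],
    axis=1,
)

# "named_data.csv" is a hypothetical file name used only for this sketch
named_df.to_csv("named_data.csv", index=False, header=True)
```

Reading the file back with pd.read_csv("named_data.csv") should show the four named columns as the header row.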
CodePudding user response:
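Based on your update:
You can slice a NumPy array, just like a pandas DataFrame.
import numpy as np
import pandas as pd
# Build a (10, 2) array: column 0 holds 0..9, column 1 holds 10..19
x = np.array([np.arange(0, 10), np.arange(10, 20)]).T
# Slice out each column and give it a name
pd.DataFrame({
'x0': x[:, 0],
'x1': x[:, 1]
})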
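Applied to the arrays in the question, the same column slicing gives each column its requested name. The sketch below assumes stand-in arrays with the same (10, 2) shapes as X and y from make_regression; it also shows np.hstack as an alternative that I believe is equivalent here, since both arrays share the same number of rows.

```python
import numpy as np
import pandas as pd

# Stand-in arrays (assumption): same shapes as X and y from make_regression
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(20, 40, dtype=float).reshape(10, 2)

# Option 1: slice each column and name it explicitly
sliced_df = pd.DataFrame({
    "X1": X[:, 0],
    "X2": X[:, 1],
    "y1": y[:, 0],
    "y2": y[:, 1],
})

# Option 2: stack the arrays side by side, then name all four columns at once
stacked_df = pd.DataFrame(np.hstack([X, y]), columns=["X1", "X2", "y1", "y2"])
```

Either frame can then be written with to_csv(..., index=False) as in the earlier snippets.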