I'm trying to complete this function. Any help would be appreciated. This is my work so far:
Should take a 2-d numpy array as input:
array([[ 1.961e+03, 2.263e-02],
       [ 1.962e+03, 1.420e-02],
       [ 1.963e+03, 8.360e-03],
       [ 1.964e+03, 5.940e-03],
       [ 1.965e+03, 5.750e-03],
       [ 1.966e+03, 6.190e-03],
       [ 1.967e+03, 5.890e-03],
       [ 1.968e+03, 5.700e-03],
       [ 1.969e+03, 5.820e-03],
       [ 1.970e+03, 5.740e-03]...
Should return two tuples of the form (X_train, y_train), (X_test, y_test). (X_train, y_train) should consist of data from even years and (X_test, y_test) should consist of data from odd years.
My code:
def feat_resp_split(arr):
    X_odd, y_odd = arr[1::2,::].T
    X_even, y_even = arr[::2,::].T
    X_even_train, y_even_train, X_even_test, y_even_test = train_test_split(X_even, y_even, test_size = 0.2, random_state = 42)
    X_odd_train, y_odd_train, X_odd_test, y_odd_test = train_test_split(X_odd, y_odd, test_size = 0.2, random_state = 42)
    return (X_even_train, y_even_train), (X_odd_test, y_odd_test)
Input code:
feat_resp_split(data)
My output:
((array([2003., 1961., 2013., 1987., 1991., 1983., 1995., 1963., 1969.,
1971., 1965., 2009., 1967., 2007., 2011., 1997., 2017., 2001.,
1975., 1981., 1989., 1999., 1973.]),
array([2015., 1993., 1985., 2005., 1977., 1979.])),
(array([ 0.0358 , 0.00801, 0.01021, -0.01219, 0.05591, 0.00594,
0.00574, 0.00673, 0.00619, 0.05787, 0.00131, 0.0057 ,
0.00589, 0.00213, 0.02137, 0.00461, 0.02254, -0.00117,
0.01285, 0.0183 , 0.02076, 0.00473]),
array([ 0.00193, 0.00513, -0.00436, 0.01773, 0.0142 , -0.00606])))
Expected output:
X_train == array([1962., 1964., 1966., 1968., 1970., 1972., 1974., 1976., 1978.,
1980., 1982., 1984., 1986., 1988., 1990., 1992., 1994., 1996.,
1998., 2000., 2002., 2004., 2006., 2008., 2010., 2012., 2014.,
2016.])
y_train == array([ 0.01419604, 0.00594409, 0.00618898, 0.00570149, 0.00573851,
0.00672948, 0.00473084, -0.00117052, -0.00435676, 0.00193398,
0.01284528, 0.01020884, -0.00606099, -0.01219414, 0.01830187,
0.05590975, 0.05787267, 0.03580499, 0.02136897, 0.02076288,
0.02254085, 0.01772885, 0.00800752, 0.00131397, 0.00212906,
0.00513459, 0.00589222, 0.00460988])
X_test == array([1961., 1963., 1965., 1967., 1969., 1971., 1973., 1975., 1977.,
1979., 1981., 1983., 1985., 1987., 1989., 1991., 1993., 1995.,
1997., 1999., 2001., 2003., 2005., 2007., 2009., 2011., 2013.,
2015., 2017.])
y_test == array([ 0.02263378, 0.00835927, 0.00575116, 0.00589102, 0.00582331,
0.00638301, 0.00673463, 0.00213125, -0.0036312 , -0.00204649,
0.00783746, 0.01395387, 0.00302374, -0.01294617, -0.0007695 ,
0.03979147, 0.0625632 , 0.04724902, 0.02705529, 0.01979903,
0.02250889, 0.02131758, 0.01310552, 0.00384798, 0.00098665,
0.00377696, 0.00594675, 0.00526037, 0.00421667])
CodePudding user response:
Given your example data
data = np.array([[ 1.961e+03, 2.263e-02],
                 [ 1.962e+03, 1.420e-02],
                 [ 1.963e+03, 8.360e-03],
                 [ 1.964e+03, 5.940e-03],
                 [ 1.965e+03, 5.750e-03],
                 [ 1.966e+03, 6.190e-03],
                 [ 1.967e+03, 5.890e-03],
                 [ 1.968e+03, 5.700e-03],
                 [ 1.969e+03, 5.820e-03],
                 [ 1.970e+03, 5.740e-03]])
For your desired output, you only need
X_train = data[1::2,0]
y_train = data[1::2,1]
X_test = data[::2,0]
y_test = data[::2,1]
I do not understand why you want to use sklearn's train_test_split. Do you want to further split the odd and even subsets into 80/20 samples?
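Wrapped in the function signature from the question, the slicing above might look like the sketch below. It assumes the rows are in consecutive-year order and that the first row is an odd year (1961), as in the sample; only the first six sample rows are used here.

```python
import numpy as np

def feat_resp_split(arr):
    # Assumption: rows are sorted by year and start at an odd year,
    # so odd positions (1, 3, 5, ...) hold the even years.
    X_train, y_train = arr[1::2, 0], arr[1::2, 1]
    X_test, y_test = arr[::2, 0], arr[::2, 1]
    return (X_train, y_train), (X_test, y_test)

# First six rows of the sample data from the question
data = np.array([[1.961e+03, 2.263e-02],
                 [1.962e+03, 1.420e-02],
                 [1.963e+03, 8.360e-03],
                 [1.964e+03, 5.940e-03],
                 [1.965e+03, 5.750e-03],
                 [1.966e+03, 6.190e-03]])

(X_train, y_train), (X_test, y_test) = feat_resp_split(data)
print(X_train)  # years used for training (even)
print(X_test)   # years used for testing (odd)
```

If the data could start at an even year or have gaps, this positional version would pick the wrong rows.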
CodePudding user response:
Since you are using your own criteria to split the dataset into train and test, you no longer need the train_test_split function:
def feat_resp_split(arr):
    X_odd, y_odd = arr[1::2].T
    X_even, y_even = arr[::2].T
    return (X_even, y_even), (X_odd, y_odd)
You can call the function in this way:
(X_train, y_train), (X_test, y_test) = feat_resp_split(arr)
Note that this selects the even positions, not the even years; with data starting at 1961, the even positions hold the odd years. To split on the year value itself, you can use a boolean mask:
def feat_resp_split(arr):
    even_mask = (arr[:, 0] % 2 == 0)  # True if the year is even, False if odd
    X_odd, y_odd = arr[~even_mask].T
    X_even, y_even = arr[even_mask].T
    return (X_even, y_even), (X_odd, y_odd)
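As a quick check, here is the mask version run on a small array that starts at an even year, the case where positional slicing and year parity disagree. The `demo` values are made up for illustration and are not from the question's data.

```python
import numpy as np

def feat_resp_split(arr):
    # Split on the parity of the year column, not on row position
    even_mask = arr[:, 0] % 2 == 0
    X_even, y_even = arr[even_mask].T
    X_odd, y_odd = arr[~even_mask].T
    return (X_even, y_even), (X_odd, y_odd)

# Hypothetical data: first row is an even year
demo = np.array([[1962.0, 0.2],
                 [1963.0, 0.3],
                 [1964.0, 0.4],
                 [1965.0, 0.5]])

(X_train, y_train), (X_test, y_test) = feat_resp_split(demo)
print(X_train, y_train)  # rows for even years
print(X_test, y_test)    # rows for odd years

# For comparison: arr[1::2] would now return the odd years instead
positional = demo[1::2, 0]
print(positional)
```

Because the mask looks at the year itself, the result does not depend on which year the array starts with or on the row order.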