DataFrame order shuffling itself

Time:02-25

I'm trying to work out why my dataframe changes its order once it's converted into an array. Below is my code:

import pandas as pd

header_list = ["output", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
               "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"]
df = pd.read_csv('data.csv', names=header_list)

#Splitting data 70/30 for training and testing sets
trainingdata = df.sample(frac=0.7)

#assigning Y to be the first column, and X as the rest
X = trainingdata.iloc[:,1:].to_numpy()
Y = trainingdata.iloc[:,0].to_numpy().reshape(-1, 1)
print(trainingdata)

output:

   output         1         2  ...        28        29        30
12        0  0.267358  0.373690  ...  0.379725  0.130298  0.195592
27        1  0.313739  0.506595  ...  0.456701  0.375517  0.157156
450       0  0.181693  0.490362  ...  0.112165  0.294500  0.139184
440       0  0.033603  0.531958  ...  0.171821  0.241474  0.338187
54        0  0.197312  0.113967  ...  0.189210  0.255076  0.083169
..      ...       ...       ...  ...       ...       ...       ...
20        1  0.519144  0.348326  ...  0.407216  0.653854  0.039814
231       1  0.428274  0.196145  ...  0.680756  0.286615  0.237439
55        0  0.291968  0.190396  ...  0.334089  0.450227  0.205234
159       1  0.410762  0.456206  ...  0.846048  0.337473  0.307359
117       0  0.232335  0.292188  ...  0.391065  0.361128  0.187656

You can see my index column is in a random order, whereas my original dataframe is in numerical order. Have I made a mistake in my syntax that causes this?

CodePudding user response:

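This comes from the sample operation in pandas, not from the conversion to an array. By default, df.sample() selects rows at random and returns them in the order they were drawn, keeping their original index labels. So trainingdata is already shuffled before to_numpy() is called, and to_numpy() just preserves that order.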

See the pandas documentation for DataFrame.sample for the details.

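If you want the same selection on every execution of your code (reproducibility), you can pass the random_state option to sample. If you also want the sampled rows back in their original order, you can call sort_index() on the result.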
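
As a rough sketch of how that could look: the small random frame below is a made-up stand-in for data.csv, 42 is an arbitrary seed, and the names trainingdata_sorted and testingdata are only for illustration. Taking the test set as "the rows that were not sampled" is one possible way to get the other 30%, not something taken from the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 10 rows, an "output" column and two features.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "output": rng.integers(0, 2, size=10),
    "1": rng.random(10),
    "2": rng.random(10),
})

# random_state fixes the seed, so the same rows are drawn on every run.
trainingdata = df.sample(frac=0.7, random_state=42)

# sample() keeps the original index labels but returns rows in drawn order;
# sort_index() puts them back in the original row order if that matters.
trainingdata_sorted = trainingdata.sort_index()

# One way to get the remaining 30%: drop the rows that went into training.
testingdata = df.drop(trainingdata.index)

# Same split into features and target as in the question.
X = trainingdata_sorted.iloc[:, 1:].to_numpy()
Y = trainingdata_sorted.iloc[:, 0].to_numpy().reshape(-1, 1)

print(trainingdata_sorted.index.tolist())
print(X.shape, Y.shape)
```

Note that for training a model the shuffled order is usually harmless, so sort_index() is only needed if you want the rows to line up with the original file.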
