Choose Random Validation Data Set
I have a NumPy array of data generated over time by a simulation. I'm using TensorFlow and Keras to train a neural network on it, and my qu
Solution 1:
As you mentioned, Keras simply takes the last x samples of the dataset as validation data, without shuffling them first. If you want to keep using that, you need to shuffle your dataset yourself in advance.
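A minimal sketch of shuffling in advance, assuming x and y are NumPy arrays with the samples along the first axis (the array contents and the seed here are made up for illustration). A single permutation indexes both arrays, so every sample stays paired with its target:

```python
import numpy as np

# Stand-in data: 10 time-ordered samples with 3 features each (illustrative only).
x = np.arange(30).reshape(10, 3)
y = np.arange(10)

rng = np.random.default_rng(seed=0)   # fixed seed so the shuffle is reproducible
perm = rng.permutation(len(x))        # one random ordering of the sample indices

# Apply the same ordering to inputs and targets so the pairs stay aligned.
x_shuffled = x[perm]
y_shuffled = y[perm]
```

The shuffled arrays can then be passed to model.fit(x_shuffled, y_shuffled, validation_split=0.2), so the samples Keras holds out from the end are a random subset rather than the final stretch of the simulation.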
Alternatively, you can simply use scikit-learn's train_test_split() function:
from sklearn.model_selection import train_test_split
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2)
This function has an argument named "shuffle" which determines whether to shuffle the data before the split (it is set to True by default).
However, if y holds class labels, a better split uses the "stratify" argument, which gives the training and validation sets a similar distribution of labels (it is not meant for continuous targets):
x_train, x_valid, y_train, y_valid = train_test_split(x, y,
                                                      test_size=0.2,
                                                      random_state=0,
                                                      stratify=y)
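To see what the stratified split does, here is a small check on made-up, imbalanced labels (the data is an assumption for illustration, not taken from the question). It compares the share of class 1 in each subset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 samples of class 0 and 20 of class 1.
x = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

x_train, x_valid, y_train, y_valid = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# Labels are 0/1, so the mean is the fraction of class 1 in each subset.
train_frac = y_train.mean()
valid_frac = y_valid.mean()
print(train_frac, valid_frac)
```

Both fractions should match the share of class 1 in the full dataset. Without stratify=y they can drift apart, especially for small or heavily imbalanced datasets.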