Choose Random Validation Data Set

June 11, 2024

I have a NumPy array of data generated over time by a simulation. Based on this, I'm using TensorFlow and Keras to train a neural network, and my qu

Solution 1:

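As you mentioned, Keras simply takes the last x samples of the dataset as the validation set (with validation_split, the fraction is cut from the end of the arrays before any shuffling), so if you want to keep using it, you need to shuffle your dataset in advance.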
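A minimal sketch of shuffling in advance, assuming x and y are NumPy arrays with one sample per row. The toy arrays, the seed, and the commented-out model.fit call are illustrative assumptions, not part of the original question:

```python
import numpy as np

# Toy stand-ins for the simulation data (hypothetical shapes and values).
x = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10)

# One shared permutation keeps every row of x paired with its y value.
rng = np.random.default_rng(seed=0)
perm = rng.permutation(len(x))
x_shuffled, y_shuffled = x[perm], y[perm]

# Keras would now take the validation samples from the tail of the
# already shuffled arrays, for example:
# model.fit(x_shuffled, y_shuffled, validation_split=0.2)
```

Indexing both arrays with the same permutation is what keeps inputs and targets aligned; shuffling them separately would break the pairing.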

Or you can simply use scikit-learn's train_test_split() function:

from sklearn.model_selection import train_test_split
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2)

This function has an argument named "shuffle" that determines whether the data is shuffled before the split (it is set to True by default).
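To make the effect of "shuffle" concrete, here is a small sketch on toy data (the arrays and the random_state value are assumptions chosen for illustration). With shuffle=False the validation set is just the tail of the arrays, which mirrors the Keras behaviour described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data in "time order": sample i has feature i and target i.
x = np.arange(10).reshape(10, 1)
y = np.arange(10)

# shuffle=False: the validation set is the last 20% of the samples, in order.
_, x_valid_ordered, _, y_valid_ordered = train_test_split(
    x, y, test_size=0.2, shuffle=False)

# Default shuffle=True: the validation samples are drawn at random.
# random_state only makes the draw reproducible.
_, x_valid_random, _, y_valid_random = train_test_split(
    x, y, test_size=0.2, random_state=0)

print(y_valid_ordered)  # the two final samples
print(y_valid_random)   # two randomly chosen samples
```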

However, for classification problems a better split can often be obtained with the "stratify" argument, which gives the training and validation sets a similar distribution of labels:

x_train, x_valid, y_train, y_valid = train_test_split(x, y,
                                                      test_size=0.2,
                                                      random_state=0,
                                                      stratify=y)
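As a rough check of what stratification does, the sketch below uses made-up, imbalanced binary labels (80 samples of class 0 and 20 of class 1; these values are assumptions for illustration) and compares the share of class 1 in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
x = np.arange(100).reshape(100, 1)

x_train, x_valid, y_train, y_valid = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y)

# With stratify=y, both sets keep the 20% share of class 1.
train_share = y_train.mean()
valid_share = y_valid.mean()
print(train_share, valid_share)
```

Note that stratify expects discrete class labels. For a continuous regression target, as simulation output may well be, one option is to stratify on binned target values instead.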
