One of the most important steps in machine learning is model validation. It's essentially a test run to determine how well the model will perform in the real world. If the model performs poorly during this process, programmers use what they've learned from the test to tune the model's hyperparameters and adjust its training data.
Model validation occurs immediately after a machine learning model has been fully trained. It typically presents the machine learning model with data it has never directly encountered before. The core idea is that if the model is properly trained, it should be able to generalize to the new data.
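As a rough illustration of that idea, the sketch below holds out a portion of a dataset, trains on the rest, and then scores the model on the held-out rows. Everything in it is an assumption made for the example: the synthetic data, the 80/20 split, the simple least-squares line standing in for "the model", and the helper names `train_validation_split`, `fit_line`, and `mean_squared_error`.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction the model never trains on."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, rows):
    """Average squared gap between predictions and true targets."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)


# Hypothetical data: y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
data = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in [i / 10 for i in range(100)]]

train_rows, validation_rows = train_validation_split(data)
model = fit_line(train_rows)  # training only ever sees train_rows

train_mse = mean_squared_error(model, train_rows)
validation_mse = mean_squared_error(model, validation_rows)
print(f"train MSE: {train_mse:.3f}, validation MSE: {validation_mse:.3f}")
```

If the validation error is close to the training error, the model is probably generalizing; a validation error that is much higher is a common sign of overfitting.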
The model validation process falls into one of two categories based on the source of its test data:
Additionally, there are many different model validation techniques — each one applies the test data in a different way.
Model validation is important for the same reason as any performance or quality assurance test. It determines whether a machine learning model does what it's intended to do outside the confines of its training environment. An unvalidated machine learning model is essentially an unknown quantity: there's no real way of knowing whether it can accurately and effectively generalize to unseen data.
Additionally, model validation helps programmers optimize and fine-tune their machine learning model, while also identifying potential problems before the model moves ahead to final testing. It also allows development teams to compare the performance of different models, as well as models trained on different data sets, in order to identify which ones would be most effective at fulfilling their goals. Lastly, model validation may be carried out by a third party in the event that the model has to adhere to certain regulatory requirements.
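One common way to compare candidate models is to score each of them with the same validation procedure. The sketch below uses k-fold cross-validation, which is just one of several possible techniques, to compare a baseline that always predicts the mean against a least-squares line. The data, the two candidate models, and the helper names are all hypothetical and chosen only for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(rows):
    """Baseline model: always predict the mean training target."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Least-squares line y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validated_mse(fit, rows, k=5):
    """Average validation MSE over k folds; each fold is held out once."""
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train = [row for i, row in enumerate(rows) if i not in held]
        validation = [rows[i] for i in held_out]
        predict = fit(train)
        errors = [(predict(x) - y) ** 2 for x, y in validation]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / k


# Hypothetical data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in [i / 20 for i in range(100)]]

baseline_score = cross_validated_mse(fit_mean, data)
line_score = cross_validated_mse(fit_line, data)
print(f"baseline MSE: {baseline_score:.3f}, line MSE: {line_score:.3f}")
```

Because both models are scored on the same folds, the difference in validation error reflects the models rather than a lucky or unlucky split.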
It's also important to differentiate between model validation and model testing. Validation takes place during development and informs tuning, whereas the test set is reserved for the final, optimized model.
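The difference is easier to see with three separate splits. In the sketch below, the validation split is used to choose a hyperparameter, and the test split is touched only once, after that choice is made. The k-nearest-neighbors regressor, the 60/20/20 split, the candidate values of `k`, and the synthetic data are all assumptions for the sake of the example.

```python
import random


def knn_predict(train_rows, x, k):
    """Predict y as the mean target of the k training points nearest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train_rows, eval_rows, k):
    """Mean squared error of the k-NN model on eval_rows."""
    errors = [(knn_predict(train_rows, x, k) - y) ** 2 for x, y in eval_rows]
    return sum(errors) / len(errors)


# Hypothetical data: y = x^2 plus Gaussian noise, with x drawn at random.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(300)]
data = [(x, x * x + rng.gauss(0, 1)) for x in xs]

# The rows are already in random order, so slicing gives a 60/20/20 split.
train = data[:180]
validation = data[180:240]
test = data[240:]

# Tune the hyperparameter k against the validation split only.
candidate_ks = [1, 3, 5, 9, 15]
validation_scores = {k: mse(train, validation, k) for k in candidate_ks}
best_k = min(validation_scores, key=validation_scores.get)

# Evaluate once on the test split, after tuning is finished.
test_mse = mse(train, test, best_k)
print(f"best k: {best_k}, test MSE: {test_mse:.3f}")
```

Keeping the test rows out of the tuning loop is what makes the final score an honest estimate; if `best_k` were chosen using the test split, that score would be optimistically biased.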
There are several different types of machine learning models, and each one has its own validation requirements depending on purpose, use case, and dataset.
The different types of machine learning model validation include: