5-Fold Cross Validation Method on Diabetes, Obesity, and Inactivity Data Sets

Hi,

During today’s class, I learned about K-Fold cross validation using polynomial models to analyze Diabetes, Obesity, and Inactivity data.

 

There are 354 data points that appear in all three data sets (Diabetes, Obesity, and Inactivity). We first fit the model to all 354 points and compute the training error. As the model becomes more flexible, for example as the polynomial degree increases, it fits the training data more closely, so the training error curve should decrease steadily.
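Here is a minimal sketch of how the training error behaves for polynomial models. It assumes synthetic data in place of the real Diabetes, Obesity, and Inactivity values, and the helper name `training_mse` is made up for illustration. The only point is that the training MSE does not go up when a higher-degree polynomial is fitted to the same points.

```python
import numpy as np

# Synthetic stand-in data (an assumption, not the class data set):
# 354 points following a noisy quadratic trend.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=354)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=354)


def training_mse(x, y, degree):
    """Fit a polynomial by least squares and return the MSE on the same data."""
    coeffs = np.polyfit(x, y, deg=degree)
    predictions = np.polyval(coeffs, x)
    return float(np.mean((y - predictions) ** 2))


# Training error for polynomial degrees 1 through 6.
train_errors = {degree: training_mse(x, y, degree) for degree in range(1, 7)}
for degree, mse in train_errors.items():
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

Because each higher-degree polynomial contains the lower-degree ones as special cases, the least-squares training error can only stay the same or shrink. This is why the training error alone cannot tell us when the model starts to overfit.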

 

Using the K-Fold cross validation method, we estimate the test error from the same 354 data points. For a good balance, we split the data into 5 folds. The value of K typically ranges from 5 to 10. If we choose a smaller K, each held-out fold is a larger portion of our data, so each model is trained on a smaller subset, which can lead to higher bias in the estimated test error. On the other hand, if we use a very large K, we’ll have many folds, but each fold will contain very few data points, which can lead to higher variance in the estimated model performance. Hence, a value between 5 and 10 is a balanced choice. Since we are using 5 folds, the 354 data points are separated into groups of 71, 71, 71, 71, and 70.
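The fold sizes can be checked with a few lines of Python. This is only a sketch of the arithmetic, and the helper name `fold_sizes` is made up for illustration: the remainder of 354 divided by 5 is spread over the first folds, one extra point each.

```python
def fold_sizes(n_samples, n_folds):
    """Split n_samples into n_folds groups whose sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` folds each receive one additional data point.
    return [base + 1 if i < extra else base for i in range(n_folds)]


sizes = fold_sizes(354, 5)
print(sizes)       # [71, 71, 71, 71, 70]
print(sum(sizes))  # 354
```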

 

Repeated or duplicate data points in the dataset must then be labeled. As a result, the CV procedure effectively uses all 354 data points. We then move on to the distinct data groups. In the first iteration, the first group of 71 randomly selected data points serves as the test data, while the remaining 4 groups (71, 71, 71, and 70) serve as the training (TRN) data. We fit the model on the TRN data and calculate the MSE on the TEST data. In the second iteration, the second group serves as the TEST data and the remaining 4 groups serve as the TRN data, and we again calculate the MSE on the TEST data. The same process is repeated for all 5 iterations, so each group is used as test data exactly once and we obtain 5 MSE values. Averaging them gives the cross-validated estimate of the test error. Comparing this estimate across models of increasing complexity shows the test error starting to increase once the model starts to overfit.
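The five iterations described above can be sketched as follows. This is a rough illustration under stated assumptions, not the exact procedure from class: the data are synthetic, the function name `kfold_test_mse` is made up, and the model is a simple one-variable polynomial fitted with NumPy.

```python
import numpy as np

# Synthetic stand-in data (an assumption, not the class data set).
rng = np.random.default_rng(0)
n = 354
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=n)


def kfold_test_mse(x, y, degree, n_folds=5, seed=0):
    """Return the test MSE of each fold for a polynomial of the given degree."""
    # Shuffle the indices, then cut them into folds of sizes 71, 71, 71, 71, 70.
    indices = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(indices, n_folds)

    fold_mse = []
    for k in range(n_folds):
        test_idx = folds[k]  # current group is the TEST data
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )  # the other 4 groups are the TRN data
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        predictions = np.polyval(coeffs, x[test_idx])
        fold_mse.append(float(np.mean((y[test_idx] - predictions) ** 2)))
    return fold_mse


# Average the 5 fold MSEs for each polynomial degree.
for degree in range(1, 7):
    mse_per_fold = kfold_test_mse(x, y, degree)
    print(f"degree {degree}: CV test MSE = {np.mean(mse_per_fold):.4f}")
```

The same shuffled folds are reused for every degree (fixed `seed`), so the averaged test errors are comparable across models.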

 

At the end of the class, I had a question for the instructor about stratified cross validation, another commonly used method that is quite similar to K-Fold cross validation. I have sent an email about it and am waiting for the instructor’s insights.

 

Thank You!!
