Today, I have begun by loading the dataset into a Python Data Frame and performed a linear regression analysis. Specifically, I aimed to predict the percentage of diabetic individuals based on the percentages of obesity and inactivity within each county. The R-squared value, which measures the goodness of fit of our model, assesses how well our linear regression model explained the variance in diabetes rates.
Upon analyzing the initial linear regression model, I found that the R-squared value was not very high. To improve the model’s performance, I explored a polynomial regression approach, allowing for more complex relationships between the variables. This led to a higher R-squared value, suggesting that a polynomial regression model might better capture the underlying trends in the data.