First of all, these are great questions. Let me answer each of them sequentially.
- In this article, I demonstrated K values up to 40 and found a minimum error of 0.59 at K=37. If you test K values only up to 25, you may get a minimum error of 0.63 at K=22, which would lead you to conclude that K=22 is the optimal value within that range (1 to 25).
- If I extend the analysis to K=100, we still get the minimum error at K=37. Why? Because there is no exact statistical method for finding the optimal value of K; as a rule of thumb, after model evaluation the optimal K often turns out to be close to the square root of N (the number of sample points). You will not find this written in any textbook; it is a heuristic drawn from experience.
- Setting the K value equal to the number of sample points is computationally very expensive, and you will still end up with the same optimal K value as derived in the answer above.
- Whenever you increase the test size, your model gets less data to train on, so you may get unexpected results. In the given code we set test_size = 0.2 and got K=37 as the optimum, but if you change the test size, the optimal K value will very likely change as well.
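The points above can be sketched in code. This is a minimal, self-contained example using a synthetic dataset (`make_classification` stands in for the article's actual data, so the numbers it prints will differ from the 0.59/K=37 result): it sweeps K over a range, picks the K with the lowest test-set error, and shows how both the chosen range and the test size affect the answer, alongside the sqrt(N) rule of thumb.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data standing in for the article's dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

def best_k(test_size, k_max):
    """Return (optimal K, its error rate) for K in 1..k_max."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42)
    errors = []
    for k in range(1, k_max + 1):
        model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        errors.append(1 - model.score(X_te, y_te))  # error = 1 - accuracy
    return int(np.argmin(errors)) + 1, min(errors)

# Narrow vs. wide search range: a narrow range can report a different "optimum".
k_narrow, err_narrow = best_k(test_size=0.2, k_max=25)
k_wide, err_wide = best_k(test_size=0.2, k_max=100)
print(f"K in 1..25  -> optimal K={k_narrow}, error={err_narrow:.2f}")
print(f"K in 1..100 -> optimal K={k_wide}, error={err_wide:.2f}")

# Changing test_size shifts the optimum too.
k_alt, err_alt = best_k(test_size=0.3, k_max=100)
print(f"test_size=0.3 -> optimal K={k_alt}, error={err_alt:.2f}")

# The sqrt(N) rule of thumb for comparison.
print(f"sqrt(N) heuristic suggests K around {int(np.sqrt(len(X)))}")
```

Because the data here is synthetic, treat the printed optima as illustrations of the *behavior* (range and test size matter), not as values you should reuse.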
I suggest applying KNN to more industry-based problem statements to extract more meaningful insights.
I hope this will help you.