First of all, these are great questions. Let me answer each of them sequentially.

  1. In this article, I tested K values up to 40 and got a minimum error of 0.59 at K=37. If you only test K values up to 25, you may get a minimum error of 0.63 at K=22, which would lead you to conclude that K=22 is the optimal value within that range (1 to 25).
  2. If I extend the analysis to K=100, we still get the minimum error at K=37. There is no exact statistical method for finding the optimal value of K. In practice, though, the optimal K found during model evaluation tends to land near the square root of N (the number of sample points). Treat this as a rule of thumb drawn from experience, not a formal result.
  3. Testing every K value up to the number of sample points is computationally very expensive, and you would still get the same optimal K value as in the previous answer.
  4. When you increase the test size, the model has less data to train on, so the results will change. In the given code, we set test_size = 0.2 and got K=37 as the optimum. If you change the test size, the optimal K value will very likely change as well.
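To make points 1, 2 and 4 more concrete, here is a minimal sketch in plain Python. It does not use the article's dataset or code. The two-class Gaussian toy data and the helper names `make_blobs`, `split_data`, `knn_predict`, `error_rate` and `best_k` are hypothetical. `split_data` is a hand-rolled stand-in for scikit-learn's `train_test_split`, so that the example has no dependencies. It will not reproduce the 0.59 error at K=37. It only shows how the K sweep, the √N rule of thumb and the test size interact.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_class, seed=0):
    """Hypothetical toy data: two overlapping 2-D Gaussian classes."""
    rng = random.Random(seed)
    points = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (1.5, 1.5)]):
        for _ in range(n_per_class):
            points.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return points


def split_data(data, test_size, seed=0):
    """Shuffle a copy of the data and hold out a `test_size` fraction for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, test, k):
    """Fraction of test points the k-NN classifier misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / len(test)


def best_k(data, k_max, test_size=0.2, seed=0):
    """Sweep K = 1..k_max on one split; return (best K, its error, all errors)."""
    train, test = split_data(data, test_size, seed)
    errors = {k: error_rate(train, test, k) for k in range(1, k_max + 1)}
    k_best = min(errors, key=errors.get)  # ties resolve to the smallest K
    return k_best, errors[k_best], errors


if __name__ == "__main__":
    data = make_blobs(100)  # N = 200 sample points
    for k_max in (25, 40):
        k, err, _ = best_k(data, k_max)
        print(f"K range 1..{k_max}: best K={k}, error={err:.2f}")
    for test_size in (0.2, 0.4):
        k, err, _ = best_k(data, 40, test_size=test_size)
        print(f"test_size={test_size}: best K={k}, error={err:.2f}")
    print("sqrt(N) rule of thumb:", round(math.sqrt(len(data))))  # 14 for N=200
```

Both sweeps with `test_size=0.2` share the same split, so the errors for K=1..25 are identical in each. Widening the range can therefore only keep or lower the minimum error, which is what point 1 describes. Changing `test_size` changes the split itself, so the best K can move, as point 4 notes.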

I suggest applying KNN to more industry-based problems to extract more meaningful insights.

I hope this will help you.

Machine Learning | Data Science Practitioner, Connect with me on LinkedIn - https://linkedin.com/in/amey23/