K-Nearest Neighbors(KNN algorithm)

0

K-Nearest Neighbors (KNN) Algorithm

The K-Nearest Neighbors (KNN) supervised algorithm is one of the simplest and most effective machine learning algorithms for both classification and regression tasks.

It relies on the distance metrics to determine the closeness of data.It predicts the class of a given point based on the majority class of k it’s nearest neighbours.

k is the positive integer , typically not very large , if k-1 , then the new data is just assigned to the class of it’s nearest neighbor


How Does KNN Work?

KNN operates on a simple principle: data points that are close together are more likely to share similar characteristics.
The algorithm determines the class or value of a new data point by analysing the k nearest neighbours around it, where k is a user-defined parameter.

  1. For Classification:
    • The new data point is assigned to the class most common among its kk nearest neighbors (majority voting).
  2. For Regression:
    • The predicted value is the average (or weighted average) of the target values of its kk nearest neighbors.

KNN Classification Example

Scenario:

A company wants to predict whether a customer will Buy or Not Buy based on their Age and Salary.

Dataset (Training Data):

Age Salary Class (Buy)
25 40000 Yes
30 50000 No
35 60000 Yes
40 70000 No
50 80000 Yes

Query Point:

  • Age: 37
  • Salary: 65000

Step 1: Calculate distances using Euclidean formula

The Euclidean distance is given by:

KNN Classification Example (Continued)

Let’s continue with the dataset and calculate the Euclidean distances between the query point (37,65000)(37, 65000) and each training data point:

Step 1: Calculate distances using Euclidean formula

The Euclidean distance formula is:

Here, x1 and x2 represent the Age values, while y1 and y2 represent the Salary values.

Step 2: Find the k nearest neighbors

Let’s assume k=3 . The 3 nearest neighbours are:

  1. Point 3 (35,60000)
  2. Point 4 (40,70000)
  3. Point 2 (30,50000)

Step 3: Perform majority voting

Check the class labels of the nearest neighbours:

Out of the 3 nearest neighbors, 2 have the class No, and 1 has the class Yes. Therefore, the predicted class for the query point (37,65000) is:

                                      Predicted Class: No

 

Regression Example: Predicting House Prices

Scenario

A real estate agency wants to predict the price of a house based on its size (sq. ft) and number of bedrooms using historical data.

Dataset (Training Data)

Size (sq. ft) Bedrooms Price (in $1000s)
1500 3 200
1800 4 250
2400 4 300
3000 5 450
3500 5 500

Query Point

We want to predict the price of a house with:

  • Size: 2000 sq. ft
  • Bedrooms: 4

Step 1: Calculate the distances

Use the Euclidean distance to calculate the distance between the query point (2000,4)and each data point:


Step 2: Select the kk nearest neighbors

Let k=3 nearest neighbours are:

  1. Point 2: (1800,4,250)
  2. Point 3: (2400,4,300)
  3. Point 1: (1500,3,200)

Step 3: Calculate the predicted price

The predicted price is the average of the prices of the 3 nearest neighbors:


Predicted Result

The predicted price for the house with a size of 2000 sq. ft and 4 bedrooms is $250,000.

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

Scroll to Top