K Nearest Neighbors – R vs Python

K-Nearest Neighbors – a simple example using R and Python

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
KNN stores all available cases and classifies (or gives expected values of) new cases based on a similarity measure. Here we look at a simple example using both R and Python.
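
To make the idea concrete before turning to the library functions, here is a minimal from-scratch sketch of KNN classification in Python. It is not the implementation used by either library below; the function name knn_predict and the toy data are made up for illustration, and ties simply go to the label encountered first among the neighbours.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from the new point to every stored training case
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training cases
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two features, two well-separated classes
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))  # point near the first cluster
print(knn_predict(X_toy, y_toy, [5.1, 5.0], k=3))  # point near the second cluster
```

There is no training step in the usual sense: the "model" is just the stored data, and all the work happens at prediction time.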

Data Description: A bank possesses demographic and transactional data of its loan customers. If the bank has a robust model to predict defaulters, it can undertake better resource allocation.

Objective: To predict whether a customer applying for a loan will default.

KNN in R:

Importing data and removing unwanted variables

# Read the data, then keep only the predictors (drop the ID, AGE and the class label)
bankloan<-read.csv("BANK LOAN KNN.csv",header=TRUE)
bankloan2<-subset(bankloan,select=c(-AGE,-SN,-DEFAULTER))

head(bankloan2)
##   EMPLOY ADDRESS DEBTINC CREDDEBT OTHDEBT
## 1     17      12     9.3    11.36    5.01
## 2      2       0    17.3     1.79    3.06
## 3     12      11     3.6     0.13    1.24
## 4      3       4    24.4     1.36    3.28
## 5     24      14    10.0     3.93    2.47
## 6      6       9    16.3     1.72    3.01

Scaling variables

bankloan3<-scale(bankloan2)

head(bankloan3)
##       EMPLOY    ADDRESS    DEBTINC      CREDDEBT     OTHDEBT
## 1  1.5656796  0.6216799 -0.2881684  3.8774339687  0.51519694
## 2 -0.8239988 -1.1852951  0.7889154  0.0289356115 -0.02571385
## 3  0.7691201  0.4710987 -1.0555906 -0.6386200074 -0.53056393
## 4 -0.6646869 -0.5829701  1.7448273 -0.1439854223  0.03531198
## 5  2.6808628  0.9228424 -0.1939235  0.8895193612 -0.18937404
## 6 -0.1867512  0.1699362  0.6542799  0.0007856758 -0.03958336

Creating training and testing data sets

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
index<-createDataPartition(bankloan$SN,p=0.7,list=FALSE)
head(index)
##      Resample1
## [1,]         3
## [2,]         4
## [3,]         5
## [4,]         7
## [5,]         8
## [6,]        10
traindata<-bankloan3[index,]
testdata<-bankloan3[-index,]

dim(traindata)
## [1] 273   5
dim(testdata)
## [1] 116   5

Creating class vectors

Ytrain<-bankloan$DEFAULTER[index]

Ytest<-bankloan$DEFAULTER[-index]

KNN classification (Continuous predictors)

knn() in the package “class” performs k-nearest neighbour classification of a test set using a training set. Distances are Euclidean, and each test case is assigned the class chosen by majority vote among its k nearest training cases, with ties broken at random.

library(class)

# Predicted class for each test case, based on its 20 nearest training cases
model<-knn(traindata,testdata,k=20,cl=Ytrain)

KNN in Python:

The same bank loan data is used here.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, accuracy_score, roc_curve,
                             roc_auc_score)

Importing data and removing unwanted variables

bankloan = pd.read_csv("BANK LOAN KNN.csv")
bankloan1 = bankloan.drop(['SN','AGE'], axis = 1)

bankloan1.head()
##    EMPLOY  ADDRESS  DEBTINC  CREDDEBT  OTHDEBT  DEFAULTER
## 0      17       12      9.3     11.36     5.01          1
## 1       2        0     17.3      1.79     3.06          1
## 2      12       11      3.6      0.13     1.24          0
## 3       3        4     24.4      1.36     3.28          1
## 4      24       14     10.0      3.93     2.47          0

Creating training and testing data sets

X = bankloan1.loc[:,bankloan1.columns != 'DEFAULTER']
y = bankloan1.loc[:, 'DEFAULTER']

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                   test_size=0.30, 
                                                   random_state = 999)
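
The split above is purely random. When one class is much rarer than the other, a random split can leave the two sets with different class proportions. As a hedged aside, train_test_split accepts an optional stratify argument (not used in the original code) that keeps the proportions aligned. The sketch below uses made-up data with 25% positives, not the bank loan file.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 400 rows, 5 features, 25% of rows in the positive class
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))
y_demo = np.array([1] * 100 + [0] * 300)

# stratify=y_demo asks for the same class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=999, stratify=y_demo
)

# Share of positives in each subset; both should be close to 0.25
print(y_tr.mean(), y_te.mean())
```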

Preparing/Scaling variables

scaler = StandardScaler()
scaler.fit(X_train)
## StandardScaler(copy=True, with_mean=True, with_std=True)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
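
To show what the scaler computes, the sketch below reproduces StandardScaler by hand on the EMPLOY and DEBTINC values of the six rows printed in the R section. It also illustrates one difference worth knowing when comparing the two languages: StandardScaler divides by the population standard deviation (ddof=0), while R's scale() uses the sample standard deviation (n - 1 in the denominator), so the scaled values are not identical. The variable names here are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# EMPLOY and DEBTINC for the six rows shown by head(bankloan2)
X_small = np.array([[17, 9.3], [2, 17.3], [12, 3.6],
                    [3, 24.4], [24, 10.0], [6, 16.3]])

# Manual z-score with the population standard deviation (ddof=0)
manual = (X_small - X_small.mean(axis=0)) / X_small.std(axis=0)
sk_scaled = StandardScaler().fit_transform(X_small)
print(np.allclose(manual, sk_scaled))   # True

# z-score with the sample standard deviation (ddof=1), as R's scale() does
r_style = (X_small - X_small.mean(axis=0)) / X_small.std(axis=0, ddof=1)
print(np.allclose(r_style, sk_scaled))  # False
```

Note also that the Python code fits the scaler on the training set only and then applies it to the test set, whereas the R code scales the full data before splitting.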

Building the KNN Classifier (Continuous Predictors)

KNeighborsClassifier() from sklearn.neighbors performs k-nearest neighbour classification: it is fitted on the training set and then used to classify the test set.

# Rule of thumb: set k to the square root of the number of observations
KNNclassifier = KNeighborsClassifier(n_neighbors = 
                                     int(np.sqrt(len(X)).round()))

KNNclassifier.fit(X_train, y_train)
## KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
##                      metric_params=None, n_jobs=None, n_neighbors=20, p=2,
##                      weights='uniform')
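
The fitted classifier can then be used to predict the test set and be scored with the metrics imported earlier. Since the CSV file is not available here, the following sketch runs the same steps end to end on synthetic data from make_classification; the data, the variable names and the resulting scores are stand-ins and say nothing about the bank loan results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank loan data: 389 rows, 5 continuous predictors
X_syn, y_syn = make_classification(n_samples=389, n_features=5,
                                   n_informative=3, n_redundant=0,
                                   random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.30,
                                          random_state=999)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)

# Square-root rule of thumb: sqrt(389) is about 19.7, which rounds to 20
k = int(round(np.sqrt(len(X_syn))))
clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr_s, y_tr)

y_pred = clf.predict(X_te_s)              # predicted class labels
y_prob = clf.predict_proba(X_te_s)[:, 1]  # share of neighbours in class 1

print(confusion_matrix(y_te, y_pred))
print("accuracy :", accuracy_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))
print("recall   :", recall_score(y_te, y_pred))
print("f1       :", f1_score(y_te, y_pred))
print("roc auc  :", roc_auc_score(y_te, y_prob))
```

For a default-prediction problem, recall on the defaulter class is often more informative than overall accuracy, because missing a defaulter is usually the costlier error.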

This tutorial lesson is taken from the Postgraduate Diploma in Data Science.