SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS

Journal Title: Scientific Journal of Astana IT University - Year 2022, Vol 12, Issue 12

Abstract

Phishing is one of the easiest ways for attackers to obtain personal information, which has made phishing detection a popular research area aimed at countering such attacks. Detecting malicious websites is essential to prevent the spread of malware and to protect end users from becoming victims. Unfortunately, malicious URL detection is still not well understood, owing to a lack of features and inaccurate classification. Prior work on the subject was reviewed, and based on the information collected from previous studies, this study addresses the problem of detecting phishing websites using Ensemble Learning. The aim of the work is to choose the optimal algorithm for classifying phishing websites among gradient boosting algorithms. AdaBoost, CatBoost, and Gradient Boosting Classifier were chosen as the Ensemble Learning algorithms and were used to improve classifier efficiency. Practical studies of the parameters of each algorithm, carried out to find the optimal classification model, are presented. Research and experiments were carried out on a dataset containing information extracted from the contents of a URL: main URL, domain, directory, and file. A thorough Exploratory Data Analysis (EDA) was carried out, in which correlation analysis was used to identify the main dependencies and patterns that characterize phishing resources. The ROC AUC score was chosen as the evaluation metric for the algorithms. The best result for predicting phishing websites was demonstrated by the AdaBoost Classifier, with an average ROC AUC score of 99%. The results of the experiments are illustrated with graphs and tables.
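As a rough illustration of what "information extracted from the contents of a URL" could look like, the sketch below splits a URL into the four groups named in the abstract (main URL, domain, directory, file) and counts a few characters in each. The character set, the feature names (`len_*`, `qty_*`), and the helper functions are hypothetical; the abstract does not list the actual features of the dataset, so this is only one plausible reading.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical choice of characters to count; not taken from the paper.
CHAR_NAMES = {".": "dot", "-": "hyphen", "/": "slash", "@": "at", "=": "equal"}


def split_url(url):
    """Split a URL into the four groups named in the abstract."""
    parts = urlsplit(url)
    # Separate the path into its directory part and its final file part.
    directory, filename = posixpath.split(parts.path)
    return {
        "url": url,
        "domain": parts.hostname or "",
        "directory": directory,
        "file": filename,
    }


def count_features(url):
    """Return length and character-count features for each URL group."""
    features = {}
    for group, text in split_url(url).items():
        features["len_" + group] = len(text)
        for char, name in CHAR_NAMES.items():
            features["qty_" + name + "_" + group] = text.count(char)
    return features


if __name__ == "__main__":
    example = "http://login.example.com/secure/account/update.php?id=1"
    for key, value in count_features(example).items():
        print(key, value)
```

A table of such numeric features, one row per URL with a phishing/legitimate label, is the kind of input on which boosting classifiers could be trained and compared by ROC AUC.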

Authors and Affiliations

Dinara Kaibassova, Margulan Nurtay, Ardak Tau, Mira Kissina

  • EP ID EP713377
  • DOI 10.37943/12OYRS4391

How To Cite

Dinara Kaibassova, Margulan Nurtay, Ardak Tau, Mira Kissina (2022). SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS. Scientific Journal of Astana IT University, 12(12), -. https://europub.co.uk./articles/-A-713377