PHISHING URL DETECTION TOOLS
Faculty
Department
Year of Publication
Publication Type
Abstract
Phishing attacks are among the most common and dangerous cybersecurity threats today, with attackers using increasingly sophisticated techniques to deceive users and gain access to their sensitive information. This study presents the development and implementation of a machine learning-based phishing URL detection tool designed to identify malicious URLs with high accuracy. The research addresses the growing challenge of detecting phishing websites by analyzing the URL characteristics and patterns that distinguish legitimate sites from fraudulent ones. Using a dataset of over 10,000 URLs (comprising both phishing and legitimate websites), the study implements multiple machine learning algorithms, including Random Forest, Support Vector Machines (SVM), and Gradient Boosting, to classify URLs. The system extracts 30 distinct features from each URL, including lexical properties, domain-based characteristics, and third-party service indicators. Feature engineering techniques were applied to optimize model performance, with particular attention to handling class imbalance through the Synthetic Minority Over-sampling Technique (SMOTE). The results show that the Random Forest classifier achieved the highest accuracy, 96.8%, with precision and recall of 95.2% and 97.1%, respectively. The Gradient Boosting model closely followed with 95.9% accuracy, while the SVM model achieved 92.4% accuracy. Cross-validation was used to assess the robustness of the models and guard against overfitting. Feature importance analysis revealed that URL length, the presence of suspicious keywords, domain age, and SSL certificate status were among the most significant predictors of phishing attempts. To validate practical applicability, a web-based detection tool was developed using the Flask framework, enabling real-time URL scanning and classification.
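To make the idea of "lexical properties" concrete, the following is a minimal sketch of how a few URL features of the kind named above (URL length, suspicious keywords, IP-address hosts, HTTPS use) might be computed with the Python standard library. The function name, the keyword list, and the exact feature definitions are hypothetical illustrations, not the study's actual 30-feature set. Domain age and SSL certificate status require external WHOIS and TLS lookups and are not shown; the `uses_https` flag below only checks the URL scheme.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list for illustration; the study's keyword set is not given.
SUSPICIOUS_KEYWORDS = ("login", "verify", "account", "update", "secure", "bank", "signin")

# Matches a dotted-quad IPv4 host such as 192.168.0.1 (does not validate octet ranges).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a raw URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname excludes any user@ prefix and the port
    lowered = url.lower()
    host_is_ip = bool(IPV4_RE.match(host))
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens_in_host": host.count("-"),
        # "@" in a URL can hide the real destination host behind fake userinfo.
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(host_is_ip),
        # Scheme check only; this is not a certificate validity check.
        "uses_https": int(parsed.scheme == "https"),
        # Naive subdomain count: ignores multi-part TLDs such as .co.uk.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "suspicious_keyword_count": sum(k in lowered for k in SUSPICIOUS_KEYWORDS),
    }
```

For example, `extract_lexical_features("http://192.168.0.1/login/verify")` flags an IPv4 host, no HTTPS, and two suspicious keywords. Each resulting dictionary would form one row of the feature matrix passed to a classifier such as Random Forest.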
The system incorporates a user-friendly interface that provides instant feedback on URL legitimacy, along with a detailed risk analysis and security recommendations. Performance testing verified an average response time below 200 milliseconds per URL analysis, making the tool practical for real-world deployment. This research contributes to the field of cybersecurity by presenting an efficient, automated phishing detection system that can be integrated with web browsers or email clients, or used as a standalone tool.
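The abstract does not state how the 200-millisecond figure was measured. One plausible approach, sketched here under that assumption, is to time repeated calls to the analysis function and average the results. `analyze_url` below is a hypothetical stand-in using a trivial keyword rule, not the study's trained model, and the timing covers only in-process computation, not the HTTP round trip through a Flask endpoint.

```python
import statistics
import time
from typing import Callable, Iterable


def analyze_url(url: str) -> str:
    """Hypothetical stand-in for the tool's classifier: a trivial keyword rule."""
    return "phishing" if any(k in url.lower() for k in ("login", "verify")) else "legitimate"


def average_latency_ms(analyze: Callable[[str], str], urls: Iterable[str], repeats: int = 5) -> float:
    """Return the mean wall-clock time, in milliseconds, of one call to `analyze`."""
    url_list = list(urls)  # materialize so generators can be iterated on every repeat
    samples = []
    for _ in range(repeats):
        for url in url_list:
            start = time.perf_counter()  # monotonic, high-resolution timer
            analyze(url)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    test_urls = ["https://example.com", "http://192.168.0.1/login"]
    mean_ms = average_latency_ms(analyze_url, test_urls)
    print(f"mean latency: {mean_ms:.3f} ms (target: < 200 ms)")
```

Averaging over several repeats smooths out one-off delays such as cache warm-up; reporting a high percentile alongside the mean would give a fuller picture of worst-case responsiveness.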
Supervisor(s)
Co-supervisor(s)


