DESIGN AND IMPLEMENTATION OF A MACHINE LEARNING FRAMEWORK FOR BLOCKED URL DETECTION

Authors

  • V.NARESH

Keywords:

Cyber security, Cybercrime, Malicious URL, Machine learning, Deep learning, Character embedding

Abstract

Malicious URLs, also known as malicious websites, are frequently used to host harmful content such as phishing schemes, spam, malicious advertisements, and drive-by exploits. Past research has relied on regular expressions, signature matching, and blacklisting, but these methods cannot detect newly created malicious URLs or variants of known ones. A machine learning-based solution can address this limitation. However, such solutions depend heavily on feature engineering and on the feature representation of security artifacts such as URLs, which requires substantial domain investigation. These feature engineering and representation tools must also be continuously updated to handle both existing and new forms of URLs. Deep learning has allowed AI systems to compete with humans on a wide range of tasks, and in computer vision they can outperform humans on certain tasks. Deep learning models can automatically extract the most useful features from raw input data. Deep URL Detect (DUD) uses character embedding to convert raw URLs into a numerical form that cybersecurity applications can process. In natural language processing (NLP), character-level embedding is a more sophisticated method of representing characters numerically. Once the character-level embedding features are computed, the hidden layers of the deep learning model apply a nonlinear activation function, and the model uses the result to decide whether a URL is malicious. To identify malicious URLs, this study examines several advanced deep learning-based character-level embedding models and uses a series of experiments to determine the best-performing one. Each experiment is run 500 times with a learning rate of 0.001 and employs multiple deep learning-based character-level embedding models.
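The pipeline described above (raw URL, then character-level embedding, then a hidden layer with a nonlinear activation, then a malicious/benign decision) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the DUD architecture: the abstract does not specify the layers after the embedding, so a single dense ReLU layer is used for brevity. The vocabulary, `MAX_LEN`, `EMBED_DIM`, `HIDDEN`, `encode_url`, and `score_url` are all hypothetical. The weights are random and untrained, so the scores are meaningless until the model is trained (for example, with the learning rate of 0.001 mentioned above; training is not shown).

```python
import math
import random
import string

# Hypothetical character vocabulary: printable ASCII characters get indices 2..101;
# index 0 is reserved for padding and index 1 for unknown characters.
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}
PAD, UNK = 0, 1
MAX_LEN = 64    # assumed fixed URL length after padding/truncation
EMBED_DIM = 8   # assumed embedding size
HIDDEN = 16     # assumed hidden-layer width


def encode_url(url: str) -> list[int]:
    """Map each character to an integer index, then pad/truncate to MAX_LEN."""
    ids = [VOCAB.get(ch, UNK) for ch in url[:MAX_LEN]]
    return ids + [PAD] * (MAX_LEN - len(ids))


rng = random.Random(0)
# Embedding table: one vector per character index (random here, learned in practice).
EMB = [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in range(len(VOCAB) + 2)]
EMB[PAD] = [0.0] * EMBED_DIM  # padding positions contribute nothing
# Dense hidden layer and output layer weights (untrained, for illustration only).
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(MAX_LEN * EMBED_DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
b2 = 0.0


def score_url(url: str) -> float:
    """Return a probability-like score that the URL is malicious."""
    # Look up each character's embedding and concatenate the vectors in order,
    # so the position of every character is kept in the input.
    x = [v for idx in encode_url(url) for v in EMB[idx]]
    # Hidden layer with a nonlinear (ReLU) activation.
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # Sigmoid output: a score above 0.5 would be read as "malicious".
    z = sum(w * hi for w, hi in zip(W2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `score_url("http://example.com/login")` returns a value strictly between 0 and 1. Concatenating the embeddings, rather than averaging them, is a deliberate choice here: it keeps the order of the characters in the URL.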
DUD outperforms all other deep learning-based character-level embedding approaches in both speed and detection performance across all test scenarios. Deep learning models based on character-level embedding also improved on n-gram representations, because the embedding preserves the order of the characters in a URL and the relationships between them.
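One way to see why an order-preserving representation can help is that a bag of character n-grams records which n-grams occur, but not where. The two URLs below are made-up examples that contain exactly the same character bigrams (`x.`, `.c`, `co`, `om`, `m/`, `/a`, `a/`, `/b`, `b/`), so a bigram-count representation cannot tell them apart, while a per-character sequence can. The helper names are hypothetical, and `ord` codes stand in for learned embedding indices.

```python
from collections import Counter


def char_ngrams(url: str, n: int = 2) -> Counter:
    """Bag of character n-grams: counts each n-gram and discards where it occurs."""
    return Counter(url[i:i + n] for i in range(len(url) - n + 1))


def char_sequence(url: str) -> list[int]:
    """Order-preserving representation: one code per character, kept in position."""
    return [ord(ch) for ch in url]


a = "x.com/a/b/"
b = "x.com/b/a/"
print(char_ngrams(a) == char_ngrams(b))      # True: identical bigram bags
print(char_sequence(a) == char_sequence(b))  # False: the sequences differ
```

Larger n-grams reduce such collisions, but they do not remove them in general, and they enlarge the feature space.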

Author Biography

  • V.NARESH

    Lecturer in Computers,
    KIMS PG COLLEGE, KARIMNAGAR, TG.

Published

2025-12-28