Detection and categorization of malicious URLs


Identify and classify malicious URLs.

URL dataset (ISCX-URL2016)

The Web has long been a major platform for online criminal activity, and URLs are the main vehicle for it. To counter this, the security community has focused its efforts on developing techniques for identifying malicious URLs.
We mainly study five types of URLs:
  1. Benign
  2. Spam
  3. Phishing
  4. Malware
  5. Defacement

Challenge

The individual features are not described.

Correlations are difficult to determine because of the large number of features.

The dataset contains Null, NaN and Infinity values.
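
One way to deal with the Null, NaN and Infinity values is to drop every row that contains one before training. The sketch below uses only the Python standard library; the column names (url_length, entropy) and the sample rows are hypothetical and are not taken from ISCX-URL2016. Whether the project dropped or imputed such rows is not stated here, so treat this as one possible approach.

```python
import math


def is_clean(value):
    """Return True if value is a finite number (not None, NaN, or +/-Infinity)."""
    if value is None:
        return False
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        # Non-numeric text such as "abc" cannot be used as a feature value.
        return False


def drop_bad_rows(rows, feature_names):
    """Keep only rows whose listed features are all finite numbers."""
    return [
        row for row in rows
        if all(is_clean(row.get(name)) for name in feature_names)
    ]


# Hypothetical rows; the feature names are illustrative only.
rows = [
    {"url_length": 42, "entropy": 0.71, "label": "benign"},
    {"url_length": None, "entropy": 0.55, "label": "spam"},
    {"url_length": 87, "entropy": float("nan"), "label": "phishing"},
    {"url_length": 120, "entropy": float("inf"), "label": "malware"},
    {"url_length": 64, "entropy": "0.62", "label": "defacement"},
]

clean_rows = drop_bad_rows(rows, ["url_length", "entropy"])
print([row["label"] for row in clean_rows])  # ['benign', 'defacement']
```

Dropping rows is the simplest option; if too many rows are lost, replacing the bad values with a per-feature median is a common alternative.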

Model

Both supervised and unsupervised models were trained on the dataset.

Unsupervised model: Isolation Forest, used for anomaly detection.

Supervised models: Random Forest, Decision Tree, Logistic Regression, AdaBoost, and Naive Bayes.
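
The model line-up above could be set up with scikit-learn roughly as follows. This is a minimal sketch, not the project's actual code: it assumes scikit-learn is installed, uses synthetic five-class data from make_classification in place of the real URL features, picks GaussianNB as the Naive Bayes variant, and leaves all hyperparameters close to their defaults. The scores it prints therefore say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    IsolationForest,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the URL features; 5 classes mirror the five URL types.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised models (GaussianNB is an assumed choice of Naive Bayes variant).
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")

# Isolation Forest never sees the labels; predict() returns -1 for
# suspected anomalies and 1 for inliers.
iso = IsolationForest(random_state=0).fit(X_train)
flags = iso.predict(X_test)
print("flagged as anomalous:", int((flags == -1).sum()))
```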

Result

See the results for each model:

Model                 Accuracy   F1-Score
Random Forest         95.04%     0.9507
Decision Tree         91.95%     0.9202
Logistic Regression   81.66%     0.8151
AdaBoost              68.90%     0.6917
Naive Bayes           61.87%     0.6104
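
For reference, the two metrics in the table can be computed from true and predicted labels as sketched below. The page does not say how the F1-Score was averaged across the five classes, so macro averaging (the unweighted mean of per-class F1) is an assumption here, and the labels are made-up examples rather than project output.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up labels: one phishing URL is misclassified as benign.
y_true = ["benign", "spam", "phishing", "malware", "defacement", "benign"]
y_pred = ["benign", "spam", "benign", "malware", "defacement", "benign"]

print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")  # Accuracy: 83.33%
print(f"F1-Score: {macro_f1(y_true, y_pred):.4f}")  # F1-Score: 0.7600
```

Here one missed phishing URL lowers accuracy only to 83.33% but pulls macro F1 down to 0.76, because the phishing class scores 0 and every class counts equally.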
