# 🛡️ HostReveal - Cyber Threat Investigation & Hosting Unmasking Tool
## ✨ Inspiration
The rise in cybercrime and the growing use of CDN services such as Cloudflare, Incapsula, and Akamai to mask real hosting infrastructure have made it difficult for cybersecurity teams to trace malicious websites. Traditional WHOIS and DNS lookups fall short when a site sits behind such a proxy, because DNS resolves to the CDN's edge addresses and IP WHOIS then names the CDN rather than the origin host.
The inspiration for HostReveal came from a real-world problem faced by the Madhya Pradesh Police, who needed a reliable way to unmask hidden hosting providers. This challenge sparked the creation of a platform capable of digging deep into masked infrastructures using cyber forensics and machine learning.
* * *
## 🔍 What it does
HostReveal is an AI-powered cybersecurity tool that:
- Unmasks hidden hosting providers behind proxy/CDN services
- Analyzes domain, DNS, network paths, and SSL certificate data
- Performs deep packet inspection using tools like Zeek and Suricata
- Correlates threat intelligence from platforms like OTX, Shodan, and MISP
- Applies machine learning to classify risky infrastructure and detect anomalies
- Generates automated forensic reports and provides visual dashboards via Streamlit
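
As an illustration of the first step in unmasking, an investigator usually needs to know whether a resolved IP address belongs to a CDN at all. The sketch below is not HostReveal's actual code; it assumes a small, possibly outdated subset of CIDR blocks that Cloudflare has published, and the names `CDN_RANGES` and `identify_cdn` are hypothetical.

```python
import ipaddress

# Illustrative subset of CIDR blocks Cloudflare has published for its edge
# network. Assumption: this list may be incomplete or out of date; a real
# tool would load each provider's current ranges instead of hard-coding them.
CDN_RANGES = {
    "Cloudflare": ["104.16.0.0/13", "172.64.0.0/13", "173.245.48.0/20"],
}


def identify_cdn(ip, ranges=CDN_RANGES):
    """Return the name of the CDN whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for provider, cidrs in ranges.items():
        for cidr in cidrs:
            if addr in ipaddress.ip_network(cidr):
                return provider
    return None


if __name__ == "__main__":
    for candidate in ("104.16.1.1", "8.8.8.8"):
        print(candidate, "->", identify_cdn(candidate))
```

A match only says the address is a CDN edge node; finding the origin behind it still needs the certificate, DNS history, and network-path analysis listed above.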
* * *
## 🧱 How we built it
HostReveal is built from modular Python components behind a Streamlit dashboard:
- **Frontend:** Streamlit multi-page app with dark mode and interactive visualizations using Plotly and Folium.
- **Backend:** Python modules that handle DNS/WHOIS analysis, SSL inspection, network scanning, and packet capture.
- **Machine Learning:** Scikit-learn, TensorFlow, and Prophet for classification, clustering, anomaly detection, and forecasting.
- **Threat Intelligence APIs:** Integrated with Shodan, AlienVault OTX, MISP, and Censys for real-time correlation.
- **Packet Analysis Tools:** Zeek, Suricata, TCPFlow for advanced forensic data extraction and analysis.
- **Deployment:** Local environment using virtualenv, with optional system-wide tools like nmap, tshark, and zeek.
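
To give a sense of how Zeek output can feed the Python backend, here is a minimal sketch that reads Zeek's default tab-separated log format into dictionaries. It assumes the standard `#fields` header line and `-` as the unset-field marker; the function name `parse_zeek_tsv` and the sample rows are made up for illustration, and the project may just as well consume Zeek's JSON output instead.

```python
def parse_zeek_tsv(lines):
    """Parse Zeek tab-separated log lines into a list of dicts.

    Column names come from the '#fields' header; other '#' lines are
    metadata and are skipped. Zeek's unset marker '-' becomes None.
    """
    fields = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif not line or line.startswith("#"):
            continue
        else:
            values = line.split("\t")
            records.append(
                {name: (None if value == "-" else value)
                 for name, value in zip(fields, values)}
            )
    return records


# Hypothetical, shortened conn.log excerpt (documentation-range addresses).
SAMPLE = [
    "#separator \\x09",
    "#path\tconn",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tid.resp_p\tservice",
    "#types\ttime\tstring\taddr\taddr\tport\tstring",
    "1700000000.000000\tCabc123\t10.0.0.5\t203.0.113.7\t443\tssl",
    "1700000001.500000\tCdef456\t10.0.0.5\t203.0.113.9\t80\t-",
    "#close\t2023-11-14-22-13-21",
]

if __name__ == "__main__":
    for record in parse_zeek_tsv(SAMPLE):
        print(record["id.resp_h"], record["id.resp_p"], record["service"])
```

All values stay as strings here; converting ports and timestamps to numeric types would be a natural next step before any ML feature extraction.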
* * *
## 🧩 Challenges we ran into
- **Bypassing CDN protections:** Finding reliable methods to trace origin servers behind Cloudflare and similar services.
- **Packet capture parsing:** Aligning structured output from Zeek and Suricata for ML modeling was complex.
- **Threat intelligence integration:** Standardizing data formats and managing API rate limits across threat feeds.
- **Real-time performance:** Balancing deep inspection with usability and system responsiveness in a local environment.
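
The data-format problem mentioned above can be pictured with a small sketch: each feed's record is mapped onto one shared shape, then merged per IP address. This is only an assumption about how such a layer might look. The feed names `feed_a` and `feed_b`, their field names, and the `Indicator` class are hypothetical and do not reflect the real Shodan, OTX, or MISP payloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """Common shape for one observation about an IP address."""
    source: str
    ip: str
    tags: tuple


def normalize(source, raw):
    """Map a feed-specific record onto the shared Indicator shape.

    The per-feed field names below are hypothetical; real API payloads
    differ and should be checked against each provider's documentation.
    """
    if source == "feed_a":
        return Indicator(source, raw["ip_str"], tuple(sorted(raw.get("tags", []))))
    if source == "feed_b":
        return Indicator(source, raw["indicator"], tuple(sorted(raw.get("labels", []))))
    raise ValueError(f"unknown source: {source}")


def merge_by_ip(indicators):
    """Group indicators by IP, collecting the sources and tags seen for each."""
    merged = {}
    for ind in indicators:
        entry = merged.setdefault(ind.ip, {"sources": set(), "tags": set()})
        entry["sources"].add(ind.source)
        entry["tags"].update(ind.tags)
    return merged


if __name__ == "__main__":
    observations = [
        normalize("feed_a", {"ip_str": "203.0.113.7", "tags": ["vpn"]}),
        normalize("feed_b", {"indicator": "203.0.113.7", "labels": ["phishing"]}),
    ]
    print(merge_by_ip(observations))
```

Keeping the per-feed mapping in one function means a new feed only adds a branch, while the correlation logic stays unchanged.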
* * *

## 🏆 Accomplishments that we're proud of
- Built a fully functional end-to-end forensic investigation platform
- Successfully used in real-world cybercrime cases during a law enforcement innovation challenge
- Developed a powerful machine learning pipeline for clustering and risk classification of IP infrastructure
- Designed a clean, user-friendly dashboard that visualizes network and threat data for faster decision-making
* * *
## 📚 What we learned
- Deep understanding of DNS protocols, SSL certificate structures, and network-level traffic
- Practical applications of ML in cyber forensics (clustering, anomaly detection, time-series forecasting)
- The importance of integrating multiple open-source tools (Zeek, Suricata, Shodan) in a cohesive pipeline
- Building scalable cybersecurity platforms that are both functional and accessible to investigators
* * *
## 🚀 What's next for HostReveal
- Add PDF and HTML auto-reporting with export options
- Launch real-time monitoring mode for live incident response
- Build Slack/webhook integration for instant alerts
- Expand to a cloud-based SaaS version with API-first architecture
- Collaborate with CERTs and government bodies to scale adoption
- Develop mobile companion tools for on-the-go forensic snapshots
* * *