SubCrawl is a framework developed by the HP Threat Research team for finding, scanning and analyzing open directories.
Processing Modules
YARA
The YARA processing module scans HTTP response content with YARA rules. To add additional rules, place .yar files in the yara-rules folder and include each rule file by adding an include statement to combined-rules.yar.
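For example, for a hypothetical rule file my-rules.yar placed in the yara-rules folder, the corresponding statement in combined-rules.yar would be:
include "my-rules.yar"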
ClamAV
The ClamAV processing module scans HTTP response content with ClamAV during a crawl. If a match is found, the result is passed to the various output modules. To invoke this processing module, provide the value ClamAVProcessing as a processing-module argument. For example, the following command loads the ClamAV processing module and prints the results to the console via the ConsoleStorage storage module.
$ python3 subcrawl.py -p ClamAVProcessing -s ConsoleStorage
Sample output:
To utilize this module, ClamAV must be installed. From a terminal, install ClamAV using the APT package manager:
$ sudo apt-get install clamav-daemon clamav-freshclam clamav-unofficial-sigs
Once installed, the ClamAV update service should already be running. However, if you want to update the signature database manually using freshclam, first ensure that the service is stopped:
$ sudo systemctl stop clavam-freshclam.service
And then run freshclam manually:
$ sudo freshclam
Finally, check the status of the ClamAV service:
$ sudo systemctl status clamav-daemon.service
If the service is not running, you can use systemctl to start it:
$ sudo systemctl start clamav-daemon.service
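As an optional sanity check (an extra step, not part of the original instructions), you can confirm that the daemon accepts scan requests by pointing clamdscan, which ships with the clamav-daemon package, at any file:
$ clamdscan <path-to-file>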
Payload
The Payload processing module is used to identify HTTP response content using the libmagic library. Additionally, SubCrawl can be configured to save content of interest, such as PE files or archives. To invoke this processing module, provide the value PayloadProcessing as a processing module argument. For example, the following command will load the Payload processing module and produce output to the console:
$ python3 subcrawl.py -p PayloadProcessing -s ConsoleStorage
There are no additional dependencies for this module.
Sample output:
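For illustration, the following minimal sketch shows how content identification with libmagic works in Python, using the python-magic package; identify_payload is a hypothetical helper, not SubCrawl's actual code:

```python
import hashlib

import magic  # the python-magic package, a wrapper around libmagic


def identify_payload(url: str, content: bytes) -> dict:
    """Identify raw HTTP response content the way libmagic does."""
    return {
        "hash": hashlib.sha256(content).hexdigest(),
        "url": url,
        # e.g. "application/x-dosexec" for a PE file
        "matches": [magic.from_buffer(content, mime=True)],
    }
```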
Storage Modules
Storage modules are called by the SubCrawl engine after all URLs from the queue have been scanned. They were designed with two objectives in mind: first, to present the scan results immediately after the queue has finished, and second, to enable long-term storage and analysis. We therefore implemented not only a ConsoleStorage module but also a MISP integration and an SQLite storage module.
Console
To quickly analyse results directly after scanning URLs, a well-formatted output is printed to the console. This output is best suited to running SubCrawl in run-once mode. While this approach works well for scanning single domains or generating quick output, it is unwieldy for long-term research and analysis.
SQLite
Since the installation and configuration of MISP can be time-consuming, we implemented another module which stores the data in an SQLite database. To present the data to the user as simply and clearly as possible, we also developed a simple web GUI. Using this web application, the scanned domains and URLs can be viewed and searched with all their attributes. Since this is only an early version, no complex comparison features have been implemented yet.
Only two commands are needed to clone the Git repository, create the Docker container and start it directly; afterwards the web UI can be reached at the address https://localhost:8000/.
MISP
If the web GUI is not sufficient for the subsequent evaluation of the data, the MISP storage module can be activated alternatively or additionally. The corresponding settings must be made in config.yml under the MISP section.
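The exact schema is defined in the shipped config.yml; purely as an illustration, a MISP connection generally requires at least a server URL and an API key, so the section might look like this (the key names below are assumptions, not SubCrawl's documented schema):

```yaml
# Illustrative only - key names are assumptions,
# not SubCrawl's documented configuration schema.
misp:
  url: "https://misp.example.local"   # your MISP instance
  api_key: "<YOUR_MISP_API_KEY>"      # authentication key for the MISP API
```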
Building your own Modules
Templates for processing and storage modules are provided as part of the framework.
Processing Modules
Processing modules can be found under crawler->processing, and a sample module file, example_processing.py, is located in this directory. The template provides the necessary inheritance and imports to ensure execution by the framework. The init function provides for module initialization and receives an instance of the logger and the global configuration. The logger is used to provide logging information from the processing modules, as well as throughout the framework.
The process function is implemented to process each HTTP response. To this end, it receives the URL and the raw response content; this is where the work of the module is done. The function should return a dictionary with the following fields (a sketch follows the list):
- hash: the SHA-256 of the content
- url: the URL the content was retrieved from
- matches: any matching results from the module, for example libmagic or YARA results
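A minimal sketch of such a module, assuming only the interface described above (the class below is hypothetical; the shipped example_processing.py template shows the exact base class and imports):

```python
import hashlib


class HashProcessing:  # hypothetical module; real modules inherit from the template's base class
    def init(self, logger, config):
        # Receives the framework logger and the global configuration.
        self.logger = logger
        self.config = config

    def process(self, url, content):
        # Called once per HTTP response with the URL and raw content.
        self.logger.info("processing %s", url)
        return {
            "hash": hashlib.sha256(content).hexdigest(),
            "url": url,
            "matches": [],  # module-specific results, e.g. YARA matches
        }
```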
A unique class name must be defined; it is used to reference the module when including it via the -p argument or when listing it as a default processing module in the configuration file.
Finally, add an import statement in __init__.py, using your class name:
from .<REPLACE>_processing import <REPLACE>Processing
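For the hypothetical HashProcessing module sketched above, the statement would read:
from .hash_processing import HashProcessing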
Storage Modules
Storage modules can be found under crawler->storage, and a sample module file, example_storage.py, is located in this directory. As with the processing modules, the init function provides for module initialization and receives an instance of the logger and the global configuration. The store_results function receives structured data from the engine at intervals defined by the batch size in the configuration file.
A unique class name must be defined; it is used to load the module when including it via the -s argument or when listing it as a default storage module in the configuration file.
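A minimal sketch under the same assumptions (the class below is hypothetical; the shipped example_storage.py template shows the exact base class and imports):

```python
import json


class JsonlStorage:  # hypothetical module that appends each batch to a JSON Lines file
    def init(self, logger, config):
        self.logger = logger
        self.config = config

    def store_results(self, results):
        # Receives a batch of structured scan results from the engine;
        # the batch size is defined in the configuration file.
        with open("results.jsonl", "a") as fh:
            for entry in results:
                fh.write(json.dumps(entry) + "\n")
        self.logger.info("stored %d results", len(results))
```

Such a module would then be selected with -s JsonlStorage.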
Presentations and Other Resources
2021:
- BlackHat Arsenal USA
- VirusBulletin Localhost – Upcoming
License
SubCrawl is licensed under the MIT license