Packj – Large-Scale Security Analysis Platform To Detect Malicious/Risky Open-Source Packages
- PyCon US’22 talk and slides
- Black Hat Asia’22 Arsenal presentation
- PackagingCon’21 talk and slides
- Academic dissertation on open-source software security and the paper from our group at Georgia Tech that started this research.
Upcoming talks
- Black Hat USA’22 Arsenal talk: Detecting typo-squatting, backdoored, abandoned, and other “risky” open-source packages using Packj
- Open Source Summit Europe’22 talk: Scoring dependencies to detect “weak links” in your open-source software supply chain
Feature roadmap
- Add support for other language ecosystems. Rust support is a work in progress and is expected in the last week of July ’22.
- Add functionality to detect several other “risky” code and metadata attributes.
- Packj currently performs only static code analysis; we are working on adding support for dynamic analysis (WIP, ETA: end of summer).
Team
Packj has been developed by cybersecurity researchers at Ossillate Inc. and external collaborators to help developers mitigate the risk of supply chain attacks when sourcing untrusted third-party open-source software dependencies. We thank our developers and collaborators.
We welcome code contributions. Join our Discord community for discussion and feature requests.
FAQ
- Which package managers (registries) are supported?
Packj can currently vet NPM, PyPI, and RubyGems packages for “risky” attributes. We are adding support for Rust.
- Does it work on obfuscated calls? For example, a base64-encoded string that gets decoded and then passed to a shell?
This is a very common malicious behavior. Packj detects code obfuscation as well as spawning of shell commands (exec system call). For example, Packj can flag use of the getattr() and eval() APIs, as they indicate “runtime code generation”; a developer can then take a deeper look. See main.py for details.
- Does this work at the system call level, where it would detect e.g. any attempt to open ~/.aws/credentials, or does it rely on heuristic analysis of the code itself, which will always be able to be “coded around” by the malware authors?
Packj currently uses static code analysis to derive permissions (e.g., file/network accesses). It can therefore detect open() calls when the malware uses them directly (i.e., not hidden inside a base64-encoded string), and it can also flag the base64 decode calls themselves. Fortunately, malware has to use these APIs (read, open, decode, eval, etc.) for its functionality; there is no getting around them. That said, sophisticated malware can hide itself better, so dynamic analysis is needed for completeness. We are incorporating containerized, strace-based dynamic analysis to collect system calls. See the roadmap for details.
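To make the static-analysis answers above more concrete, here is a minimal sketch of how calls such as eval(), getattr(), open(), and base64 decoding could be flagged without running the code. It is an illustration built on Python's standard ast module, not Packj's actual implementation; the RISKY_CALLS table, the labels, and the function names are assumptions made up for this example.

```python
import ast

# Hypothetical mapping from call names to risk labels (not Packj's real rule set).
RISKY_CALLS = {
    "eval": "runtime code generation",
    "exec": "runtime code generation",
    "getattr": "runtime code generation",
    "open": "file access",
    "base64.b64decode": "decoding of encoded data",
    "os.system": "shell command execution",
    "subprocess.run": "shell command execution",
    "subprocess.Popen": "shell command execution",
}


def call_name(func):
    """Return a dotted name such as 'os.system' for a call target, or None."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        base = call_name(func.value)
        return f"{base}.{func.attr}" if base else None
    return None


def find_risky_calls(source):
    """Parse the source (never execute it) and list (line, call, label) findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name, RISKY_CALLS[name]))
    return sorted(findings)


# A made-up package snippet: a shell command hidden in a base64 string.
SAMPLE = '''
import base64, os
payload = base64.b64decode("Y2F0IH4vLmF3cy9jcmVkZW50aWFscw==")
os.system(payload.decode())
'''

for line, name, label in find_risky_calls(SAMPLE):
    print(f"line {line}: {name} ({label})")
```

The sketch never sees the decoded command, only the fact that the snippet decodes data and spawns a shell, which is the kind of signal a reviewer would then inspect by hand. It also matches names literally, so aliased imports or dynamically built names would slip past it; that limitation is the reason the answer above points to dynamic analysis.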