Scanning untrusted GitHub repos in hardened Docker (methodology)
Every few weeks, someone asks us to evaluate a GitHub repository before they use it. Maybe they're considering adopting an open-source tool a vendor submitted. Maybe they're acquiring a company and need to evaluate its codebase. Maybe they're a security team performing due diligence on third-party code. The problem is simple: you can't clone an untrusted repository onto your development machine without putting that machine, and everything it can reach, at risk. The code might contain malware, it might have vulnerabilities, or it might have dependencies that phone home to command-and-control servers. We built a methodology for safe code review using hardened Docker containers. This is how we assess untrusted repositories without compromising our environment.
The Problem With Direct Cloning
Here's what makes untrusted repositories dangerous to handle using conventional methods:
Malicious code: The repository could contain actual malware, backdoors designed to grant access to attackers, or code engineered to exfiltrate credentials from your environment. Once the code is on your machine, building it, installing it, or even opening it in your usual tooling can execute arbitrary code you don't control. Even reviewing the code manually doesn't protect you from supply chain attacks.
Supply chain attacks: Dependencies specified in package.json, requirements.txt, go.mod, or Cargo.toml could be malicious versions of legitimate packages. Attackers have repeatedly demonstrated the ability to compromise package ecosystems and inject malicious code into dependencies that thousands of projects trust. Even if the main repository code is clean, its dependencies may not be.
Obfuscation: Obfuscated code hides intent. You can't see what you're actually running without significant reverse engineering effort, and even then, hidden functions might only trigger under specific conditions that require analysis beyond static review.
Secrets exposure: The repository might contain fake secrets or intentionally exposed API keys (canary tokens) designed to identify who cloned the repository and to alert the owner when someone tries them. Every clone attempt might be logged and monetized.
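The standard approach (clone and review) doesn't work because handling the code is never fully passive. Git does not copy a remote repository's hooks, but it has shipped vulnerabilities that allowed code execution during a recursive clone; package managers run install scripts; your IDE's indexer might parse malicious content; and even just having the code on your filesystem creates exposure. You need isolation first.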
Our Hardened Docker Approach
We built an assessment environment specifically designed to safely handle untrusted code. This is the architectural foundation:
Network isolation: The Docker container has no network access whatsoever. There are zero outbound connections allowed, no DNS resolution, no ability to phone home. That's baseline — no network means no exfiltration of data, no callback to attack servers, no communication with external infrastructure.
Ephemeral filesystem: Every assessment starts completely fresh. Filesystem changes don't persist between runs. If malicious code tries to modify the environment, it simply doesn't survive to the next scan. We wipe containers after each assessment.
Zero credentials: The container has no credentials, no tokens, no API keys, no secrets of any kind. Even if code manages to find something to exfiltrate, there's nothing meaningful to steal.
Comprehensive observation: Everything runs in observable mode. We capture syscalls, file operations, network attempts (even blocked ones), and system interactions. Nothing happens invisibly.
Here's the Dockerfile for the assessment image. Capabilities, networking, and filesystem mode cannot be set at build time, so the image only prepares an unprivileged user and a mount point; the restrictions themselves are applied when the container starts:
FROM ubuntu:22.04
# Run-time restrictions are passed to docker run, not baked into the image:
# --network none, --cap-drop ALL, --security-opt no-new-privileges, --read-only,
# and a read-only bind mount of the repository at /workspace
# Run as an unprivileged user rather than root
RUN useradd --create-home --shell /usr/sbin/nologin scanner
# Mount point for the repository under review
RUN mkdir /workspace && chown scanner:scanner /workspace
WORKDIR /workspace
USER scanner
# Keep the container idle so analysis commands can be run with docker exec
CMD ["tail", "-f", "/dev/null"]
Started with networking disabled, all capabilities dropped, and a read-only filesystem, the container can't reach the network, can't persist changes, and sees nothing of the host beyond the repository mounted at /workspace. That's our baseline: untrusted code runs in this cage with no path to our infrastructure. Even if it's actively malicious, there's nothing it can reach or persist.
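As a rough sketch, the properties listed above (no network, no credentials, nothing that persists) correspond to standard docker run flags. The function below only prints the command line so it can be reviewed before use. The function name, the resource limits, and the image name in the usage line are illustrative assumptions, not the exact invocation used in these assessments.

```shell
# Sketch: print a docker run command line for one isolated assessment.
# Flags used (all standard docker run options):
#   --rm                               remove the container when it exits
#   --network none                     no interfaces except loopback
#   --cap-drop ALL                     drop every Linux capability
#   --security-opt no-new-privileges   block setuid-style privilege gains
#   --read-only                        root filesystem mounted read-only
#   --tmpfs /tmp:...                   scratch space that disappears with the container
#   --pids-limit / --memory            assumed resource caps, tune as needed
#   -v <repo>:/workspace:ro            repository under review, mounted read-only
# Note: the output is for review; a repo path containing spaces would need quoting.
hardened_run_cmd() {
  repo_dir=$1
  image=$2
  printf '%s ' docker run --rm \
    --network none \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    --read-only \
    --tmpfs /tmp:rw,noexec,nosuid,size=64m \
    --pids-limit 256 \
    --memory 1g \
    -v "$repo_dir:/workspace:ro"
  printf '%s\n' "$image"
}
```

For example, hardened_run_cmd "$PWD/repo" our-image prints one docker run line that can be inspected and then executed.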
The Assessment Methodology
Here's how we run security assessments, phase by phase:
Phase One: Static Analysis (Before Execution)
Static analysis happens before we execute any code. We extract whatever we can without running anything:
File listing and structure: What's in the repository?
find /workspace -type f | head -100
tree /workspace -L 3
This reveals the overall structure, identifies entry points, and shows the architecture.
Language detection: What programming languages are used?
# Count by extension
find /workspace -name "*.py" | wc -l
find /workspace -name "*.js" | wc -l
find /workspace -name "*.go" | wc -l
find /workspace -name "*.java" | wc -l
This tells us what analysis tools we need to deploy.
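The per-extension counts above can also be gathered in one pass. This is a minimal sketch, assuming standard find, sed, sort, and uniq are available in the container; the function name is hypothetical.

```shell
# Sketch: tally files by extension, most common first.
# Files without a dot in their name (Makefile, LICENSE) are skipped, and
# dotfiles such as .gitignore are counted under "gitignore".
count_by_extension() {
  find "$1" -type f -name '*.*' ! -path '*/.git/*' \
    | sed 's/.*\.//' \
    | sort | uniq -c | sort -rn
}
```

Running count_by_extension /workspace gives one line per extension with its file count, which covers languages the four fixed patterns would miss.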
Dependency scanning: What are the dependencies? This is critical because supply chain attacks come through dependencies:
# JavaScript/npm
jq '.dependencies, .devDependencies' package.json
# Python/pip (read the declared requirements; pip freeze would only list our own environment)
cat requirements.txt
# Go
cat go.mod
cat go.sum
We extract the dependency list to analyze independently before allowing any package installation. This lets us check for known vulnerabilities in packages before they ever reach our environment.
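One check that fits this step is flagging requirements that are not pinned to an exact version, since an unpinned entry can resolve to a different release on every install. The sketch below assumes a conventional requirements.txt; the function name is hypothetical, and it does not handle every form pip accepts (for example, line continuations).

```shell
# Sketch: list requirements.txt entries that lack an exact "==" pin.
unpinned_requirements() {
  # 1. Strip whole-line comments and trailing comments.
  # 2. Delete blank lines.
  # 3. Drop pip option lines such as "-r other.txt" or "--index-url ...".
  # 4. Keep only entries without "==".
  sed -e 's/^[[:space:]]*#.*$//' \
      -e 's/[[:space:]][[:space:]]*#.*$//' \
      -e '/^[[:space:]]*$/d' "$1" \
    | grep -v '^[[:space:]]*-' \
    | grep -v '=='
}
```

Anything printed by unpinned_requirements requirements.txt is worth resolving by hand before the audit tools see it.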
Secret scanning: Are there exposed secrets?
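# Scan config files for potential secrets (case-insensitive)
grep -rniE "api_key|password|secret|token" /workspace --include="*.json" --include="*.yaml" --include="*.yml" | head -20
# Print each env file's path before its contents
find /workspace -name "*.env" -print -exec cat {} \;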
This catches credentials that might have been accidentally committed or intentionally placed to identify cloners.
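Keyword searches miss credentials that carry no telltale variable name, so a complementary pass can look for well-known credential formats instead. This is a sketch assuming GNU grep; the three patterns (AWS access key IDs, PEM private key headers, GitHub tokens) are a small illustrative selection, not a complete rule set, and the function name is hypothetical.

```shell
# Sketch: search every text file for a few well-known credential formats.
# Prints only "file:line" so matched values do not end up in scan logs.
scan_secret_patterns() {
  # -r recurse, -I skip binary files, -n line numbers, -E extended regex
  grep -rInE \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    -e 'gh[pousr]_[A-Za-z0-9]{36}' \
    --exclude-dir=.git "$1" \
    | cut -d: -f1,2
}
```

Dedicated scanners apply far larger rule sets; the point of the sketch is that format-based matching works across all file types rather than only JSON and YAML.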
Phase Two: Package Analysis
Dependencies carry some of the highest risk. We analyze them before any installation attempts:
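Package integrity: Work from lockfiles and pinned requirements so the versions audited are exactly the versions that would be installed. These tools query remote advisory databases, so they have to run on the extracted manifests, outside the network-isolated container:
# For npm: audit from the lockfile alone, without installing anything
npm audit --package-lock-only --json > npm-audit.json
# For pip: audit pinned requirements without resolving or installing them
pip-audit --no-deps -r requirements.txt --format=json > pip-audit.json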
Known vulnerabilities: Scan against CVE databases:
pip-audit --no-deps -r requirements.txt --format=json
npm audit --package-lock-only --audit-level=high
Supply chain verification: For packages we're unfamiliar with, we verify package ownership, check maintainer history, and look for anything unusual in the package's publication history. This sometimes reveals packages that were recently published or have suspicious ownership patterns.
This phase identifies risky dependencies before we ever install them. If we find critical vulnerabilities at this stage, we stop — no execution.
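The stop rule can be made mechanical by treating each audit command's exit status as a gate. The sketch below is one way to do that; run_gate is a hypothetical helper, and it fails closed, meaning a tool that errors out for any reason also stops the pipeline.

```shell
# Sketch: run one audit command and stop if it exits non-zero.
# Usage: run_gate <label> <command> [args...]
run_gate() {
  label=$1
  shift
  if "$@"; then
    echo "PASS $label"
  else
    echo "STOP $label: audit reported findings or failed, no execution" >&2
    return 1
  fi
}
```

For instance, run_gate npm-audit npm audit --package-lock-only --audit-level=critical relies on npm exiting non-zero only when a finding meets the given level, so later phases can be chained behind it with &&.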
Phase Three: Sandboxed Execution with Observation
If static analysis and dependency review pass, we attempt limited execution in the safest possible way:
Read-only operations only: We run only code that reads files, parses data, or performs calculations. No network calls, no filesystem writes, no system modifications:
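# Run static analysis tools only. Both tools can execute code from configuration
# they discover in the repository (pylint's init-hook, ESLint's JavaScript config),
# so point them at our own configs; the /opt/scanner paths are illustrative and
# the ESLint flags are the ESLint 9 flat-config form
python -m pylint --rcfile=/opt/scanner/pylintrc --output-format=text --jobs=1 --persistent=no /workspace
eslint --no-config-lookup --config /opt/scanner/eslint.config.js /workspace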
Maximum isolation with syscall filtering: We use Linux seccomp to block dangerous syscalls:
# Run with seccomp profile that blocks dangerous calls
docker run --security-opt seccomp=blocked-syscalls.json our-image
The seccomp profile blocks around 50 dangerous syscalls, including mount, creation of new namespaces, and raw socket creation.
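For orientation, a deny-list profile in Docker's seccomp JSON format looks roughly like the sketch below. It lists only a handful of syscalls chosen for illustration, not the full set described above, and the function name is hypothetical. Blocking raw sockets and namespace creation through clone requires argument filters, which are omitted here.

```shell
# Sketch: write a small deny-list seccomp profile to the given path.
# A profile passed via --security-opt seccomp=<file> replaces Docker's
# default profile, so everything not listed here is allowed.
write_seccomp_profile() {
  cat > "$1" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": [
        "mount", "umount2", "pivot_root", "chroot",
        "unshare", "setns",
        "init_module", "finit_module", "delete_module",
        "kexec_load", "bpf", "keyctl", "reboot"
      ],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
EOF
}
```

Calling write_seccomp_profile blocked-syscalls.json produces a file that can be handed to docker run. ptrace is deliberately left out of the list because strace depends on it.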
Comprehensive observation: We capture everything — syscalls, file operations, even blocked network attempts:
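# Monitor file operations in the background, logging outside the tree under review
inotifywait -m -r -o /tmp/inotify.log /workspace &
# Track syscalls of the entry point and every child process
strace -f -o /tmp/scan.log python main.py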
If the code tries to access the network, we see it in the logs even though the packets are blocked. If it tries to write to unexpected locations, we catch that too. This is the intelligence-gathering phase.
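Blocked network attempts still appear in the strace log as connect calls that returned an error, so they can be pulled out with a filter. This is a minimal sketch, assuming the log was written with strace -f -o so that each line starts with a PID; the function name is hypothetical.

```shell
# Sketch: extract IPv4/IPv6 connection attempts from a strace log.
# Unix-domain sockets (AF_UNIX) are local IPC and are filtered out.
network_attempts() {
  grep -E '(^|[[:space:]])connect\(' "$1" \
    | grep -E 'sa_family=AF_INET6?'
}
```

Pass it whatever path was given to strace -o. Each surviving line carries the destination address and port along with the error the kernel returned, which is usually enough to tell a telemetry ping from a callback.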
Phase Four: Risk Assessment Report
After analysis, we produce a detailed report:
- Repository structure and composition summary
- Language and dependency inventory with versions
- Known vulnerability findings (CVEs, out-of-date packages)
- Behavioral observations (what the code attempted to do)
- Overall risk rating
- Deployment recommendations
Here's an example of the final report:
# Repository Security Assessment
## Summary
- Repository Location: [REDACTED]
- Languages: Python (72%), JavaScript (28%)
- Files Analyzed: 847
- Dependencies: 156
- Risk Rating: MEDIUM
## Critical Findings (0)
No critical vulnerabilities identified.
## High Findings (2)
- CVE-2024-1234: Known vulnerability in package dependency version
- Hardcoded API key detected in configuration
## Medium Findings (5)
- Outdated dependencies requiring updates
- Obfuscated code sections (3 locations)
- Missing security headers in HTTP handling
## Low Findings (8)
- Coding style issues
- Missing documentation
- Unused imports
## Behavioral Observations
- No network callbacks detected
- No credential access attempts
- File operations within expected scope only
- No persistence attempts observed
## Recommendations
1. Update dependencies before production use
2. Remove hardcoded configuration values before deployment
3. Review obfuscated code sections in src/utils/
4. Consider code refactor for sensitive deployments
This report is what we deliver to clients. It's actionable — it tells them exactly what they need to address before using the code.
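The severity sections of a report like this can be generated from a flat findings file so that counts and bullets never drift apart. The sketch below assumes a hypothetical input format of one finding per line, severity and description separated by a tab; neither the format nor the function name comes from the assessment pipeline itself.

```shell
# Sketch: render the severity sections of the report from a TSV findings file.
# Input (hypothetical format): "<Severity><TAB><description>" per line.
render_findings() {
  for sev in Critical High Medium Low; do
    # Count the findings at this severity; "n + 0" prints 0 when there are none.
    count=$(awk -F '\t' -v s="$sev" '$1 == s { n++ } END { print n + 0 }' "$1")
    echo "## $sev Findings ($count)"
    # Emit one bullet per finding, in file order.
    awk -F '\t' -v s="$sev" '$1 == s { print "- " $2 }' "$1"
  done
}
```

The summary, behavioral observations, and recommendations still need a human author; only the mechanical part is worth automating.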
Tools We Use
Every tool in our assessment pipeline is open-source and established:
Static analysis: Semgrep, Bandit, ESLint, Pylint — these are the industry-standard static analysis tools.
Vulnerability scanning: pip-audit, npm audit, Snyk. These check dependencies against known vulnerability databases.
Isolation: Docker, seccomp, AppArmor — these provide the hardened container environment.
Observation: strace, inotifywait, auditd, Falco. These track what's happening during execution.
Everything is standard tooling. Nothing we use is proprietary. The methodology is reproducible by anyone with Docker and these tools available.
What We Don't Do
We explicitly avoid several risky activities:
We don't run full test suites: Test execution can trigger malicious code intentionally hidden in test files.
We don't deploy the code: Even approved code doesn't get deployed as part of assessment — deployment happens separately.
We don't connect to external services: This is assessment mode only. Production use gets evaluated separately.
We don't assume trust: Every repository starts untrusted, and the burden is on proving safety.
The output is always a risk assessment report with recommendations, not an approval to deploy. If code is high-risk, we recommend against deployment until findings are addressed. That's the responsible approach.
Pricing and Access
This assessment service is available as a consulting engagement. We provide:
- Repository assessment with hardened Docker isolation
- Dependency vulnerability analysis
- Comprehensive security report
- Recommendations for safe use
The methodology has held up in practice: across dozens of engagements it has caught supply chain issues, hidden backdoors, and credential exposure.
Close
Scanning untrusted repositories is fundamentally about isolation and comprehensive observation. The hardened Docker environment gives us both: code runs in a contained environment, and we observe everything it does. The sequence is deliberate: static analysis first, dependencies second, sandboxed execution third, comprehensive report fourth.
Isolation first, observation second: the same discipline we apply when vendor code touches production. If untrusted repos are part of your risk picture, tell us what you are trying to secure.