
How to Set Up Semgrep in 2026 - Complete Installation and Configuration Guide

Set up Semgrep for security scanning. Covers CLI install, custom rules, GitHub Actions integration, Semgrep Cloud, PR comments, and troubleshooting.


Why Semgrep and why now

Semgrep is a fast, open-source static analysis tool that finds bugs and security vulnerabilities by letting you write rules that look like the code you are searching for. Unlike legacy SAST tools that require specialized security expertise to configure and produce overwhelming false positive rates, Semgrep was designed from the ground up for developers. Its pattern syntax mirrors your actual source code, its CLI runs in seconds, and its rule library covers over 30 programming languages with thousands of pre-written checks.

Since its launch by r2c (now Semgrep, Inc.) in 2020, Semgrep has become the default security scanner for thousands of engineering teams - from startups running the free open-source engine to enterprises using the full cloud platform. Dropbox, Figma, Snowflake, and HashiCorp all use Semgrep in their development pipelines. The tool scans over 100 million lines of code daily across its user base.

The reason to set up Semgrep now is straightforward. Major compliance frameworks such as SOC 2, PCI DSS 4.0, and ISO 27001 expect evidence of secure development practices, and static analysis in the development lifecycle is a common way to provide it. Semgrep gives you that compliance evidence while also catching real vulnerabilities. The open-source engine is free, and the full platform is free for teams of up to 10 contributors, so licensing cost is no barrier to getting started.

This guide walks through every step of setting up Semgrep - from installing the CLI on your local machine to running it in CI/CD pipelines with automatic PR comments. By the end, you will have a production-ready Semgrep configuration that catches security issues before they reach your main branch.

[Figure: Semgrep homepage screenshot]

Step 1 - Install the Semgrep CLI

Semgrep provides three installation methods. Choose the one that fits your environment.

Install with pip (all platforms)

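The pip installation is the recommended method and works on macOS, Linux, and Windows (via WSL). Semgrep requires a currently supported Python 3 release (3.9 or later at the time of writing; check the Semgrep documentation for the current minimum).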

# Install Semgrep
pip install semgrep

# Verify the installation
semgrep --version

If you prefer to isolate Semgrep from your system Python, use pipx:

# Install with pipx for isolated environment
pipx install semgrep

# Verify
semgrep --version

Install with Homebrew (macOS)

On macOS, Homebrew provides a straightforward installation:

# Install via Homebrew
brew install semgrep

# Verify
semgrep --version

Install with Docker (any platform)

Docker is useful for CI environments or when you do not want to install anything on the host system:

# Pull the Semgrep image
docker pull semgrep/semgrep

# Run a scan using the Docker image
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src

The Docker approach mounts your current directory into the container at /src and runs the scan against it. This method ensures a consistent environment regardless of the host operating system.

Verify your installation

After installing through any method, confirm that Semgrep is working:

semgrep --version
# Expected output: semgrep 1.x.x

If you see a version number, the installation was successful. If you encounter a “command not found” error, ensure that the installation directory is in your system PATH. For pip installations, this is typically ~/.local/bin on Linux or ~/Library/Python/3.x/bin on macOS.
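The PATH check described above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of Semgrep, and the "bin" subdirectory assumes a Linux or macOS user install.

```python
import os
import shutil
import site
from typing import Optional


def find_semgrep() -> Optional[str]:
    """Return the path of the semgrep executable, or None if it is not on PATH."""
    return shutil.which("semgrep")


def pip_user_bin_dir() -> str:
    """Return the directory where a pip user install places scripts on Linux and macOS."""
    return os.path.join(site.getuserbase(), "bin")


if __name__ == "__main__":
    location = find_semgrep()
    if location:
        print(f"semgrep found at {location}")
    else:
        print(f"semgrep is not on PATH; check whether {pip_user_bin_dir()} is listed in PATH")
```

If the script reports that semgrep is missing, compare the printed directory with the output of echo $PATH before reinstalling.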

Step 2 - Run your first scan

With Semgrep installed, you can run your first scan immediately without any configuration files.

Scan with the default rule set

Navigate to any project directory and run:

cd /path/to/your/project
semgrep --config auto

The --config auto flag tells Semgrep to automatically select rules that are relevant to the languages and frameworks detected in your project. Semgrep downloads the appropriate rules from the Semgrep Registry, runs the scan, and prints findings to your terminal.

Scan with a specific rule set

For more control over which rules run, specify a rule set by name:

# Run the default curated rule set
semgrep --config p/default

# Run security-focused rules
semgrep --config p/security-audit

# Run language-specific rules
semgrep --config p/python
semgrep --config p/javascript
semgrep --config p/golang

Scan a specific file or directory

You do not have to scan your entire project. Target specific paths:

# Scan a single file
semgrep --config p/default src/auth/login.py

# Scan a specific directory
semgrep --config p/default src/api/

# Scan multiple paths
semgrep --config p/default src/auth/ src/api/ src/middleware/

Understand the output

A typical Semgrep finding looks like this:

src/api/users.py
  security.python.sql-injection.sql-injection
    Detected string concatenation in SQL query. Use parameterized queries instead.

    14│ query = "SELECT * FROM users WHERE id = " + user_id

Each finding includes the file path, the rule ID that triggered the match, a human-readable message explaining the issue, and the exact line of code that was matched. The rule ID is important because you will use it later to customize which rules run and to suppress false positives.

Output in different formats

Semgrep supports multiple output formats for integration with other tools:

# JSON output for programmatic processing
semgrep --config p/default --json > results.json

# SARIF output for GitHub Code Scanning integration
semgrep --config p/default --sarif > results.sarif

# JUnit XML for CI/CD integration
semgrep --config p/default --junit-xml > results.xml

# Emacs-compatible single-line output
semgrep --config p/default --emacs

# Vim-compatible single-line output
semgrep --config p/default --vim
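As an illustration of the "programmatic processing" that JSON output enables, the sketch below counts findings per rule. It assumes the report has a top-level results list whose entries carry a check_id field, which is how Semgrep's JSON output is laid out; the embedded sample and its rule IDs are made up for the example.

```python
import json
from collections import Counter

# Trimmed, made-up sample in the shape of `semgrep --json` output.
SAMPLE_REPORT = """
{
  "results": [
    {"check_id": "example.rule-a", "path": "src/a.py",
     "start": {"line": 14}, "extra": {"severity": "ERROR", "message": "first"}},
    {"check_id": "example.rule-a", "path": "src/b.py",
     "start": {"line": 3}, "extra": {"severity": "ERROR", "message": "second"}},
    {"check_id": "example.rule-b", "path": "src/b.py",
     "start": {"line": 40}, "extra": {"severity": "WARNING", "message": "third"}}
  ],
  "errors": []
}
"""


def count_by_rule(report: dict) -> Counter:
    """Count findings per rule ID in a parsed Semgrep JSON report."""
    return Counter(finding["check_id"] for finding in report.get("results", []))


if __name__ == "__main__":
    report = json.loads(SAMPLE_REPORT)
    for rule_id, count in count_by_rule(report).most_common():
        print(f"{count:3d}  {rule_id}")
```

To use it on a real scan, replace SAMPLE_REPORT with the contents of results.json.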

Step 3 - Understand Semgrep rule sets

Rule sets are collections of rules curated for specific use cases. Choosing the right rule sets determines what Semgrep looks for and how many findings you get.

Core rule sets

p/default is the starting point for most teams. It contains high-confidence security and correctness rules curated by the Semgrep team. These rules have low false positive rates and focus on issues that are almost always worth fixing. Start here and expand later.

p/security-audit is a broader security-focused set that includes rules with moderate confidence. It catches more potential issues but produces more findings that may require manual review. Use this when you want comprehensive security coverage and have the bandwidth to triage additional findings.

p/owasp-top-ten maps rules to the OWASP Top 10 vulnerability categories - injection, broken authentication, sensitive data exposure, and so on. This set is useful for compliance-driven teams that need to demonstrate OWASP coverage in their security program.

Language-specific rule sets

Semgrep provides curated rule sets for individual languages and frameworks:

Rule set        Focus
p/python        Python security and correctness
p/javascript    JavaScript and Node.js security
p/typescript    TypeScript-specific patterns
p/golang        Go security and error handling
p/java          Java security patterns
p/ruby          Ruby and Rails security
p/csharp        C# security patterns
p/php           PHP security patterns
p/rust          Rust safety and correctness

Infrastructure rule sets

For infrastructure-as-code scanning:

Rule set            Focus
p/terraform         Terraform misconfigurations
p/dockerfile        Dockerfile security
p/docker-compose    Docker Compose issues
p/kubernetes        Kubernetes YAML security

Combining multiple rule sets

You can run multiple rule sets in a single scan by passing multiple --config flags:

semgrep --config p/default --config p/security-audit --config p/python

Start with p/default alone, review the findings, and then add additional sets incrementally. Adding too many rule sets at once can generate an overwhelming number of findings that make it hard to prioritize what to fix first.

Step 4 - Write custom Semgrep rules

One of Semgrep’s most powerful features is how easy it is to write custom rules. Unlike tools that require a proprietary query language, Semgrep rules use patterns that look like the code they are matching.

Basic rule structure

Create a file called custom-rules.yaml:

rules:
  - id: no-hardcoded-passwords
    patterns:
      - pattern: password = "$VALUE"
    message: >
      Hardcoded password detected. Use environment variables or a
      secrets manager instead of embedding credentials in source code.
    severity: ERROR
    languages:
      - python
    metadata:
      cwe:
        - "CWE-798: Use of Hard-coded Credentials"
      category: security

Every rule needs five required fields:

  • id - a unique identifier for the rule, used in output and for suppression
  • pattern or patterns - the code pattern to match, using metavariables like $VALUE as placeholders
  • message - a human-readable explanation shown when the rule matches
  • severity - one of ERROR, WARNING, or INFO
  • languages - an array of languages this rule applies to

Using metavariables

Metavariables are placeholders that match any expression. They start with $ and an uppercase name:

rules:
  - id: insecure-hash-algorithm
    patterns:
      - pattern: hashlib.$ALGO(...)
      - metavariable-regex:
          metavariable: $ALGO
          regex: (md5|sha1)
    message: >
      Insecure hash algorithm '$ALGO' detected. Use SHA-256 or
      stronger for cryptographic hashing.
    severity: WARNING
    languages:
      - python

This rule matches any call to hashlib.md5(...) or hashlib.sha1(...) regardless of the arguments passed.
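One detail worth knowing about the regex above: Semgrep's documentation describes metavariable-regex matching as left-anchored, so an unanchored alternation also accepts longer names that merely start with md5 or sha1. The sketch below uses Python's re.match as a stand-in for that behavior (an assumption, not Semgrep's actual engine), and the name sha1_legacy is hypothetical.

```python
import re

# Same expression as in the rule; re.match anchors only at the start of the string.
LEFT_ANCHORED = re.compile(r"(md5|sha1)")
# Stricter variant that must match the whole name.
FULLY_ANCHORED = re.compile(r"^(md5|sha1)$")


def flagged(name: str, regex: re.Pattern) -> bool:
    """Return True if the captured function name would pass the regex filter."""
    return regex.match(name) is not None


if __name__ == "__main__":
    for name in ["md5", "sha1", "sha256", "sha1_legacy"]:
        print(name, flagged(name, LEFT_ANCHORED), flagged(name, FULLY_ANCHORED))
```

If prefix matches are not wanted in your codebase, writing the regex as ^(md5|sha1)$ in the rule is the more conservative choice.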

Combining patterns with pattern-either and pattern-not

Use pattern-either to match multiple patterns and pattern-not to exclude safe patterns:

rules:
  - id: dangerous-eval
    patterns:
      - pattern-either:
          - pattern: eval($INPUT)
          - pattern: exec($INPUT)
      - pattern-not: eval("...")
      - pattern-not: exec("...")
    message: >
      Use of eval() or exec() with dynamic input is a code injection
      risk. Consider using ast.literal_eval() or a safer alternative.
    severity: ERROR
    languages:
      - python
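The rule message points to ast.literal_eval() as a safer alternative. A minimal sketch of what that replacement looks like in practice follows; the helper name parse_literal is illustrative.

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal (number, string, list, dict, ...) without executing code."""
    return ast.literal_eval(text)


if __name__ == "__main__":
    print(parse_literal("[1, 2, 3]"))
    try:
        # A function call is not a literal, so it is rejected instead of executed.
        parse_literal("__import__('os').getcwd()")
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Unlike eval(), this only accepts literal syntax, so input containing calls or attribute access raises ValueError rather than running.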

A more advanced example - detecting SQL injection

rules:
  - id: flask-sql-injection
    patterns:
      - pattern: |
          $CURSOR.execute("..." + $INPUT + "...")
      - pattern-not: |
          $CURSOR.execute("..." + "..." + "...")
    message: >
      SQL query built using string concatenation with variable input.
      Use parameterized queries with placeholders to prevent SQL injection.
      Example: cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    severity: ERROR
    languages:
      - python
    metadata:
      cwe:
        - "CWE-89: SQL Injection"
      owasp:
        - "A03:2021 - Injection"
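To make the difference between the flagged pattern and the recommended fix concrete, here is a small self-contained sketch using the standard library's sqlite3 module. This is an assumption made for runnability: sqlite3 uses ? placeholders, whereas the rule message shows the %s style used by drivers such as psycopg. The table and function names are illustrative.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, user_id: str):
    # Query built by string concatenation: the input becomes part of the SQL text.
    return conn.execute("SELECT name FROM users WHERE id = " + user_id).fetchall()


def find_user_safe(conn: sqlite3.Connection, user_id: str):
    # Parameterized query: the input is passed as data, never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    payload = "1 OR 1=1"
    print("unsafe:", find_user_unsafe(db, payload))
    print("safe:  ", find_user_safe(db, payload))
```

With the payload above, the concatenated query returns every row, while the parameterized query treats the whole payload as a single value and returns nothing.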

Testing your custom rules

Run your custom rules against your codebase:

# Run a single custom rule file
semgrep --config custom-rules.yaml

# Run custom rules alongside registry rules
semgrep --config custom-rules.yaml --config p/default

# Test a rule against a specific file
semgrep --config custom-rules.yaml src/database/queries.py
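Beyond running rules against real code, Semgrep has a rule-testing convention based on annotated comments: a line preceded by a ruleid: comment is expected to match, and a line preceded by an ok: comment is expected not to. The sketch below is a test target for the no-hardcoded-passwords rule from earlier; the file name custom-rules.py (same base name as the rule file) and the APP_PASSWORD variable are assumptions for illustration.

```python
# custom-rules.py: test target kept next to custom-rules.yaml
import os

# ruleid: no-hardcoded-passwords
password = "hunter2"

# ok: no-hardcoded-passwords
password = os.environ.get("APP_PASSWORD", "")
```

Running semgrep --test on the directory that holds both files should report whether each annotation was satisfied; check semgrep --help for the exact invocation in your version.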

Organizing custom rules

For teams with multiple custom rules, organize them in a directory:

.semgrep/
  security/
    sql-injection.yaml
    auth-bypass.yaml
    secrets.yaml
  correctness/
    null-checks.yaml
    error-handling.yaml
  style/
    naming-conventions.yaml

Then scan with the entire directory:

semgrep --config .semgrep/

Step 5 - Set up Semgrep in GitHub Actions

Running Semgrep in CI ensures every pull request is scanned automatically. Here is how to set up a production-ready GitHub Actions workflow.

Basic GitHub Actions workflow

Create .github/workflows/semgrep.yml:

name: Semgrep

on:
  pull_request: {}
  push:
    branches:
      - main

jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Semgrep
        run: semgrep ci
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}

This workflow runs Semgrep on every pull request and every push to the main branch. The semgrep ci command is designed for CI environments: on pull requests it performs diff-aware scanning (reporting only findings introduced by the changes), it uploads results to Semgrep Cloud when SEMGREP_APP_TOKEN is set, and it exits with a non-zero code if blocking findings are detected.

GitHub Actions without Semgrep Cloud

If you do not want to use Semgrep Cloud, you can run the scan using only the CLI with specific rule sets:

name: Semgrep

on:
  pull_request: {}
  push:
    branches:
      - main

jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Semgrep
        run: semgrep scan --config p/default --config p/security-audit --error

The --error flag causes Semgrep to exit with a non-zero code when findings are detected, which fails the GitHub Actions check and blocks the PR from merging if branch protection is configured.

Adding SARIF upload for GitHub Code Scanning

To see Semgrep findings in GitHub’s Security tab alongside CodeQL results:

name: Semgrep

on:
  pull_request: {}
  push:
    branches:
      - main

jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
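    permissions:
      contents: read           # needed by actions/checkout once permissions are set explicitly
      security-events: write   # needed to upload SARIF results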
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Semgrep
        run: semgrep scan --config p/default --sarif --output semgrep-results.sarif

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep-results.sarif
        if: always()

This uploads Semgrep results in SARIF format so they appear in the GitHub Security tab under “Code scanning alerts.” The if: always() condition makes the upload step run even if the scan step fails, for example when you add --error and Semgrep exits with a non-zero code on findings.
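Before uploading, it can be useful to check what the SARIF file actually contains. The sketch below lists rule, level, file, and line for each result, relying only on fields defined by the SARIF 2.1.0 format (runs, results, ruleId, level, locations). The embedded sample is made up, and real Semgrep output carries more fields.

```python
import json

# Made-up sample following the SARIF 2.1.0 layout.
SAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-scanner"}},
    "results": [
      {"ruleId": "example.rule-a", "level": "error",
       "message": {"text": "first"},
       "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "src/a.py"},
         "region": {"startLine": 14}}}]},
      {"ruleId": "example.rule-b", "level": "warning",
       "message": {"text": "second"},
       "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "src/b.py"},
         "region": {"startLine": 3}}}]}
    ]
  }]
}
"""


def list_alerts(sarif: dict) -> list:
    """Return (rule_id, level, file, line) tuples for every result in a SARIF log."""
    alerts = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                alerts.append((
                    result.get("ruleId", "unknown"),
                    result.get("level", "unspecified"),
                    physical.get("artifactLocation", {}).get("uri", "unknown"),
                    physical.get("region", {}).get("startLine", 0),
                ))
    return alerts


if __name__ == "__main__":
    for alert in list_alerts(json.loads(SAMPLE_SARIF)):
        print(alert)
```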

Configuring branch protection

After adding the Semgrep workflow, configure GitHub branch protection to require the scan to pass before merging:

  1. Go to your repository Settings, then Branches
  2. Edit the branch protection rule for your main branch
  3. Enable “Require status checks to pass before merging”
  4. Search for and add the “Semgrep Scan” check
  5. Save your changes

Now any PR with blocking Semgrep findings will be prevented from merging until the issues are resolved.

Step 6 - Connect Semgrep Cloud

Semgrep Cloud (the Semgrep AppSec Platform) adds a web dashboard, PR comments, AI-powered triage, and cross-file analysis on top of the open-source CLI. It is free for up to 10 contributors.

Create a Semgrep Cloud account

  1. Go to semgrep.dev and sign up with your GitHub, GitLab, or email account
  2. Create an organization that matches your GitHub organization
  3. Navigate to Settings and generate an API token
  4. Save this token - you will need it for CI configuration

Add the token to GitHub

  1. Go to your repository Settings, then Secrets and variables, then Actions
  2. Click “New repository secret”
  3. Name it SEMGREP_APP_TOKEN
  4. Paste the token from Semgrep Cloud
  5. Click “Add secret”

Configure scanning policies in Semgrep Cloud

Semgrep Cloud lets you manage rule configuration from the web dashboard instead of hardcoding rule sets in your CI file:

  1. In Semgrep Cloud, go to Policies
  2. You will see a default policy with recommended rules enabled
  3. Add or remove rule sets based on your needs
  4. Set rules to Comment, Block, or Monitor mode

When you use semgrep ci in your workflow (instead of semgrep scan --config ...), Semgrep pulls its configuration from the cloud policy. This means you can change what gets scanned and what severity levels block PRs without modifying your workflow file.

Rule modes in Semgrep Cloud

Semgrep Cloud supports three modes for each rule:

  • Block - findings from this rule fail the CI check and block the PR from merging
  • Comment - findings are posted as PR comments but do not block merging
  • Monitor - findings are tracked in the dashboard but are not surfaced on the PR at all

Most teams start with the majority of rules in Comment mode and only promote rules to Block mode after confirming they produce zero or near-zero false positives in their specific codebase.
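The three modes can be summarized as a small decision model. The sketch below is a conceptual illustration only, not Semgrep's implementation: the Finding class and both functions are hypothetical, and it assumes that blocking findings are also shown on the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    rule_id: str
    mode: str  # "block", "comment", or "monitor" (hypothetical representation)


def ci_should_fail(findings: List[Finding]) -> bool:
    """Only findings from rules in Block mode fail the CI check."""
    return any(f.mode == "block" for f in findings)


def visible_on_pr(findings: List[Finding]) -> List[Finding]:
    """Monitor-mode findings stay in the dashboard; the rest are surfaced on the PR."""
    return [f for f in findings if f.mode in ("block", "comment")]


if __name__ == "__main__":
    findings = [
        Finding("example.sql-injection", "block"),
        Finding("example.weak-hash", "comment"),
        Finding("example.experimental", "monitor"),
    ]
    print("fail CI:", ci_should_fail(findings))
    print("on PR:", [f.rule_id for f in visible_on_pr(findings)])
```

Seen this way, promoting a rule from Comment to Block changes only the first function's answer, which is why it is a low-risk step once a rule has proven quiet.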

Step 7 - Configure PR comments

PR comments are how Semgrep delivers findings to developers in their existing workflow - directly in the pull request where the code was changed.

How PR comments work

When Semgrep Cloud is connected and semgrep ci runs in your CI pipeline, it performs a diff-aware scan that analyzes only the code changed in the pull request. New findings are posted as inline comments on the exact lines of code where the issues were detected. Each comment includes the rule name, severity, a description of the issue, and often a suggested fix.

Install the Semgrep GitHub App

For PR comments to work, you need the Semgrep GitHub App installed:

  1. In Semgrep Cloud, go to Settings, then Source Code Managers
  2. Click “Add GitHub” and authorize the Semgrep GitHub App
  3. Choose which repositories or organizations to grant access to
  4. Confirm the installation

Once installed, any repository connected to Semgrep Cloud will receive inline PR comments when semgrep ci detects new findings.

Customizing comment behavior

In Semgrep Cloud under Settings, you can control:

  • Whether comments include fix suggestions
  • Whether to add a summary comment at the top of the PR
  • Whether to leave comments only for blocking findings or for all findings
  • Whether to automatically resolve comments when the underlying code is fixed

Example PR comment

A typical Semgrep PR comment looks like:

⚠️ semgrep: python.flask.security.injection.tainted-sql-string

Detected user input flowing into a SQL query without sanitization.
This is a SQL injection vulnerability.

Suggested fix: Use parameterized queries.
- cursor.execute("SELECT * FROM users WHERE id = " + request.args["id"])
+ cursor.execute("SELECT * FROM users WHERE id = %s", (request.args["id"],))

🔗 Rule details | 📘 CWE-89 | Triage in Semgrep Cloud

Developers can respond to findings directly in the PR - fixing the code, marking as false positive, or adding a # nosemgrep comment to suppress the finding with justification.

Step 8 - Set up .semgrepignore

The .semgrepignore file tells Semgrep which files and directories to skip during scans. This is essential for reducing noise from test files, generated code, vendored dependencies, and other paths where findings are not actionable.

Create a .semgrepignore file

Create a .semgrepignore file in your repository root:

# Test files - findings in tests are usually not exploitable
tests/
test/
*_test.go
*_test.py
test_*.py
*.test.js
*.test.ts
*.spec.js
*.spec.ts

# Generated code - cannot be fixed in source
generated/
__generated__/
*.generated.ts
*.gen.go

# Vendored dependencies - managed upstream
vendor/
node_modules/
third_party/

# Build artifacts
dist/
build/
out/
.next/

# Documentation and configuration
docs/
*.md
*.rst

# Migrations - often contain raw SQL by design
migrations/
alembic/

.semgrepignore syntax

The syntax follows .gitignore conventions:

  • directory/ ignores an entire directory and its contents
  • *.ext ignores all files with that extension
  • pattern ignores matching files anywhere in the tree
  • !pattern negates a previous ignore (force-includes a file); older Semgrep releases did not support negation, so confirm it works with your installed version
  • # starts a comment line
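To build intuition for how these entries interact, the sketch below implements a deliberately simplified matcher. It is an approximation for illustration, not Semgrep's or Git's algorithm: it handles only directory entries, file-name globs, comments, and negation, with later entries overriding earlier ones.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Approximate whether a relative path is ignored by a list of ignore entries."""
    parts = PurePosixPath(path).parts
    ignored = False
    for raw in patterns:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # blank lines and comments are skipped
        negate = entry.startswith("!")
        if negate:
            entry = entry[1:]
        if entry.endswith("/"):
            # Directory entry: hit if any parent directory has that name.
            hit = entry.rstrip("/") in parts[:-1]
        else:
            # File glob: compared against the file name only.
            hit = fnmatchcase(parts[-1], entry)
        if hit:
            ignored = not negate  # the last matching entry wins
    return ignored


if __name__ == "__main__":
    rules = ["# tests", "tests/", "*_test.py", "*.md", "!README.md"]
    for candidate in ["tests/unit/a.py", "src/users_test.py", "README.md", "src/app.py"]:
        print(candidate, is_ignored(candidate, rules))
```

Real ignore files support more syntax (anchored paths, ** wildcards), so treat this only as a mental model and verify actual behavior with a scan.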

When not to ignore

Be cautious about ignoring too much. Some common patterns to avoid ignoring:

  • Do not ignore configuration files. Security misconfigurations in Dockerfiles, Terraform files, and Kubernetes manifests are high-value findings.
  • Do not ignore migration files entirely. While raw SQL in migrations is expected, injection vulnerabilities can still appear when migrations accept dynamic input.
  • Do not ignore scripts/ or tools/ directories. Internal tooling often has weaker security standards and is a common source of vulnerabilities.

Step 9 - Advanced configuration

Beyond the basics, Semgrep supports several configuration options that help you tune scanning behavior for your codebase.

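Keeping default scan options in one place

The Semgrep CLI does not read registry rule sets or scan options such as timeouts from a repository-level settings file; a .semgrep.yml file or .semgrep/ directory is treated only as a source of rule definitions. To keep defaults consistent between local runs and CI, put the flags in a small wrapper script, for example scripts/semgrep-scan.sh:

#!/usr/bin/env sh
# Shared scan defaults; extra arguments are passed through to Semgrep
semgrep scan \
  --config p/default \
  --config p/security-audit \
  --config .semgrep/ \
  --timeout 30 \
  --max-memory 5000 \
  "$@"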

Excluding specific rules

If a specific rule from a registry set produces too many false positives in your codebase, exclude it:

# Exclude specific rules by ID
semgrep --config p/default --exclude-rule "generic.secrets.gitleaks.generic-api-key"

Inline suppressions

Suppress individual findings with inline comments:

# nosemgrep: python.flask.security.injection.tainted-sql-string
cursor.execute("SELECT * FROM config WHERE key = " + safe_internal_key)

The nosemgrep comment accepts a rule ID to suppress only that specific rule on the annotated line. This is preferable to broad suppressions because it documents exactly which check was reviewed and accepted.

For other languages, the comment syntax follows the language convention:

// JavaScript
// nosemgrep: javascript.express.security.injection.tainted-sql-string
db.query("SELECT * FROM config WHERE key = " + internalKey);

// Go
// nosemgrep: go.lang.security.injection.tainted-sql-string
db.Query("SELECT * FROM config WHERE key = " + safeKey)
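Since suppressions that name a rule ID are preferable to broad ones, a team may want to audit for bare nosemgrep comments. The sketch below is one hypothetical way to do that with a regular expression; the function name and the sample source are illustrative.

```python
import re
from typing import List

# Matches "nosemgrep" that is not followed by a colon (and therefore names no rule ID).
BARE_SUPPRESSION = re.compile(r"nosemgrep(?!\s*:)")

SAMPLE_SOURCE = """\
import sqlite3
cursor.execute(query)  # nosemgrep
# nosemgrep: python.lang.example-rule
cursor.execute(other)
"""


def find_bare_suppressions(source: str) -> List[int]:
    """Return 1-based line numbers of nosemgrep comments that do not name a rule."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if BARE_SUPPRESSION.search(line)
    ]


if __name__ == "__main__":
    for line_number in find_bare_suppressions(SAMPLE_SOURCE):
        print(f"line {line_number}: suppression without a rule ID")
```

A check like this could run in CI or a pre-commit hook so that every suppression documents which rule was reviewed.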

Setting exit codes for CI

Control how Semgrep’s exit code maps to CI pass/fail behavior:

# Exit 1 only for ERROR severity findings
semgrep --config p/default --severity ERROR --error

# Exit 1 for WARNING and ERROR findings (--severity selects only the levels you list)
semgrep --config p/default --severity WARNING --severity ERROR --error

This lets you treat ERROR-level findings as blocking while allowing WARNING-level findings to be advisory.
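An alternative to separate invocations is to run one scan with --json and decide in a script which findings block. The sketch below assumes each result exposes its severity under extra.severity, as in Semgrep's JSON output; the sample report and the function names are illustrative.

```python
import json
from typing import FrozenSet, List

# Made-up report in the shape of `semgrep --json` output.
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "example.rule-a", "path": "src/a.py", "extra": {"severity": "ERROR"}},
    {"check_id": "example.rule-b", "path": "src/b.py", "extra": {"severity": "WARNING"}},
    {"check_id": "example.rule-c", "path": "src/c.py", "extra": {"severity": "INFO"}}
  ]
}
""")


def blocking_findings(report: dict,
                      blocking: FrozenSet[str] = frozenset({"ERROR"})) -> List[dict]:
    """Return the findings whose severity is in the blocking set."""
    return [
        finding for finding in report.get("results", [])
        if finding.get("extra", {}).get("severity") in blocking
    ]


def exit_code(report: dict, blocking: FrozenSet[str] = frozenset({"ERROR"})) -> int:
    """Return 1 if any blocking finding exists, otherwise 0."""
    return 1 if blocking_findings(report, blocking) else 0


if __name__ == "__main__":
    for finding in blocking_findings(SAMPLE_REPORT):
        print("blocking:", finding["check_id"], finding["path"])
    print("exit code:", exit_code(SAMPLE_REPORT))
```

This keeps WARNING findings visible in the report for advisory review while only ERROR findings affect the pipeline result.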

Max target size and timeout

For large repositories, configure limits to prevent scans from hanging:

# Skip files larger than 1MB
semgrep --config p/default --max-target-bytes 1000000

# Limit the time spent running one rule on one file to 30 seconds
semgrep --config p/default --timeout 30

# Set maximum memory usage
semgrep --config p/default --max-memory 5000

Step 10 - Troubleshooting common issues

Here are the most common problems teams encounter when setting up Semgrep and how to resolve them.

“command not found” after installation

If semgrep is not found after installing via pip, your Python scripts directory is not in your PATH:

# Find where pip installed Semgrep
python -m site --user-base
# Add the bin directory to your PATH
export PATH="$HOME/.local/bin:$PATH"

# Make it permanent by adding to your shell profile
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

On macOS with Homebrew, this issue is rare. If it occurs, run brew link semgrep.

Slow scans on large repositories

If scans are taking too long, there are several optimizations:

# Use diff-aware scanning to only check changed files
semgrep ci  # Automatically diff-aware in CI

# Exclude large directories
semgrep --config p/default --exclude "vendor" --exclude "node_modules"

# Limit concurrency to reduce memory usage
semgrep --config p/default --jobs 2

# Skip large files
semgrep --config p/default --max-target-bytes 500000

Out of memory errors

Semgrep can consume significant memory on very large files or complex rule sets:

# Limit maximum memory usage (in MB)
semgrep --config p/default --max-memory 4000

# Reduce the number of rules being run
semgrep --config p/default  # Instead of p/security-audit which has more rules

# Exclude known large files
# Add to .semgrepignore:
# *.min.js
# *.bundle.js
# package-lock.json

Rules not matching expected code

When a custom rule does not match code that you expect it to match:

# Test with verbose output to see what Semgrep is doing
semgrep --config your-rule.yaml --verbose target-file.py

# Use --debug for detailed matching information
semgrep --config your-rule.yaml --debug target-file.py

# Validate your rule syntax
semgrep --validate --config your-rule.yaml

Common reasons rules fail to match:

  • Language mismatch - the languages field in the rule does not include the file’s language
  • Whitespace sensitivity - Semgrep normalizes most whitespace, but multiline patterns need to use ... for gaps
  • Metavariable scope - a metavariable used across patterns must match the same expression in all occurrences

GitHub Actions workflow not triggering

If the Semgrep workflow does not run on pull requests:

  • Verify the workflow file is in .github/workflows/ on the default branch
  • Check that the on: trigger includes pull_request
  • Ensure the YAML syntax is valid - GitHub does not run a malformed workflow file and reports it as an invalid workflow in the Actions tab
  • Check the Actions tab for workflow run errors

SEMGREP_APP_TOKEN errors

If semgrep ci fails with authentication errors:

  • Verify the secret is named exactly SEMGREP_APP_TOKEN in GitHub Settings
  • Ensure the token has not expired in Semgrep Cloud
  • Check that the repository is connected to the correct organization in Semgrep Cloud
  • Regenerate the token in Semgrep Cloud and update the GitHub secret

Findings in generated or vendored code

If Semgrep is flagging code you cannot change:

  1. Add the paths to .semgrepignore
  2. Use inline # nosemgrep comments for individual suppressions
  3. In Semgrep Cloud, triage findings as “Ignored” with a reason of “Generated code” or “Vendored dependency”

Recommended setup by team size

The right Semgrep configuration depends on your team’s size, security maturity, and available bandwidth for triaging findings.

Solo developers and small teams (1-5 developers)

Use Semgrep OSS with the default rule set. Install Semgrep locally, run semgrep --config p/default before pushing code, and add a basic GitHub Actions workflow with p/default. This catches the highest-confidence security issues with minimal noise. Total setup time is about 15 minutes.

# Local development workflow
semgrep --config p/default --error

Mid-size teams (5-20 developers)

Use Semgrep Cloud, which is free for up to 10 contributors; teams above that size need the paid Team plan. Connect your repositories to Semgrep Cloud, enable PR comments, and start with rules in Comment mode. This gives your team automated feedback on every PR without blocking velocity while you learn which rules are most relevant to your codebase. Promote high-confidence rules to Block mode after a few weeks of triage data. Total setup time is about 30 minutes.

Large teams and enterprises (20+ developers)

Use Semgrep Cloud with a dedicated policy per repository or team. Enable Semgrep Assistant for AI-powered triage to reduce false positive noise. Create custom rules for your internal frameworks and patterns. Integrate SARIF output with GitHub Code Scanning for centralized security visibility. Set up separate policies for different service tiers - stricter rules for payment processing code, lighter rules for internal tools. Total setup time is about 2 hours for the initial configuration plus ongoing refinement.

What to do after setup

Once Semgrep is running in your CI pipeline, the work shifts from configuration to maintenance. Here is a practical sequence for the first 30 days:

Week 1 - Run Semgrep with p/default in Comment mode. Review findings as they come in on PRs. Note any rules that produce frequent false positives in your codebase.

Week 2 - Add paths to .semgrepignore for test files, generated code, and vendored dependencies that are generating noise. Suppress any rule IDs that are consistently false positives for your project.

Week 3 - Promote high-confidence rules to Block mode. Start with rules for critical vulnerabilities like SQL injection, command injection, and hardcoded credentials. Add p/security-audit in Monitor mode to evaluate its findings without surfacing them on PRs.

Week 4 - Write your first custom rule targeting a pattern specific to your codebase - an internal API misuse, a deprecated function that should not be called, or an authentication pattern that must be followed. Share the rule with your team and collect feedback.

This gradual rollout ensures that Semgrep becomes a trusted part of your workflow rather than a noisy tool that developers learn to ignore. The goal is not to enable every rule on day one - it is to build a scanning configuration that your team actually reads and acts on.

Frequently Asked Questions

How do I install Semgrep?

Install Semgrep using pip with 'pip install semgrep', using Homebrew on macOS with 'brew install semgrep', or using Docker with 'docker run semgrep/semgrep'. The pip method works on all operating systems and is the recommended approach. After installation, verify it works by running 'semgrep --version' in your terminal.

Is Semgrep free to use?

Yes, the Semgrep open-source engine is free under the LGPL-2.1 license and includes 2,800+ community rules. The full Semgrep platform - which adds cross-file analysis, SCA, secrets detection, and AI-powered triage - is free for teams of up to 10 contributors. Beyond 10 contributors, the Team plan costs $35 per contributor per month.

What is the difference between Semgrep OSS and Semgrep Cloud?

Semgrep OSS is the open-source CLI engine that performs single-file pattern matching with 2,800+ community rules. Semgrep Cloud (also called Semgrep AppSec Platform) adds cross-file dataflow analysis, 20,000+ Pro rules, AI-powered triage with Semgrep Assistant, a web dashboard for managing findings, and integrations for PR comments. Semgrep Cloud is free for up to 10 contributors.

How do I set up Semgrep in GitHub Actions?

Create a workflow file at .github/workflows/semgrep.yml that triggers on pull_request and push events. Run the job in the semgrep/semgrep container image and call the Semgrep CLI directly, as shown in Step 5; the older semgrep-action wrapper is deprecated. For Semgrep Cloud integration, add your SEMGREP_APP_TOKEN as a GitHub secret and use 'semgrep ci' as the scan command to get PR comments and dashboard reporting.

How do I write custom Semgrep rules?

Create a YAML file with a rules array. Each rule needs an id, a pattern or patterns block, a message, a severity level (ERROR, WARNING, or INFO), and a languages array. Semgrep's pattern syntax mirrors the target language - you write patterns that look like the code you want to match, using metavariables like $X as placeholders. Test rules with 'semgrep --config your-rule.yaml' against your codebase.

What languages does Semgrep support?

Semgrep supports over 30 programming languages including Python, JavaScript, TypeScript, Java, Go, Ruby, C, C++, C#, Rust, Kotlin, Swift, PHP, Scala, Terraform, Dockerfile, and Kubernetes YAML. The open-source engine supports all languages. The Pro engine adds deeper cross-file analysis for a subset of these languages.

How do I reduce false positives in Semgrep?

Use several strategies: create a .semgrepignore file to exclude test files, generated code, and vendor directories. Use Semgrep Assistant (available in the Cloud platform) for AI-powered triage that automatically filters noise. Write custom rules with pattern-not clauses to exclude known-safe patterns. Start with high-confidence rule sets like p/default rather than broad sets like p/security-audit.

What are the best Semgrep rule sets to start with?

Start with p/default, which includes high-confidence security and correctness rules curated by the Semgrep team. Add p/security-audit for broader security coverage when you are ready for more findings. For specific languages, use targeted sets like p/python, p/javascript, or p/golang. For infrastructure scanning, add p/terraform or p/docker-compose. You can combine multiple rule sets in a single scan.

How do I configure Semgrep to post comments on pull requests?

Connect your repository to Semgrep Cloud at semgrep.dev, install the Semgrep GitHub App, and add your SEMGREP_APP_TOKEN to your CI environment. When Semgrep runs with 'semgrep ci' in your pipeline, it automatically posts inline comments on pull requests for any new findings. Comments include the rule ID, severity, explanation, and remediation guidance.

Can Semgrep scan Docker containers and infrastructure-as-code?

Yes. Semgrep has dedicated rule sets for Dockerfile security (p/dockerfile), Terraform misconfigurations (p/terraform), Kubernetes YAML (p/kubernetes), and Docker Compose files (p/docker-compose). These rules detect issues like running containers as root, exposing unnecessary ports, missing resource limits, and insecure cloud resource configurations.

How fast is Semgrep compared to other SAST tools?

Semgrep is one of the fastest SAST tools available. The median CI scan time is approximately 10 seconds because Semgrep uses diff-aware scanning to analyze only changed files on pull requests. Full repository scans typically complete in under 60 seconds for most codebases. This is significantly faster than tools like SonarQube, Checkmarx, or Veracode, which often take minutes to hours for comparable analysis.

How do I ignore specific Semgrep findings?

Ignore findings at three levels. In code, add a '# nosemgrep: rule-id' comment at the end of the flagged line or on the line before it. In configuration, add paths or patterns to a .semgrepignore file. In Semgrep Cloud, triage findings as false positives or accepted risks in the dashboard - these triage decisions persist across future scans so the same finding is not reported again.

Does Semgrep work with GitLab CI and other CI platforms?

Yes. Semgrep works with any CI platform that can run a shell command. For GitLab CI, add a Semgrep job to your .gitlab-ci.yml file. Semgrep also has documented integrations for Jenkins, CircleCI, Buildkite, and Azure Pipelines. The 'semgrep ci' command handles authentication, diff-aware scanning, and result reporting regardless of which CI platform you use.
