
How to Write Custom Semgrep Rules: Complete Tutorial

Learn to write custom Semgrep rules with YAML syntax, pattern matching, metavariables, taint tracking, and real-world examples for SQL injection and XSS.

Published: March 13, 2026

Why write custom Semgrep rules

Semgrep ships with over 2,800 community rules and 20,000+ Pro rules that cover common security vulnerabilities, best practice violations, and correctness issues across more than 30 programming languages. For many teams, these pre-built rule sets are enough to catch the most critical problems. But every codebase has patterns, APIs, and conventions that are unique to its organization - and that is where custom rules become essential.

Custom Semgrep rules let you codify institutional knowledge into automated checks. When a senior engineer discovers a subtle misuse of an internal API, they can write a rule that catches that mistake everywhere it appears and prevents it from being introduced again. When your security team identifies a vulnerability pattern specific to your framework, they can encode it as a rule that runs on every pull request. The result is a living, growing library of checks tailored to your exact codebase.

This tutorial covers everything you need to know about writing custom Semgrep rules - from basic YAML syntax and pattern matching to advanced features like taint tracking, metavariable constraints, and autofix suggestions. By the end, you will be able to write, test, and publish rules that catch real vulnerabilities including SQL injection, cross-site scripting, and hardcoded secrets. If you have not installed Semgrep yet, start with our guide on how to set up Semgrep before continuing here.


Rule syntax basics - the YAML structure

Every Semgrep rule is a YAML file containing a rules array. Each rule in the array requires five fields: id, a pattern operator (pattern, patterns, pattern-either, or pattern-regex), message, severity, and languages. Here is the simplest possible rule:

rules:
  - id: no-eval
    pattern: eval(...)
    message: >
      Avoid using eval() as it can execute arbitrary code.
      Use a safer alternative like ast.literal_eval().
    severity: WARNING
    languages:
      - python

The id is a unique identifier for the rule. Use a descriptive, kebab-case name that makes it clear what the rule detects. The pattern field contains code written in the target language’s syntax - Semgrep matches this pattern against your source code. The message is what developers see when the rule fires. The severity must be one of ERROR, WARNING, or INFO. The languages array lists which languages to scan.

Save this as no-eval.yaml and run it:

semgrep --config no-eval.yaml .

Semgrep will scan every Python file in the current directory and flag any call to eval(), regardless of what arguments are passed. The ... in the pattern is the ellipsis operator, which matches zero or more arguments.
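
The safer alternative that the rule message points to can be sketched as follows. This is a minimal illustration rather than part of the rule, and parse_literal is a hypothetical helper name. A direct eval(text) call in this function would be reported by no-eval, while ast.literal_eval(text) is not, because the pattern only matches calls written as eval(...).

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal (dict, list, number, string) without running code."""
    # ast.literal_eval only accepts literal syntax, so a payload such as
    # "__import__('os').system('id')" raises ValueError instead of executing.
    return ast.literal_eval(text)


settings = parse_literal("{'retries': 3, 'hosts': ['a', 'b']}")
print(settings["retries"])
```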

Additional rule metadata

Beyond the five required fields, you can add metadata that improves how findings appear in dashboards and integrations:

rules:
  - id: no-eval
    pattern: eval(...)
    message: Avoid using eval(). Use ast.literal_eval() instead.
    severity: WARNING
    languages:
      - python
    metadata:
      category: security
      technology:
        - python
      cwe:
        - "CWE-95: Improper Neutralization of Directives in
          Dynamically Evaluated Code ('Eval Injection')"
      confidence: HIGH
      references:
        - https://owasp.org/Top10/A03_2021-Injection/

The metadata block is freeform - you can include any keys you want. The cwe, confidence, and category fields are recognized by the Semgrep platform and are used for filtering and reporting.

Pattern matching fundamentals

Semgrep patterns are written in the syntax of the language you are scanning. This is what makes Semgrep accessible - you do not need to learn a specialized query language or understand abstract syntax trees. If you can read the code, you can write the pattern.

Matching function calls

To match a specific function call:

pattern: os.system(...)

This matches os.system("ls"), os.system(user_input), os.system(f"rm {path}"), and any other invocation of os.system regardless of what is passed as an argument.

Matching assignments

To match a variable being assigned a specific value pattern:

pattern: $VAR = os.environ.get(...)

This matches any variable being assigned the return value of os.environ.get().

Matching string literals

The "..." syntax matches any string literal:

pattern: password = "..."

This matches password = "admin123", password = "hunter2", or any other string assigned to a variable named password.

The ellipsis operator in depth

The ellipsis operator (...) is one of the most powerful features in Semgrep’s pattern syntax. It matches different things depending on context:

In function arguments: func(...) matches any arguments
In method chains: $OBJ. ... .execute(...) matches any chain of method calls ending in .execute()
In code blocks: if ...: ... matches any if statement with any condition and any body
Between statements: Placing ... on its own line in a pattern matches any number of intermediate statements

# Match any function that calls both connect() and execute()
# regardless of what happens between them
pattern: |
  def $FUNC(...):
      ...
      $DB.connect(...)
      ...
      $DB.execute(...)
      ...
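
To make the statement-level ellipsis concrete, here is a small Python target file that the pattern above would be expected to treat differently per function. FakeDb and both functions are hypothetical names invented for this sketch. The key point is that $DB must bind to the same expression in both calls.

```python
class FakeDb:
    """Hypothetical stand-in for a database client, used only for illustration."""

    def __init__(self):
        self.log = []

    def connect(self):
        self.log.append("connect")

    def execute(self, sql):
        self.log.append(sql)


def sync_users(db):
    # Expected match: connect() and execute() are called on the same object,
    # and the "..." lines absorb the statements around and between them.
    batch = "nightly"
    db.connect()
    label = batch.upper()
    db.execute("DELETE FROM stale_users")
    return label


def copy_users(src, dst):
    # Expected non-match: $DB must bind to one expression, but connect()
    # is called on src while execute() is called on dst.
    src.connect()
    dst.execute("INSERT INTO users SELECT * FROM staging")
```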

Metavariables - capturing and reusing matched code

Metavariables are placeholders that begin with $ followed by an uppercase name (letters, digits, and underscores, such as $X or $FUNC_NAME) and match any single expression, statement, or identifier. They are the key to writing flexible, reusable patterns.

Basic metavariable usage

rules:
  - id: dangerous-subprocess
    pattern: subprocess.call($CMD, shell=True)
    message: >
      subprocess.call() with shell=True and command '$CMD' is
      dangerous. Use subprocess.run() with a list of arguments
      instead.
    severity: ERROR
    languages:
      - python

When this rule matches subprocess.call(user_input, shell=True), the metavariable $CMD binds to user_input, and the message displays that specific value. This gives developers precise context about what triggered the finding.
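
The replacement suggested in the rule message might look like the sketch below. run_python_snippet is a hypothetical helper, and using sys.executable as the child program is an assumption made so the example runs on any platform with Python installed.

```python
import subprocess
import sys


def run_python_snippet(code: str) -> str:
    """Run a short Python snippet in a child process without invoking a shell."""
    # Arguments are passed as a list, so no shell parses them and
    # metacharacters such as ';' or '|' are not interpreted.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(run_python_snippet("print('hello')"))
```

Because shell=True never appears, the dangerous-subprocess pattern has nothing to match in this version.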

Named metavariables for clarity

Use descriptive names to make rules self-documenting:

pattern: $LOGGER.info($MESSAGE, password=$PASSWORD)

Here $LOGGER, $MESSAGE, and $PASSWORD all serve as readable documentation of what the pattern expects at each position.

Metavariable consistency

When the same metavariable name appears multiple times in a pattern, Semgrep requires all occurrences to match the same value:

pattern: $X == $X

This matches tautological comparisons like a == a or user.name == user.name but does not match a == b.
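
A short Python example of the kind of bug this pattern is meant to surface is shown below. The User class and both function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


def same_user_buggy(a: User, b: User) -> bool:
    # Matches "$X == $X": both sides are the expression a.id, so this is always True.
    return a.id == a.id


def same_user(a: User, b: User) -> bool:
    # No match: a.id and b.id are different expressions.
    return a.id == b.id
```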

Constraining metavariables with regex

Use metavariable-regex to restrict what a metavariable can match:

rules:
  - id: no-print-debug
    patterns:
      - pattern: print($MSG)
      - metavariable-regex:
          metavariable: $MSG
          regex: ".*(debug|DEBUG|temp|TEMP|todo|TODO).*"
    message: Remove debug print statement.
    severity: WARNING
    languages:
      - python

This rule only fires when the argument to print() contains words like “debug”, “temp”, or “todo” - catching leftover debugging statements without flagging legitimate print calls.
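
One way to sanity-check a metavariable-regex before running Semgrep is to try the expression against sample argument text in plain Python. The sketch below is only an approximation: it assumes the regex is applied to the source text bound to $MSG and anchored at the start, which re.match imitates. would_flag is a hypothetical helper.

```python
import re

# Same regex as the rule; re.match anchors at the start of the string,
# which approximates how metavariable-regex is applied (assumption).
DEBUG_RE = re.compile(r".*(debug|DEBUG|temp|TEMP|todo|TODO).*")


def would_flag(msg_source: str) -> bool:
    """Return True if the source text bound to $MSG satisfies the regex."""
    return DEBUG_RE.match(msg_source) is not None


for arg in ['"debug: cache miss"', '"Server started"', "temperature"]:
    print(arg, would_flag(arg))
```

The third sample shows a likely false positive: a variable named temperature contains "temp", so a word-boundary check or a tighter alternation may be worth considering.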

Combining patterns with pattern-either and pattern-not

Real-world detection scenarios often require matching multiple alternatives or excluding certain cases. Semgrep provides boolean operators for combining patterns.

pattern-either (OR logic)

pattern-either matches when any one of its child patterns matches:

rules:
  - id: dangerous-deserialization
    pattern-either:
      - pattern: pickle.loads(...)
      - pattern: pickle.load(...)
      - pattern: yaml.load(..., Loader=yaml.FullLoader)
      - pattern: yaml.unsafe_load(...)
    message: >
      Deserialization of untrusted data can lead to remote code
      execution. Use safe alternatives like json.loads() or
      yaml.safe_load().
    severity: ERROR
    languages:
      - python

This single rule catches four different dangerous deserialization patterns.

pattern-not (exclusion)

pattern-not excludes matches that fit a specific pattern. It is used inside a patterns block to filter out false positives:

rules:
  - id: unparameterized-query
    patterns:
      - pattern: cursor.execute($QUERY)
      - pattern-not: cursor.execute("...")
    message: >
      Use parameterized queries to prevent SQL injection.
      Pass parameters as the second argument to cursor.execute().
    severity: ERROR
    languages:
      - python

This rule matches cursor.execute(query) or cursor.execute(f"SELECT * FROM {table}") but does not match cursor.execute("SELECT * FROM users") because a static string literal is safe from injection.
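
The three cases can be seen side by side in a runnable sketch that uses an in-memory SQLite database. The table and function names are illustrative, and the comments describe the expected rule behavior rather than verified Semgrep output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Static string literals: excluded by the pattern-not clause.
cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'ada')")


def find_user_unsafe(name):
    query = f"SELECT id FROM users WHERE name = '{name}'"
    cursor.execute(query)  # expected finding: single non-literal argument
    return cursor.fetchall()


def find_user_safe(name):
    # Two arguments, so the single-argument pattern does not apply.
    cursor.execute("SELECT id FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


print(find_user_unsafe("x' OR '1'='1"))  # the injected condition matches every row
print(find_user_safe("x' OR '1'='1"))    # the bound parameter is treated as data
```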

pattern-not-inside (context exclusion)

pattern-not-inside excludes matches that appear within a certain code context:

rules:
  - id: missing-error-handling
    patterns:
      - pattern: $CLIENT.send($DATA)
      - pattern-not-inside: |
          try:
              ...
          except ...:
              ...
    message: Network send operations should be wrapped in try/except.
    severity: WARNING
    languages:
      - python

This flags client.send(data) calls that are not inside a try/except block.

Combining AND and OR logic

You can nest pattern-either inside patterns for complex rules:

rules:
  - id: insecure-cookie
    patterns:
      - pattern-either:
          - pattern: response.set_cookie($NAME, $VALUE, ...)
          - pattern: Response(..., set_cookie=[$NAME, $VALUE, ...], ...)
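      # Exclude only calls that set both flags, in either keyword order
      - pattern-not: response.set_cookie($NAME, $VALUE, ..., secure=True, ..., httponly=True, ...)
      - pattern-not: response.set_cookie($NAME, $VALUE, ..., httponly=True, ..., secure=True, ...)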
    message: Cookies should be set with secure=True and httponly=True.
    severity: WARNING
    languages:
      - python

Taint tracking - following data from source to sink

For vulnerabilities like SQL injection and cross-site scripting, simple pattern matching is not enough. You need to trace how untrusted data flows through the program. Semgrep’s taint mode does exactly this.

A taint tracking rule has three components:

Sources - where untrusted data enters the application
Sinks - where that data would be dangerous
Sanitizers - functions that neutralize the danger (optional)

SQL injection detection with taint mode

rules:
  - id: sql-injection
    mode: taint
    message: >
      User input flows into a SQL query without sanitization.
      Use parameterized queries or an ORM to prevent SQL injection.
    severity: ERROR
    languages:
      - python
    metadata:
      cwe:
        - "CWE-89: SQL Injection"
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.form[...]
      - pattern: request.args[...]
      - pattern: request.json
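    pattern-sinks:
      # Focus on the query argument so bound parameters are not treated as sinks
      - patterns:
          - pattern-either:
              - pattern: cursor.execute($QUERY, ...)
              - pattern: db.engine.execute($QUERY)
              - pattern: connection.execute($QUERY)
          - focus-metavariable: $QUERY
    pattern-sanitizers:
      # Numeric casts only; HTML escapers such as bleach.clean() do not make SQL safe
      - pattern: int(...)
      - pattern: float(...)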

This rule traces data from Flask request objects to database execute calls. If user input reaches a SQL execution function without passing through a sanitizer, Semgrep reports a finding. The taint engine follows the data through variable assignments, string operations, and calls within a single function. Following it across function and file boundaries requires the cross-function and cross-file analysis in Semgrep's paid engine.
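
A target file for this rule could look like the sketch below. To keep it runnable without Flask, request is a SimpleNamespace stand-in (an assumption for illustration). Semgrep matches the request.args.get(...) syntax either way. The comments mark where the source, propagation, sink, and sanitizer would be expected to apply.

```python
import sqlite3
from types import SimpleNamespace

# Stand-in for flask.request so the sketch runs without Flask (assumption).
request = SimpleNamespace(args={"id": "1"})

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'ada')")


def user_name_tainted():
    user_id = request.args.get("id")                        # source
    query = "SELECT name FROM users WHERE id = " + user_id  # taint propagates
    cursor.execute(query)                                   # sink: expected finding
    return cursor.fetchone()


def user_name_sanitized():
    user_id = int(request.args.get("id"))                   # int() acts as a sanitizer
    cursor.execute("SELECT name FROM users WHERE id = " + str(user_id))
    return cursor.fetchone()
```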

XSS detection with taint mode

rules:
  - id: xss-reflected
    mode: taint
    message: >
      User input is rendered in HTML output without escaping,
      creating a reflected XSS vulnerability.
    severity: ERROR
    languages:
      - javascript
      - typescript
    metadata:
      cwe:
        - "CWE-79: Cross-site Scripting (XSS)"
    pattern-sources:
      - pattern: req.query.$PARAM
      - pattern: req.body.$PARAM
      - pattern: req.params.$PARAM
    pattern-sinks:
      - pattern: res.send($DATA)
      - pattern: res.write($DATA)
      - pattern: $EL.innerHTML = $DATA
    pattern-sanitizers:
      - pattern: DOMPurify.sanitize(...)
      - pattern: escapeHtml(...)
      - pattern: encodeURIComponent(...)

Hardcoded secrets detection

Detecting hardcoded secrets does not require taint mode since you are looking for patterns in assignments rather than data flow. A combination of pattern matching and metavariable constraints works well:

rules:
  - id: hardcoded-secret
    patterns:
      - pattern-either:
          - pattern: $VAR = "..."
          - pattern: $VAR = '...'
      - metavariable-regex:
          metavariable: $VAR
          regex: ".*(secret|password|api_key|apikey|token|auth|credential|private_key).*"
      - pattern-not: $VAR = ""
      - pattern-not: $VAR = "CHANGE_ME"
      - pattern-not: $VAR = "placeholder"
    message: >
      Possible hardcoded secret in variable '$VAR'. Store
      secrets in environment variables or a secrets manager.
    severity: ERROR
    languages:
      - python
      - javascript
      - typescript
      - java
      - go
      - ruby
    metadata:
      cwe:
        - "CWE-798: Use of Hard-coded Credentials"
      confidence: MEDIUM

This rule matches any variable whose name contains “secret”, “password”, “api_key”, or similar terms being assigned a string literal, while excluding empty strings and obvious placeholder values.
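
The remediation named in the message can be as small as the following sketch. SERVICE_API_KEY and load_api_key are hypothetical names. Because api_key is assigned the result of a call rather than a string literal, the rule has nothing to match.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Read the key from the environment instead of embedding it in source."""
    # SERVICE_API_KEY is a hypothetical variable name used for illustration.
    api_key = env.get("SERVICE_API_KEY")
    if not api_key:
        raise RuntimeError("SERVICE_API_KEY is not set")
    return api_key
```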

Fix suggestions - automating remediation

Semgrep can automatically fix the code it flags. Add a fix key to your rule with replacement code that references the same metavariables from your pattern:

rules:
  - id: use-strict-equality
    pattern: $A == $B
    fix: $A === $B
    message: Use strict equality (===) instead of loose equality (==).
    severity: WARNING
    languages:
      - javascript
      - typescript

Run with the --autofix flag to apply fixes:

semgrep --config rules/ --autofix .

Conditional fixes with fix-regex

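When a change is easier to express as a text substitution than as a full replacement pattern, use fix-regex. It applies a regular expression to the matched code and takes a regex, a replacement, and an optional count:

rules:
  - id: update-deprecated-import
    pattern: from old_module import $FUNC
    fix-regex:
      regex: old_module
      replacement: new_module
      count: 1
    message: old_module is deprecated. Import from new_module instead.
    severity: WARNING
    languages:
      - python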

Autofix is particularly valuable during large-scale migrations - renaming functions, updating import paths, or replacing deprecated API calls across hundreds of files.
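
A regex-based fix rewrites only the text that the rule matched. The sketch below approximates that behavior with Python's re.sub. apply_fix_regex is a hypothetical helper, not Semgrep's implementation, and its parameters mirror the regex, replacement, and count keys that fix-regex accepts.

```python
import re


def apply_fix_regex(matched_text: str, regex: str, replacement: str, count: int = 0) -> str:
    """Approximate a regex-based fix: rewrite only the text the rule matched."""
    # count=0 means "replace every occurrence", mirroring re.sub's default.
    return re.sub(regex, replacement, matched_text, count=count)


fixed = apply_fix_regex("from old_module import helper", "old_module", "new_module", count=1)
print(fixed)
```

Trying the substitution on a few representative matches like this is a cheap way to catch an over-broad regex before applying fixes across a repository.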

Testing your rules

Semgrep has a built-in testing framework that ensures your rules match what they should and do not match what they should not. This is critical for maintaining rule quality over time.

Creating test files

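For each rule, create a test file in the same directory with the same base name as the rule file and the extension of the target language (no-eval.yaml is tested by no-eval.py). Annotate it with comments:

# test file: no-eval.py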

# ruleid: no-eval
eval("print('hello')")

# ruleid: no-eval
eval(user_input)

# ruleid: no-eval
result = eval(compile(code, '<string>', 'exec'))

# ok: no-eval
ast.literal_eval("{'key': 'value'}")

# ok: no-eval
some_other_function("eval is fine in strings")

The line immediately after a # ruleid: <rule-id> comment should trigger the rule. The line immediately after a # ok: <rule-id> comment should not. Run the tests:

semgrep --test rules/

Semgrep checks every annotated line and reports any mismatches. A passing test suite gives you confidence that rule changes do not introduce false positives or false negatives.
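
The annotation convention is easy to get wrong by one line, so the sketch below spells it out: each comment describes the line that follows it. This is an illustration of the convention only, not how Semgrep implements its test runner, and expected_results is a hypothetical helper.

```python
import re

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.\-]+)")


def expected_results(source: str):
    """Map each annotation to the line it describes (the line after the comment)."""
    expectations = []
    for index, line in enumerate(source.splitlines()):
        found = ANNOTATION.search(line)
        if found:
            kind, rule_id = found.groups()
            # index is 0-based, so the annotated code sits on line index + 2.
            expectations.append((index + 2, kind, rule_id))
    return expectations


sample = '''# ruleid: no-eval
eval(user_input)

# ok: no-eval
ast.literal_eval("[1, 2]")
'''
print(expected_results(sample))
```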

Test-driven rule development

A good workflow is to write tests first, then develop the rule:

Collect real code samples - both vulnerable and safe versions
Annotate them with # ruleid and # ok comments
Write a rule that passes all test cases
Run against your actual codebase to verify the results
Add any new false positive cases to the test file and refine

This test-driven approach produces rules with far fewer false positives than ad-hoc pattern writing.

Publishing rules to the Semgrep Registry

Once your rules are battle-tested, you can share them with the broader community through the Semgrep Registry.

Publishing to the public registry

Fork the semgrep-rules repository
Add your rule YAML file to the appropriate language directory (e.g., python/security/)
Include a test file with the same base name and annotated test cases
Open a pull request

The Semgrep team reviews submissions for quality and accuracy. Rules should have a clear security or correctness rationale, minimal false positives, and comprehensive test cases.

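Sharing rules within your organization

For rules specific to your organization, you do not need the public registry. Host your rules in a private Git repository, check it out alongside your code (for example as a CI step), and point Semgrep at the directory. You can pass --config more than once to combine a registry ruleset with your own rules:

# Combine a registry ruleset with rules from a local directory
semgrep --config p/default --config ./custom-rules/ .

The --config option also accepts a URL that serves a rule file directly. A repository's web page is not a rule file, so clone the repository first:

# Run rules from a cloned private repository (your-org/semgrep-rules is a placeholder)
git clone https://github.com/your-org/semgrep-rules
semgrep --config ./semgrep-rules/ .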

For teams using the Semgrep Cloud platform, you can manage private rules through the web dashboard, assign them to specific projects, and track their findings centrally.

Real-world examples

Let us look at three complete rules that address common vulnerability classes. Treat them as starting points and test them against your own code before enforcing them.

Example 1 - Detecting SQL injection in Django

rules:
  - id: django-raw-sql-injection
    mode: taint
    message: >
      User input from the request is used in a raw SQL query.
      Use Django ORM methods or parameterized queries with
      cursor.execute(sql, [params]) to prevent SQL injection.
    severity: ERROR
    languages:
      - python
    metadata:
      cwe:
        - "CWE-89: SQL Injection"
      confidence: HIGH
      category: security
    pattern-sources:
      - pattern: request.GET.get(...)
      - pattern: request.POST.get(...)
      - pattern: request.GET[...]
      - pattern: request.POST[...]
      - pattern: request.data[...]
      - pattern: request.query_params[...]
    pattern-sinks:
      - pattern: $MODEL.objects.raw($QUERY, ...)
      - pattern: cursor.execute($QUERY, ...)
      - pattern: connection.cursor().execute($QUERY)
      - pattern: RawSQL($QUERY, ...)
    pattern-sanitizers:
      - pattern: int(...)
      - pattern: float(...)
      - pattern: str(int(...))

Example 2 - Preventing XSS in React components

rules:
  - id: react-dangerouslysetinnerhtml
    patterns:
      - pattern: |
          <$EL dangerouslySetInnerHTML={{__html: $CONTENT}} />
      - pattern-not: |
          <$EL dangerouslySetInnerHTML={{__html: DOMPurify.sanitize(...)}} />
    message: >
      dangerouslySetInnerHTML is used without DOMPurify sanitization.
      This can lead to XSS if $CONTENT contains user-controlled data.
      Wrap the content with DOMPurify.sanitize() before rendering.
    severity: ERROR
    languages:
      - javascript
      - typescript
    metadata:
      cwe:
        - "CWE-79: Cross-site Scripting (XSS)"
      confidence: MEDIUM

Example 3 - Catching hardcoded AWS credentials

rules:
  - id: hardcoded-aws-credentials
    pattern-either:
      - pattern: boto3.client(..., aws_access_key_id="...", ...)
      - pattern: boto3.Session(aws_access_key_id="...", ...)
      - pattern: |
          $CLIENT = boto3.client(
              ...,
              aws_secret_access_key="...",
              ...
          )
    message: >
      AWS credentials are hardcoded in the source code. Use
      environment variables, IAM roles, or AWS Secrets Manager
      instead.
    severity: ERROR
    languages:
      - python
    metadata:
      cwe:
        - "CWE-798: Use of Hard-coded Credentials"
      confidence: HIGH

Organizing rules at scale

As your rule library grows, organization becomes important. A recommended directory structure:

custom-rules/
  python/
    security/
      sql-injection.yaml
      xss.yaml
      hardcoded-secrets.yaml
    correctness/
      null-checks.yaml
      type-errors.yaml
    best-practices/
      deprecated-apis.yaml
      logging-standards.yaml
  javascript/
    security/
      dom-xss.yaml
      prototype-pollution.yaml
    best-practices/
      no-console-log.yaml
  tests/
    python/
      security/
        sql-injection.py
        xss.py
    javascript/
      security/
        dom-xss.js

Group rules by language first, then by category. Keep test files alongside their rule files or in a parallel directory structure. In either case Semgrep pairs a test file with its rule by base name, so sql-injection.yaml is tested by sql-injection.py. Pass the rule directories you want to run with --config, in CI or in a wrapper script, so every project uses the same set.

Alternatives to custom rule writing

Writing custom Semgrep rules gives you maximum flexibility, but it requires time and expertise. If your team needs security scanning without the overhead of rule authorship, several alternatives exist.

CodeAnt AI provides a managed code review and security platform priced at $24 to $40 per user per month that includes built-in security rules covering OWASP Top 10 vulnerabilities, code quality checks, and automated PR reviews. CodeAnt AI is a strong option for teams that want comprehensive coverage out of the box without writing or maintaining custom rules.

For teams already committed to the Semgrep ecosystem, the Semgrep Cloud platform offers 20,000+ Pro rules maintained by professional security researchers. The platform is free for up to 10 contributors and costs $35 per contributor per month on the Team plan. See our Semgrep pricing breakdown for details.

You can also explore other tools in the SAST space. Our Semgrep alternatives guide compares options like SonarQube, Checkmarx, and Snyk. For specific comparisons, see Semgrep vs ESLint for JavaScript-focused teams or Semgrep vs SonarQube for teams evaluating enterprise SAST platforms.

Best practices for writing effective rules

Based on experience maintaining large custom rule sets, here are the practices that produce the best results:

Start narrow, expand gradually. Write rules that match the exact pattern you observed in a real bug or vulnerability. Run the rule against your entire codebase and review every finding before broadening the pattern. A rule that fires twice with 100% accuracy is more valuable than one that fires 200 times with 50% accuracy.

Write comprehensive test cases. Every rule should have test cases for both positive matches and negative cases. Include edge cases - what about the pattern inside a comment? Inside a string? In a different but syntactically similar context? The Semgrep test framework makes this easy to maintain.

Use meaningful rule IDs and messages. A developer who sees a Semgrep finding in their pull request needs to understand the problem and know how to fix it immediately. Include the “why” in your message, not just the “what”. Link to internal documentation when relevant.

Version control your rules. Store custom rules in a dedicated Git repository with code review requirements. Treat rule changes like code changes - they should be reviewed, tested, and documented.

Monitor false positive rates. Track how often developers suppress your rules with # nosemgrep comments. A high suppression rate indicates the rule needs refinement. The Semgrep Cloud dashboard provides this metric automatically.

Leverage taint mode for injection vulnerabilities. Pattern matching alone produces too many false positives for injection-class vulnerabilities. Taint mode dramatically improves precision by requiring an actual data flow path from source to sink.
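
If you do not use the hosted dashboard, the suppression tracking described above can be approximated with a short script. The sketch below counts nosemgrep comments per rule ID. It assumes a simplified comment form (nosemgrep, optionally followed by a colon and one rule ID), and count_suppressions is a hypothetical helper.

```python
import re
from collections import Counter

# Matches "nosemgrep" optionally followed by ": rule-id" (a simplified form).
NOSEMGREP = re.compile(r"nosemgrep(?::\s*([\w.\-]+))?")


def count_suppressions(lines):
    """Count nosemgrep comments per rule id; bare suppressions count as '*'."""
    counts = Counter()
    for line in lines:
        found = NOSEMGREP.search(line)
        if found:
            counts[found.group(1) or "*"] += 1
    return counts


source = [
    "eval(expr)  # nosemgrep: no-eval",
    "eval(other)  # nosemgrep: no-eval",
    "cursor.execute(q)  # nosemgrep",
    "print('ok')",
]
print(count_suppressions(source))
```

Running this over the lines of each source file and comparing the totals with the number of findings per rule gives a rough suppression rate to watch over time.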

Conclusion

Custom Semgrep rules turn your team’s security knowledge and coding standards into automated checks that run on every pull request. The rule syntax is deliberately accessible - patterns look like the code they match, YAML provides the structure, and built-in testing ensures quality over time. Features like taint tracking, metavariable constraints, and autofix suggestions make it possible to detect complex vulnerabilities and even remediate them automatically.

Start with one or two rules that address real problems your team has encountered. Write test cases, run the rules in CI, and expand your library as you gain confidence. The investment in custom rules pays dividends every time they catch a bug before it reaches production.

For a complete walkthrough of installing and configuring Semgrep in your development environment, see our guide on how to set up Semgrep. To understand how Semgrep fits into the broader landscape of static analysis tools, explore our Semgrep alternatives comparison.

Frequently Asked Questions

What is a Semgrep custom rule?

A Semgrep custom rule is a YAML file that defines a pattern to search for in source code. Each rule specifies an id, a pattern written in the target language's syntax, a message to display when the pattern matches, a severity level (ERROR, WARNING, or INFO), and a list of languages to scan. Custom rules let you enforce organization-specific coding standards, detect proprietary API misuse, and catch domain-specific security vulnerabilities that public rule sets do not cover.

How do I write my first Semgrep rule?

Create a YAML file with a rules array containing one rule object. Set the id to a descriptive name like 'no-hardcoded-passwords'. Write a pattern that mirrors the code you want to detect - for example, 'password = "..."' to match any hardcoded password string. Add a human-readable message explaining the issue, set severity to WARNING or ERROR, and list the target languages. Save the file and run 'semgrep --config your-rule.yaml .' to test it against your codebase.

What are Semgrep metavariables and how do they work?

Metavariables are placeholders in Semgrep patterns that match any expression or identifier in the target code. They start with a dollar sign followed by a name, like $X, $FUNC, or $PASSWORD. When Semgrep finds a match, each metavariable binds to the actual code it matched, and you can reference that binding in the rule message or in other pattern clauses. For example, the pattern 'eval($INPUT)' uses $INPUT to match whatever argument is passed to eval, and you can print that value in the rule message using the syntax $INPUT.

What is the difference between pattern, patterns, and pattern-either in Semgrep?

The 'pattern' key matches a single code pattern. The 'patterns' key combines multiple conditions with AND logic - all patterns must match for the rule to fire. The 'pattern-either' key combines multiple patterns with OR logic - any one of the listed patterns matching is enough to trigger the rule. You can nest pattern-either inside patterns to create rules like 'match any of these dangerous functions AND make sure the input is not sanitized'. This composability is what makes Semgrep rules powerful for complex detection scenarios.

How does Semgrep taint tracking work?

Semgrep taint tracking (taint mode) follows data from untrusted sources through the program to dangerous sinks. You define a rule with mode set to 'taint', then specify pattern-sources (where untrusted data enters, like request.GET or request.body), pattern-sinks (where that data would be dangerous, like cursor.execute or innerHTML), and optionally pattern-sanitizers (functions that make the data safe, like escape or parameterize). Semgrep traces the data flow between source and sink, reporting a finding only when tainted data reaches a sink without passing through a sanitizer.

Can Semgrep rules suggest automatic fixes?

Yes. Add a 'fix' key to your rule with the replacement code. Semgrep uses the metavariables from the matched pattern to construct the fix. For example, if your pattern is 'requests.get($URL, verify=False)' you can set fix to 'requests.get($URL, verify=True)'. When you run 'semgrep --autofix', Semgrep applies the fix directly to your source files. This is useful for automated migrations and enforcing API changes across a large codebase.

How do I test custom Semgrep rules?

Create a test file in the same directory as your rule, using the same base name as the rule file and the extension of the target language (for example, no-eval.yaml and no-eval.py). Annotate test cases with comments: '# ruleid: your-rule-id' on the line before code that should match, and '# ok: your-rule-id' on the line before code that should not match. Run 'semgrep --test' to execute the tests. Semgrep verifies that every annotated line produces the expected result. This testing workflow ensures your rules do not regress as you refine them.

How do I publish custom rules to the Semgrep Registry?

Fork the semgrep-rules repository on GitHub, add your rule YAML file to the appropriate language directory, include a test file with annotated test cases, and open a pull request. The Semgrep team reviews submissions for quality, accuracy, and false positive rates. Accepted rules become available to all Semgrep users through the registry. Alternatively, you can share rules within your organization by hosting them in a private repository and referencing that repository in your Semgrep configuration.

What languages does Semgrep support for custom rules?

Semgrep supports custom rules for over 30 languages including Python, JavaScript, TypeScript, Java, Go, Ruby, C, C++, C#, Rust, Kotlin, Swift, PHP, Scala, OCaml, Lua, Bash, Terraform HCL, Dockerfile, and Kubernetes YAML. The pattern syntax adapts to each language's grammar, so a Java rule uses Java syntax and a Python rule uses Python syntax. Generic mode is available for languages without a dedicated parser, using basic pattern matching on any text file.

How do I reduce false positives in custom Semgrep rules?

Use pattern-not to exclude known-safe code patterns from your matches. Add pattern-not-inside to exclude matches that appear within certain code contexts, such as exception handlers or approved wrapper functions. To skip whole files such as tests, use the rule's paths key with exclude globs instead. Use metavariable-regex to constrain metavariables to specific values. Write thorough test cases with both positive matches and negative cases to verify your rule does not over-match. Start with narrow, high-confidence patterns and gradually broaden them as you confirm the results are accurate.

Can I use Semgrep rules to enforce coding standards, not just find security bugs?

Yes. Semgrep is widely used for enforcing coding standards, API usage patterns, and architectural constraints. Common non-security use cases include banning deprecated function calls, requiring error handling around specific operations, enforcing naming conventions with metavariable-regex, ensuring all database queries use parameterized statements, and blocking direct imports from internal packages that should only be accessed through a facade. Any pattern you can express in code can become a Semgrep rule.

What is the ellipsis operator in Semgrep patterns?

The ellipsis operator (...) in Semgrep matches zero or more arguments, statements, or other syntactic elements. In a function call like 'func(..., $KEY, ...)' it matches func called with any number of arguments as long as one of them binds to $KEY. In a code block, '...' matches any number of statements. This makes patterns flexible enough to match code regardless of how many other arguments, parameters, or statements surround the part you care about.
