Data Extractors

TRECO provides a plugin-based extraction system that parses HTTP responses and stores the extracted data in variables.

Overview

Extractors allow you to capture data from HTTP responses for use in subsequent requests. The extraction system supports multiple formats and uses a plugin architecture for extensibility.

Basic Syntax

extract:
  variable_name:
    type: extractor_type
    pattern: "extraction_pattern"

All extracted variables are stored in the execution context and can be accessed in later states using the format {{ state_name.variable_name }}.

Available Extractors

JSONPath (jpath)

Extract data from JSON responses using JSONPath expressions.

Type names: jpath, jsonpath, json_path

Syntax:

extract:
  token:
    type: jpath
    pattern: "$.access_token"

Common Patterns:

# Root level field
pattern: "$.field_name"

# Nested field
pattern: "$.user.profile.email"

# Array element
pattern: "$.items[0].id"

# All elements in array
pattern: "$.items[*].id"

# Filter by condition
pattern: "$.users[?(@.active==true)].name"

Example:

states:
  login:
    request: |
      POST /api/login HTTP/1.1
      Content-Type: application/json

      {"username": "user", "password": "pass"}

    extract:
      access_token:
        type: jpath
        pattern: "$.access_token"
      refresh_token:
        type: jpath
        pattern: "$.refresh_token"
      user_id:
        type: jpath
        pattern: "$.user.id"
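
To make the common patterns above concrete, here is a minimal sketch of how a small JSONPath subset resolves against a parsed response body. It is not TRECO's jpath extractor: resolve_simple_path is a hypothetical helper that handles only $.field, [index], and [*] (no filters), and it returns every match as a list.

```python
import json
import re


def resolve_simple_path(document, path):
    """Resolve a small JSONPath subset: $.field, [index], and [*]."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # Each token is one of: .name, [digits], [*]
    tokens = re.findall(r"\.([A-Za-z_][\w-]*)|\[(\d+)\]|\[(\*)\]", path[1:])
    current = [document]
    for name, index, star in tokens:
        next_values = []
        for value in current:
            if name and isinstance(value, dict) and name in value:
                next_values.append(value[name])
            elif index and isinstance(value, list) and int(index) < len(value):
                next_values.append(value[int(index)])
            elif star and isinstance(value, list):
                next_values.extend(value)
        current = next_values
    return current


body = json.loads(
    '{"access_token": "abc123", "user": {"id": 42},'
    ' "items": [{"id": 1}, {"id": 2}]}'
)

print(resolve_simple_path(body, "$.access_token"))  # ['abc123']
print(resolve_simple_path(body, "$.user.id"))       # [42]
print(resolve_simple_path(body, "$.items[0].id"))   # [1]
print(resolve_simple_path(body, "$.items[*].id"))   # [1, 2]
```

A path that matches nothing yields an empty list in this sketch, which is a convenient way to test patterns against sample data before running a scenario.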

XPath (xpath)

Extract data from XML/HTML responses using XPath expressions.

Type names: xpath, xml_path, html_path

Syntax:

extract:
  csrf_token:
    type: xpath
    pattern: '//input[@name="csrf"]/@value'

Common Patterns:

# Element by ID
pattern: '//*[@id="element-id"]'

# Input value by name
pattern: '//input[@name="field_name"]/@value'

# Link href
pattern: '//a[@class="link"]/@href'

# Text content
pattern: '//div[@class="message"]/text()'

# Meta tag content
pattern: '//meta[@name="csrf-token"]/@content'

Example:

states:
  get_form:
    request: |
      GET /form HTTP/1.1
      Host: {{ config.host }}

    extract:
      csrf_token:
        type: xpath
        pattern: '//input[@name="csrf_token"]/@value'
      form_action:
        type: html_path
        pattern: '//form/@action'
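
The same two lookups can be tried outside TRECO with Python's standard library. This is only a sketch under two assumptions: the sample markup is well-formed XML, and xml.etree.ElementTree is used, which supports a limited XPath subset with no /@attribute step, so the attribute is read from the selected element instead.

```python
import xml.etree.ElementTree as ET

# Sample page (hypothetical); must be well-formed for ElementTree to parse it.
html = """<html><body>
<form action="/submit" method="post">
  <input type="hidden" name="csrf_token" value="tok-123" />
  <input type="text" name="email" />
</form>
</body></html>"""

root = ET.fromstring(html)

# Rough equivalent of //input[@name="csrf_token"]/@value
csrf_input = root.find(".//input[@name='csrf_token']")
csrf_token = csrf_input.get("value") if csrf_input is not None else None

# Rough equivalent of //form/@action
form = root.find(".//form")
form_action = form.get("action") if form is not None else None

print(csrf_token)   # tok-123
print(form_action)  # /submit
```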

Regex (regex)

Extract data using regular expressions with capture groups.

Type names: regex, re, regexp

Syntax:

extract:
  session_id:
    type: regex
    pattern: "SESSION=([A-Z0-9]+)"

The text matched by the first capture group, written in parentheses ( ), is returned as the extracted value.

Common Patterns:

# Cookie value
pattern: "SESSIONID=([a-zA-Z0-9]+)"

# Bearer token
pattern: 'Bearer ([a-zA-Z0-9._-]+)'

# UUID
pattern: '([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'

# Number
pattern: 'balance["\s:]+(\d+\.?\d*)'

# Between quotes
pattern: '"token":"([^"]+)"'

Example:

states:
  get_session:
    request: |
      GET /api/session HTTP/1.1
      Host: {{ config.host }}

    extract:
      session_id:
        type: regex
        pattern: 'session_id=([a-f0-9]{32})'
      auth_code:
        type: re
        pattern: 'code=([A-Z0-9]+)'
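
The first-capture-group behavior can be checked with Python's re module. This is a minimal sketch, not TRECO's implementation: extract_first_group is a hypothetical helper and the sample response text is made up.

```python
import re


def extract_first_group(text, pattern):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


response_text = (
    "Set-Cookie: session_id=0123456789abcdef0123456789abcdef; Path=/\n"
    "Location: /callback?code=AB12CD&state=xyz\n"
)

print(extract_first_group(response_text, r"session_id=([a-f0-9]{32})"))
# 0123456789abcdef0123456789abcdef
print(extract_first_group(response_text, r"code=([A-Z0-9]+)"))
# AB12CD
print(extract_first_group(response_text, r"token=(\w+)"))
# None
```

Running a pattern against a saved response this way is a quick check that the capture group covers only the value you want.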

Boundary (boundary)

Extract data between left and right delimiters. This is a simpler alternative to regex for common patterns.

Type names: boundary, between, delimited

Syntax:

extract:
  token:
    type: boundary
    pattern: '"token":"|||"'

The pattern uses ||| as a separator between the left and right boundaries.

Special Markers:

  • ^ - Beginning of line (for left boundary)

  • $ - End of line (for right boundary)

Common Patterns:

# Between delimiters
pattern: '"token":"|||"'

# Until end of line
pattern: 'Authorization: |||$'

# From beginning of line
pattern: '^|||: value'

# HTML attribute value
pattern: 'value="|||"'

# JSON field value
pattern: '"balance":|||,'

Example:

states:
  parse_response:
    request: |
      GET /api/data HTTP/1.1
      Host: {{ config.host }}

    extract:
      api_key:
        type: boundary
        pattern: '"api_key":"|||"'
      auth_header:
        type: between
        pattern: 'X-Auth-Token: |||$'
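
The boundary rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TRECO's code: extract_between is a hypothetical helper, matching is done line by line, and ^ and $ are treated as markers only when they are the entire left or right boundary.

```python
def extract_between(text, pattern, separator="|||"):
    """Return the text between the left and right boundaries, or None."""
    left, found, right = pattern.partition(separator)
    if not found:
        raise ValueError(f"pattern must contain {separator!r}")
    for line in text.splitlines():
        if left == "^":
            start = 0  # value starts at the beginning of the line
        else:
            left_pos = line.find(left)
            if left_pos == -1:
                continue
            start = left_pos + len(left)
        if right == "$":
            end = len(line)  # value runs to the end of the line
        else:
            end = line.find(right, start)
            if end == -1:
                continue
        return line[start:end]
    return None


response_text = (
    "HTTP/1.1 200 OK\n"
    "X-Auth-Token: abc.def\n"
    "\n"
    '{"api_key":"k-42","balance":100,"x":1}'
)

print(extract_between(response_text, '"api_key":"|||"'))     # k-42
print(extract_between(response_text, "X-Auth-Token: |||$"))  # abc.def
print(extract_between(response_text, '"balance":|||,'))      # 100
print(extract_between(response_text, "^|||: abc.def"))       # X-Auth-Token
```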

Header (header)

Extract values from HTTP response headers (case-insensitive).

Type names: header, headers, http_header

Syntax:

extract:
  request_id:
    type: header
    pattern: "X-Request-Id"

Common Headers:

# Custom auth header
pattern: "X-Auth-Token"

# Request ID
pattern: "X-Request-Id"

# Content type
pattern: "Content-Type"

# Location (for redirects)
pattern: "Location"

# Rate limit info
pattern: "X-RateLimit-Remaining"

Example:

states:
  get_auth:
    request: |
      POST /api/auth HTTP/1.1
      Host: {{ config.host }}

    extract:
      auth_token:
        type: header
        pattern: "X-Auth-Token"
      rate_limit:
        type: headers
        pattern: "X-RateLimit-Remaining"
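
Case-insensitive lookup means that "x-auth-token", "X-Auth-Token", and "X-AUTH-TOKEN" all select the same header. A minimal sketch, assuming the response headers are available as a plain dictionary (get_header is a hypothetical helper, not a TRECO function):

```python
def get_header(headers, name):
    """Return the value of the first header whose name matches, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


response_headers = {
    "Content-Type": "application/json",
    "x-auth-token": "abc123",
    "X-RateLimit-Remaining": "57",
}

print(get_header(response_headers, "X-Auth-Token"))           # abc123
print(get_header(response_headers, "x-ratelimit-remaining"))  # 57
print(get_header(response_headers, "Location"))               # None
```

Note that header values are strings, so a numeric value such as a rate limit needs converting before it is compared as a number.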

JWT (jwt)

Decode and extract data from JSON Web Tokens (JWTs). This is useful for reading user information, checking token expiration, and validating token structure during API security testing.

Type names: jwt

Extract Specific Claims:

extract:
  user_id:
    type: jwt
    source: "{{ access_token }}"
    claim: sub

  user_role:
    type: jwt
    source: "{{ access_token }}"
    claim: role

  email:
    type: jwt
    source: "{{ access_token }}"
    claim: email

Extract JWT Parts:

extract:
  # Get entire payload
  jwt_payload:
    type: jwt
    source: "{{ token }}"
    part: payload

  # Get header (algorithm, type, etc.)
  jwt_header:
    type: jwt
    source: "{{ token }}"
    part: header

  # Get signature
  jwt_signature:
    type: jwt
    source: "{{ token }}"
    part: signature

Validation Checks:

extract:
  # Check if token has expired
  is_expired:
    type: jwt
    source: "{{ token }}"
    check: expired

  # Get algorithm (HS256, RS256, etc.)
  algorithm:
    type: jwt
    source: "{{ token }}"
    check: algorithm

  # Check if structure is valid
  is_valid:
    type: jwt
    source: "{{ token }}"
    check: valid
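
The claim, part, and check options above all rest on the same decoding step: a compact JWT is three base64url segments separated by dots. The sketch below shows that step with the standard library only. The helper names are hypothetical, the demo token is built locally with a dummy signature, and treating a missing exp claim as "not expired" is an assumption rather than documented TRECO behavior.

```python
import base64
import json
import time


def b64url_decode(segment):
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jwt(token):
    """Split a compact JWT into (header, payload, signature); no verification."""
    header_b64, payload_b64, signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, signature


def is_structurally_valid(token):
    """True when the token has three segments with JSON-object header and payload."""
    try:
        header, payload, _ = decode_jwt(token)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return False
    return isinstance(header, dict) and isinstance(payload, dict)


def is_expired(payload, now=None):
    """Compare the exp claim with the clock; a missing exp counts as not expired."""
    exp = payload.get("exp")
    if exp is None:
        return False
    return (time.time() if now is None else now) >= exp


# Demo token built locally; the signature segment is a dummy value.
demo_header = {"alg": "HS256", "typ": "JWT"}
demo_payload = {"sub": "user-1", "role": "admin", "exp": 1_000}
token = ".".join([
    b64url_encode(json.dumps(demo_header).encode()),
    b64url_encode(json.dumps(demo_payload).encode()),
    "ZHVtbXk",
])

header, payload, signature = decode_jwt(token)
print(payload["sub"], payload["role"])  # claim lookups
print(header["alg"])                    # algorithm check
print(is_expired(payload, now=2_000))   # expiry check against a fixed clock
print(is_structurally_valid(token), is_structurally_valid("not-a-jwt"))
```

Decoding without verification only reads what the token says about itself; it does not prove the token is authentic.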

With Signature Verification:

extract:
  verified_payload:
    type: jwt
    source: "{{ token }}"
    part: payload
    verify: true
    secret: "{{ jwt_secret }}"
    algorithms: ["HS256", "HS512"]
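
For HMAC algorithms, verification amounts to recomputing the signature over the first two segments and comparing it with the third, after checking that the header's alg is in the allowed list. A minimal sketch with the standard library follows; sign_hmac_jwt and verify_hmac_jwt are hypothetical helpers, and how TRECO performs verification internally is not covered here.

```python
import base64
import hashlib
import hmac
import json

HMAC_HASHES = {
    "HS256": hashlib.sha256,
    "HS384": hashlib.sha384,
    "HS512": hashlib.sha512,
}


def b64url_decode(segment):
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hmac_jwt(header, payload, secret):
    """Build a signed compact JWT (used here only to create test input)."""
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    digest = hmac.new(
        secret.encode(), signing_input.encode(), HMAC_HASHES[header["alg"]]
    ).digest()
    return f"{signing_input}.{b64url_encode(digest)}"


def verify_hmac_jwt(token, secret, algorithms):
    """True when alg is allowed and the HMAC signature matches."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(signature_b64)
    except ValueError:
        return False
    alg = header.get("alg") if isinstance(header, dict) else None
    if alg not in algorithms or alg not in HMAC_HASHES:
        return False
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), HMAC_HASHES[alg]
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, signature)


signed = sign_hmac_jwt({"alg": "HS256", "typ": "JWT"}, {"sub": "user-1"}, "s3cret")

print(verify_hmac_jwt(signed, "s3cret", ["HS256", "HS512"]))  # True
print(verify_hmac_jwt(signed, "wrong", ["HS256", "HS512"]))   # False
print(verify_hmac_jwt(signed, "s3cret", ["HS512"]))           # False
```

Restricting the accepted algorithms is what rejects a token whose header was changed to an algorithm you did not intend to allow.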

Common JWT Claims:

  • sub - Subject (usually user ID)

  • iss - Issuer

  • aud - Audience

  • exp - Expiration timestamp

  • nbf - Not Before timestamp

  • iat - Issued At timestamp

  • jti - JWT ID

  • role, roles - User role(s)

  • permissions - User permissions

  • email, username - User identity

Security Testing Example:

states:
  analyze_jwt:
    request: |
      GET /api/protected HTTP/1.1
      Authorization: Bearer {{ token }}

    extract:
      algorithm:
        type: jwt
        source: "{{ token }}"
        check: algorithm

      is_expired:
        type: jwt
        source: "{{ token }}"
        check: expired

      user_role:
        type: jwt
        source: "{{ token }}"
        claim: role

    logger:
      on_state_leave: |
        {% if algorithm == 'none' %}
          🚨 CRITICAL: JWT uses 'none' algorithm!
        {% elif algorithm == 'HS256' %}
          ⚠ WARNING: JWT uses symmetric algorithm
        {% endif %}
        {% if is_expired %}
          🚨 Token is expired but still accepted!
        {% endif %}

Extractor Summary

Type       Aliases                           Best For
jpath      jsonpath, json_path               JSON API responses
xpath      xml_path, html_path               HTML forms, XML responses
regex      re, regexp                        Complex patterns, mixed content
boundary   between, delimited                Simple text extraction
header     headers, http_header              Response headers
cookie     cookies, set_cookie, set-cookie   Session cookies, tokens
jwt        (none)                            JWT token analysis, claims extraction

Using Extracted Variables

Extracted variables are stored in the context and can be accessed in templates:

states:
  login:
    extract:
      token:
        type: jpath
        pattern: "$.token"

  use_token:
    request: |
      GET /api/data HTTP/1.1
      Authorization: Bearer {{ login.token }}

Variable Naming

  • Use lowercase with underscores: user_id, auth_token

  • Avoid reserved words: config, thread, context

  • Be descriptive: access_token not t

Accessing Variables

# From previous state
{{ state_name.variable_name }}

# From current state (in logger)
{{ variable_name }}

# From config
{{ config.host }}

# Thread info (in race states)
{{ thread.id }}
{{ thread.count }}
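
The dotted names above behave like lookups in a nested context keyed by state name. The sketch below illustrates that idea with a small regex-based renderer. It is an assumption-laden stand-in: TRECO's actual template engine is not described here, render is a hypothetical helper, and only plain {{ a.b }} placeholders are handled (no {% if %} blocks).

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    """Replace each {{ a.b }} placeholder with context['a']['b']."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for unknown names
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


# Hypothetical context after the login state has run.
context = {
    "config": {"host": "api.example.com"},
    "login": {"token": "abc123"},
    "thread": {"id": 3, "count": 10},
}

request = (
    "GET /api/data HTTP/1.1\n"
    "Host: {{ config.host }}\n"
    "Authorization: Bearer {{ login.token }}\n"
    "X-Thread: {{ thread.id }}/{{ thread.count }}\n"
)

print(render(request, context))
```

Because config and thread occupy top-level keys in a context like this one, a state or variable with the same name would collide with them, which is one reason to avoid reserved words.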

Creating Custom Extractors

You can create custom extractors by implementing the BaseExtractor interface:

from treco.http.extractor.base import BaseExtractor, register_extractor

@register_extractor('custom', aliases=['my_extractor'])
class CustomExtractor(BaseExtractor):
    """Custom extractor for specific data formats."""

    def extract(self, response, pattern):
        """
        Extract data from response.

        Args:
            response: ResponseProtocol object
            pattern: Extraction pattern string

        Returns:
            Extracted value or None if not found
        """
        # Your extraction logic here
        content = response.text
        extracted_value = None  # default when nothing matches
        # ... process content using pattern and set extracted_value ...
        return extracted_value

The @register_extractor decorator automatically registers your extractor with the specified type name and aliases.
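
A decorator-based registry of this kind can be sketched as follows. Everything in the block is a local stand-in written for illustration (register, Extractor, get_extractor, and FakeResponse are hypothetical names); it is not the code behind TRECO's register_extractor or BaseExtractor.

```python
_REGISTRY = {}


def register(name, aliases=()):
    """Class decorator that files the class under its name and every alias."""
    def decorator(cls):
        for key in (name, *aliases):
            _REGISTRY[key] = cls
        return cls
    return decorator


class Extractor:
    """Stand-in base class with the same extract(response, pattern) shape."""

    def extract(self, response, pattern):
        raise NotImplementedError


@register("line_after", aliases=["after"])
class LineAfterExtractor(Extractor):
    """Return the rest of the line that follows the first occurrence of pattern."""

    def extract(self, response, pattern):
        _, found, rest = response.text.partition(pattern)
        return rest.splitlines()[0] if found and rest else None


def get_extractor(type_name):
    """Look up an extractor by type name or alias."""
    try:
        return _REGISTRY[type_name]()
    except KeyError:
        raise ValueError(f"Unknown extractor type: {type_name}") from None


class FakeResponse:
    """Minimal response object exposing only .text."""

    def __init__(self, text):
        self.text = text


response = FakeResponse("status: ok\nticket: T-1001\n")
print(get_extractor("line_after").extract(response, "ticket: "))  # T-1001
print(get_extractor("after").extract(response, "status: "))       # ok
```

In a design like this, registration happens when the decorated class is defined, that is, when its module is imported. That matches the troubleshooting advice that custom extractors must be imported before use.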

Best Practices

  1. Choose the right extractor: Use JSONPath for JSON, XPath for HTML, regex for complex patterns

  2. Be specific with patterns: Avoid overly broad patterns that might match wrong data

  3. Handle missing data: Extractors return None if the pattern doesn't match, so check for None before using a value

  4. Test patterns: Verify patterns work with actual response data

  5. Use aliases: Different teams may prefer different naming conventions

Troubleshooting

Pattern not matching:

  1. Check the response content type

  2. Verify the pattern syntax

  3. Use verbose mode to see the actual response

  4. Test pattern with sample data

Wrong data extracted:

  1. Make patterns more specific

  2. Use capture groups correctly in regex

  3. Check for multiple matches (first match is used)

Extractor type not found:

  1. Check spelling and aliases

  2. Ensure you’re using a valid type name

  3. Custom extractors must be imported before use

See Also