Data Extractors
TRECO provides a plugin-based extraction system for parsing HTTP responses and extracting data into variables.
Overview
Extractors allow you to capture data from HTTP responses for use in subsequent requests. The extraction system supports multiple formats and uses a plugin architecture for extensibility.
Basic Syntax
extract:
  variable_name:
    type: extractor_type
    pattern: "extraction_pattern"
All extracted variables are stored in the execution context and can be accessed in later states using the format {{ state_name.variable_name }}.
Available Extractors
JSONPath (jpath)
Extract data from JSON responses using JSONPath expressions.
Type names: jpath, jsonpath, json_path
Syntax:
extract:
  token:
    type: jpath
    pattern: "$.access_token"
Common Patterns:
# Root level field
pattern: "$.field_name"
# Nested field
pattern: "$.user.profile.email"
# Array element
pattern: "$.items[0].id"
# All elements in array
pattern: "$.items[*].id"
# Filter by condition
pattern: "$.users[?(@.active==true)].name"
Example:
states:
  login:
    request: |
      POST /api/login HTTP/1.1
      Content-Type: application/json

      {"username": "user", "password": "pass"}
    extract:
      access_token:
        type: jpath
        pattern: "$.access_token"
      refresh_token:
        type: jpath
        pattern: "$.refresh_token"
      user_id:
        type: jpath
        pattern: "$.user.id"
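To make the path semantics concrete, here is a minimal Python sketch that resolves only the simplest JSONPath forms shown above (root fields, nested fields, and numeric array indexes). It is not TRECO's implementation: the function name `resolve_simple_path` is hypothetical, and wildcards and filters are deliberately left out.

```python
import json
import re

# One path step: either ".name" or "[index]". Wildcards and filters are not handled.
_STEP = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def resolve_simple_path(document, path):
    """Resolve paths such as "$.user.id" or "$.items[0].id"; return None if absent."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    pos = 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}")
        key, index = match.groups()
        try:
            current = current[key] if key is not None else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None  # mirrors the documented "None when not found" behaviour
        pos = match.end()
    return current


body = json.loads('{"access_token": "abc", "user": {"id": 42}, "items": [{"id": "a"}, {"id": "b"}]}')
user_id = resolve_simple_path(body, "$.user.id")
second_item = resolve_simple_path(body, "$.items[1].id")
absent = resolve_simple_path(body, "$.refresh_token")
```

Running a pattern against a saved response body in this way is a quick check before putting it into a state definition.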
XPath (xpath)
Extract data from XML/HTML responses using XPath expressions.
Type names: xpath, xml_path, html_path
Syntax:
extract:
  csrf_token:
    type: xpath
    pattern: '//input[@name="csrf"]/@value'
Common Patterns:
# Element by ID
pattern: '//*[@id="element-id"]'
# Input value by name
pattern: '//input[@name="field_name"]/@value'
# Link href
pattern: '//a[@class="link"]/@href'
# Text content
pattern: '//div[@class="message"]/text()'
# Meta tag content
pattern: '//meta[@name="csrf-token"]/@content'
Example:
states:
  get_form:
    request: |
      GET /form HTTP/1.1
      Host: {{ config.host }}
    extract:
      csrf_token:
        type: xpath
        pattern: '//input[@name="csrf_token"]/@value'
      form_action:
        type: html_path
        pattern: '//form/@action'
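The same lookups can be tried offline with Python's standard library. This is only a sketch under two assumptions: the markup is well-formed XML (real HTML often is not), and because `xml.etree.ElementTree` supports a limited XPath subset without a trailing `/@attribute` step, the attribute is read separately. The helper name `extract_attribute` is hypothetical and unrelated to TRECO's internals.

```python
import xml.etree.ElementTree as ET


def extract_attribute(markup, element_path, attribute):
    """Find the first element matching element_path and return one of its attributes."""
    root = ET.fromstring(markup)
    element = root.find(element_path)
    if element is None:
        return None
    return element.get(attribute)


page = """<html><body>
<form action="/submit">
<input name="csrf_token" value="abc123"/>
</form>
</body></html>"""

# Equivalent in spirit to '//input[@name="csrf_token"]/@value' and '//form/@action'.
csrf_token = extract_attribute(page, ".//input[@name='csrf_token']", "value")
form_action = extract_attribute(page, ".//form", "action")
```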
Regex (regex)
Extract data using regular expressions with capture groups.
Type names: regex, re, regexp
Syntax:
extract:
  session_id:
    type: regex
    pattern: "SESSION=([A-Z0-9]+)"
The first capture group () is returned as the extracted value.
Common Patterns:
# Cookie value
pattern: "SESSIONID=([a-zA-Z0-9]+)"
# Bearer token
pattern: 'Bearer ([a-zA-Z0-9._-]+)'
# UUID
pattern: '([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'
# Number
pattern: 'balance["\s:]+(\d+\.?\d*)'
# Between quotes
pattern: '"token":"([^"]+)"'
Example:
states:
  get_session:
    request: |
      GET /api/session HTTP/1.1
      Host: {{ config.host }}
    extract:
      session_id:
        type: regex
        pattern: 'session_id=([a-f0-9]{32})'
      auth_code:
        type: re
        pattern: 'code=([A-Z0-9]+)'
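The first-capture-group rule can be reproduced with Python's `re` module. The sketch below assumes the extractor searches the response text for the first match and returns group 1; the helper name `extract_first_group` is hypothetical.

```python
import re


def extract_first_group(text, pattern):
    """Return capture group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # Raises IndexError if the pattern defines no capture group.
    return match.group(1)


response_text = "Set-Cookie: session_id=0123456789abcdef0123456789abcdef; Path=/"
session_id = extract_first_group(response_text, r"session_id=([a-f0-9]{32})")
auth_code = extract_first_group(response_text, r"code=([A-Z0-9]+)")
```

Here `auth_code` comes back as `None` because the sample text contains no `code=` parameter, which is the case a later state should be prepared for.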
Boundary (boundary)
Extract data between left and right delimiters. This is a simpler alternative to regex for common patterns.
Type names: boundary, between, delimited
Syntax:
extract:
  token:
    type: boundary
    pattern: '"token":"|||"'
The pattern uses ||| as a separator between the left and right boundaries.
Special Markers:
^ - Beginning of line (for left boundary)
$ - End of line (for right boundary)
Common Patterns:
# Between delimiters
pattern: '"token":"|||"'
# Until end of line
pattern: 'Authorization: |||$'
# From beginning of line
pattern: '^|||: value'
# HTML attribute value
pattern: 'value="|||"'
# JSON field value
pattern: '"balance":|||,'
Example:
states:
  parse_response:
    request: |
      GET /api/data HTTP/1.1
      Host: {{ config.host }}
    extract:
      api_key:
        type: boundary
        pattern: '"api_key":"|||"'
      auth_header:
        type: between
        pattern: 'X-Auth-Token: |||$'
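One way to picture boundary extraction is as a translation into a non-greedy regular expression. The sketch below is an assumption about the semantics, not TRECO's code: it treats a leading `^` on the left boundary and a trailing `$` on the right boundary as line anchors and everything else as literal text. The name `extract_between` is hypothetical.

```python
import re

SEPARATOR = "|||"


def extract_between(text, pattern):
    """Return the shortest text between the left and right boundaries, or None."""
    # Raises ValueError if the pattern does not contain the separator.
    left, right = pattern.split(SEPARATOR, 1)
    if left.startswith("^"):
        left_re = "^" + re.escape(left[1:])  # anchor at beginning of line
    else:
        left_re = re.escape(left)
    if right.endswith("$"):
        right_re = re.escape(right[:-1]) + "$"  # anchor at end of line
    else:
        right_re = re.escape(right)
    match = re.search(left_re + "(.*?)" + right_re, text, re.MULTILINE)
    return match.group(1) if match else None


api_key = extract_between('{"api_key":"k-123","n":1}', '"api_key":"|||"')
auth_header = extract_between("X-Auth-Token: xyz\nOther: 1", "X-Auth-Token: |||$")
field_name = extract_between("name: value", "^|||: value")
```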
Header (header)
Extract values from HTTP response headers (case-insensitive).
Type names: header, headers, http_header
Syntax:
extract:
  request_id:
    type: header
    pattern: "X-Request-Id"
Common Headers:
# Custom auth header
pattern: "X-Auth-Token"
# Request ID
pattern: "X-Request-Id"
# Content type
pattern: "Content-Type"
# Location (for redirects)
pattern: "Location"
# Rate limit info
pattern: "X-RateLimit-Remaining"
Example:
states:
  get_auth:
    request: |
      POST /api/auth HTTP/1.1
      Host: {{ config.host }}
    extract:
      auth_token:
        type: header
        pattern: "X-Auth-Token"
      rate_limit:
        type: headers
        pattern: "X-RateLimit-Remaining"
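Case-insensitive matching means the pattern does not have to mirror the server's capitalisation. A minimal sketch of that lookup, assuming the headers are available as a plain dictionary (the helper name `get_header` is hypothetical):

```python
def get_header(headers, name):
    """Return the value of the header called name, ignoring case, or None."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


response_headers = {"X-Auth-Token": "abc123", "x-ratelimit-remaining": "57"}
auth_token = get_header(response_headers, "x-auth-token")
rate_limit = get_header(response_headers, "X-RateLimit-Remaining")
```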
JWT (jwt)
Decode JSON Web Tokens (JWTs) and extract data from them. In API security testing this is useful for reading user claims, checking token expiration, and validating token structure.
Type names: jwt
Extract Specific Claims:
extract:
  user_id:
    type: jwt
    source: "{{ access_token }}"
    claim: sub
  user_role:
    type: jwt
    source: "{{ access_token }}"
    claim: role
  email:
    type: jwt
    source: "{{ access_token }}"
    claim: email
Extract JWT Parts:
extract:
  # Get entire payload
  jwt_payload:
    type: jwt
    source: "{{ token }}"
    part: payload
  # Get header (algorithm, type, etc.)
  jwt_header:
    type: jwt
    source: "{{ token }}"
    part: header
  # Get signature
  jwt_signature:
    type: jwt
    source: "{{ token }}"
    part: signature
Validation Checks:
extract:
  # Check if token has expired
  is_expired:
    type: jwt
    source: "{{ token }}"
    check: expired
  # Get algorithm (HS256, RS256, etc.)
  algorithm:
    type: jwt
    source: "{{ token }}"
    check: algorithm
  # Check if structure is valid
  is_valid:
    type: jwt
    source: "{{ token }}"
    check: valid
With Signature Verification:
extract:
  verified_payload:
    type: jwt
    source: "{{ token }}"
    part: payload
    verify: true
    secret: "{{ jwt_secret }}"
    algorithms: ["HS256", "HS512"]
Common JWT Claims:
sub - Subject (usually user ID)
iss - Issuer
aud - Audience
exp - Expiration timestamp
nbf - Not Before timestamp
iat - Issued At timestamp
jti - JWT ID
role, roles - User role(s)
permissions - User permissions
email, username - User identity
Security Testing Example:
states:
  analyze_jwt:
    request: |
      GET /api/protected HTTP/1.1
      Authorization: Bearer {{ token }}
    extract:
      algorithm:
        type: jwt
        source: "{{ token }}"
        check: algorithm
      is_expired:
        type: jwt
        source: "{{ token }}"
        check: expired
      user_role:
        type: jwt
        source: "{{ token }}"
        claim: role
    logger:
      on_state_leave: |
        {% if algorithm == 'none' %}
        🚨 CRITICAL: JWT uses 'none' algorithm!
        {% elif algorithm == 'HS256' %}
        ⚠ WARNING: JWT uses symmetric algorithm
        {% endif %}
        {% if is_expired %}
        🚨 Token is expired but still accepted!
        {% endif %}
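For readers who want to see what the claim, part, and expiry options operate on, the following Python sketch decodes a token with the standard library only. It does not verify the signature and is not TRECO's implementation; the helper names and the sample token are made up for illustration.

```python
import base64
import json
import time


def _b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt_unverified(token):
    """Split a JWT into (header dict, payload dict, raw signature segment)."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))
    return header, payload, signature_b64


def is_expired(payload, now=None):
    """True when the exp claim exists and lies in the past."""
    now = time.time() if now is None else now
    exp = payload.get("exp")
    return exp is not None and exp < now


# Hypothetical sample token; the signature segment is a placeholder, not a real MAC.
sample_token = ".".join([
    _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    _b64url_encode(json.dumps({"sub": "42", "role": "admin", "exp": 1}).encode()),
    "signature-placeholder",
])

header, payload, signature = decode_jwt_unverified(sample_token)
algorithm = header["alg"]
user_role = payload.get("role")
expired = is_expired(payload)
```

Because nothing here checks the signature, the decoded values describe what the token claims, not what the server has validated.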
Extractor Summary
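| Type | Aliases | Best For |
|---|---|---|
| jpath | jsonpath, json_path | JSON API responses |
| xpath | xml_path, html_path | HTML forms, XML responses |
| regex | re, regexp | Complex patterns, mixed content |
| boundary | between, delimited | Simple text extraction |
| header | headers, http_header | Response headers |
| | | Session cookies, tokens |
| jwt | | JWT token analysis, claims extraction |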
Using Extracted Variables
Extracted variables are stored in the context and can be accessed in templates:
states:
  login:
    extract:
      token:
        type: jpath
        pattern: "$.token"
  use_token:
    request: |
      GET /api/data HTTP/1.1
      Authorization: Bearer {{ login.token }}
Variable Naming
Use lowercase with underscores: user_id, auth_token
Avoid reserved words: config, thread, context
Be descriptive: access_token not t
Accessing Variables
# From previous state
{{ state_name.variable_name }}
# From current state (in logger)
{{ variable_name }}
# From config
{{ config.host }}
# Thread info (in race states)
{{ thread.id }}
{{ thread.count }}
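The dotted lookups above can be pictured as walks through a nested dictionary. The sketch below is a deliberately tiny stand-in for the real template engine: it handles only `{{ a.b }}` placeholders, with no filters or control blocks, and the names `render` and `context` are hypothetical.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    """Replace each {{ dotted.name }} with the value found by walking context."""

    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # KeyError signals an unknown state or variable
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Assumed shape: one entry per state name, plus the config and thread namespaces.
context = {
    "config": {"host": "example.test"},
    "login": {"token": "abc123"},
    "thread": {"id": 3, "count": 20},
}

request = render(
    "GET /api/data HTTP/1.1\nHost: {{ config.host }}\nAuthorization: Bearer {{ login.token }}\n",
    context,
)
```

The sketch also shows why `config`, `thread`, and `context` are poor variable names: they would collide with the built-in namespaces.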
Creating Custom Extractors
You can create custom extractors by implementing the BaseExtractor interface:
from treco.http.extractor.base import BaseExtractor, register_extractor


@register_extractor('custom', aliases=['my_extractor'])
class CustomExtractor(BaseExtractor):
    """Custom extractor for specific data formats."""

    def extract(self, response, pattern):
        """
        Extract data from response.

        Args:
            response: ResponseProtocol object
            pattern: Extraction pattern string

        Returns:
            Extracted value or None if not found
        """
        # Your extraction logic here
        content = response.text
        # Placeholder logic (replace with your own): return the pattern
        # itself when it occurs in the body, otherwise None.
        extracted_value = pattern if pattern in content else None
        return extracted_value
The @register_extractor decorator automatically registers your extractor with the specified type name and aliases.
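As an illustration of how a registration decorator of this kind can work, here is a self-contained sketch of a name-to-class registry. It is not TRECO's source: `register`, `get_extractor`, `_REGISTRY`, and `UpperExtractor` are all hypothetical names, and the real decorator may behave differently.

```python
_REGISTRY = {}


def register(name, aliases=()):
    """Class decorator that files the class under its type name and every alias."""

    def decorator(cls):
        for key in (name, *aliases):
            _REGISTRY[key.lower()] = cls
        return cls

    return decorator


def get_extractor(type_name):
    """Instantiate the extractor registered under type_name or one of its aliases."""
    try:
        return _REGISTRY[type_name.lower()]()
    except KeyError:
        raise ValueError(f"unknown extractor type: {type_name}") from None


@register("upper", aliases=["uppercase"])
class UpperExtractor:
    """Toy extractor: returns the upper-cased text when the pattern occurs in it."""

    def extract(self, text, pattern):
        return text.upper() if pattern in text else None


by_name = get_extractor("upper")
by_alias = get_extractor("uppercase")
result = by_alias.extract("abc", "b")
```

Registration in this sketch happens when the class definition is executed, so a module that is never imported never registers its extractor.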
Best Practices
Choose the right extractor: Use JSONPath for JSON, XPath for HTML, regex for complex patterns
Be specific with patterns: Avoid overly broad patterns that might match wrong data
Handle missing data: Extractors return None if the pattern doesn’t match
Test patterns: Verify patterns work with actual response data
Use aliases: Different teams may prefer different naming conventions
Troubleshooting
Pattern not matching:
Check the response content type
Verify the pattern syntax
Use verbose mode to see actual response
Test pattern with sample data
Wrong data extracted:
Make patterns more specific
Use capture groups correctly in regex
Check for multiple matches (first match is used)
Extractor type not found:
Check spelling and aliases
Ensure you’re using a valid type name
Custom extractors must be imported before use
See Also
Configuration Reference - YAML configuration reference
Template Engine - Template syntax and filters
Attack Examples - Real-world attack examples