ASTRA-EXFIL-001 — Excessive Data Exposure¶
Overview¶
Excessive Data Exposure occurs when an API returns far more data than the client needs for its stated purpose, relying on the frontend to filter what is actually displayed to the user. The full object — including sensitive fields — is returned in the API response and visible to anyone intercepting or inspecting the traffic. An attacker who calls the API directly (bypassing the frontend) receives all unexposed fields: internal user flags, hashed passwords, PII, financial data, admin metadata, or relationship data.
This is one of the most common API vulnerabilities because it stems from a development convenience: returning the full model object is easier than crafting a response tailored to each use case.
Tactic¶
Exfiltration
Protocols¶
REST · GraphQL
Severity Score¶
| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Exploitability | 5 | Just call the API directly and read the response |
| Prevalence | 5 | Extremely common — default ORM serialization pattern |
| Data sensitivity | 4 | Often exposes PII, tokens, hashed credentials |
| Business impact | 4 | Mass data exposure, regulatory penalties |
| Composite | 4.5 / 5 |
Rating: Critical
Attack Scenario¶
An attacker uses a mobile API or web app's developer tools to observe API calls and then replays them directly, inspecting the raw response rather than what the app renders.
- Attacker opens a ride-sharing app and loads their own profile page
- Using a proxy (Burp Suite / mitmproxy), attacker captures:
GET /api/v1/users/me - App displays: name, profile photo, star rating
- Raw API response contains: name, email, phone, hashed password, internal user flags (
is_banned: false,fraud_score: 12), exact GPS coordinates from last 20 rides, payment method tokens, and admin notes - Attacker extracts all hidden fields — none of which were displayed in the app but all were transmitted
Scale amplification via enumeration¶
Combine with ASTRA-AUTHZ-001 (BOLA): once excessive data exposure is confirmed on /users/{id}, an attacker can enumerate all user IDs and harvest the full dataset for every user.
Example Request / Payload¶
GET /api/v1/users/me HTTP/1.1
Host: target.example.com
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9...
What the app shows:
What the API actually returns:
{
"id": 10482,
"name": "Alice Smith",
"email": "alice@example.com",
"phone": "+44 7911 123456",
"password_hash": "$2b$12$LQv3c1yqBWVHxkd...",
"avatar": "https://cdn.example.com/avatars/alice.jpg",
"rating": 4.8,
"is_banned": false,
"fraud_score": 12,
"admin_notes": "VIP customer — escalate complaints",
"stripe_customer_id": "cus_NffrFeUfNV2Hib",
"last_location": {"lat": 51.5074, "lng": -0.1278},
"created_at": "2021-03-14T09:22:11Z",
"internal_flags": ["beta_tester", "legacy_account"]
}
Real-World Breach Mapping¶
| Field | Detail |
|---|---|
| Incident | Optus Australia data breach |
| Year | 2022 |
| Organisation | Optus (Singtel subsidiary) |
| What happened | An unauthenticated API endpoint returned full customer records including names, dates of birth, phone numbers, email addresses, and identity document numbers. The API required no authentication and exposed the complete customer object rather than a filtered response. |
| ASTRA technique | ASTRA-EXFIL-001 |
| Source | https://www.itnews.com.au/news/optus-breach-an-api-with-no-authentication-591019 |
Detection¶
Sigma Rule¶
See detection-rules/sigma/ASTRA-EXFIL-001.yml
What to look for¶
- API responses significantly larger than the UI would require (e.g.
/users/mereturning >2KB when the displayed profile is minimal) - Sensitive field names in response bodies:
password,hash,token,secret,ssn,dob,internal,admin,fraud - High-frequency calls to user/profile endpoints from a single client, especially if cross-user (correlate with BOLA detection)
- Requests from non-browser clients (missing or unusual user agents) to endpoints typically called by the frontend app
WAF / Gateway rule hint¶
Implement response scanning at the API gateway layer. Flag responses containing keywords like password_hash, ssn, tax_id, internal_, _secret, admin_note in JSON keys. This is a data loss prevention (DLP) pattern applied to API responses.
Remediation¶
- Define explicit response schemas — never return a full ORM model object; define a dedicated response DTO/serializer for each endpoint that includes only required fields
- Adopt an API specification-first approach — use OpenAPI to define exactly what each endpoint returns, then validate responses against that spec in tests
- Implement response validation in CI — add automated tests that call each endpoint and assert the response does not contain sensitive field names
- Use field-level access control for GraphQL — don't rely on the frontend to skip fields; implement resolver-level field permissions
- Conduct regular API response audits — periodically spider your own API and analyse response payloads for unexpected sensitive data