ASTRA-EXFIL-001 — Excessive Data Exposure

Overview

Excessive Data Exposure occurs when an API returns far more data than the client needs for its stated purpose and relies on the frontend to filter what is actually displayed to the user. The full object, including sensitive fields, travels in the API response and is visible to anyone who can intercept or inspect the traffic, including the client's own user. An attacker who calls the API directly (bypassing the frontend) receives every field the UI hides: internal user flags, hashed passwords, PII, financial data, admin metadata, or relationship data.

This is one of the most common API vulnerabilities because it stems from a development convenience: returning the full model object is easier than crafting a response tailored to each use case.
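The convenience described above often looks like the minimal sketch below, where a handler serializes the entire model. `User` and `get_me_handler` are hypothetical stand-ins for an ORM model and a route handler, and `dataclasses.asdict` plays the role of a framework's default serializer.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    # Hypothetical ORM-style model; field names mirror the example response in this document.
    id: int
    name: str
    email: str
    password_hash: str
    avatar: str
    rating: float
    fraud_score: int
    admin_notes: str


def get_me_handler(user: User) -> str:
    # Anti-pattern: serialize every attribute of the model and rely on the
    # frontend to pick out name, avatar and rating.
    return json.dumps(asdict(user))
```

Every call returns `password_hash`, `fraud_score` and `admin_notes` alongside the three fields the profile screen renders, and any attribute later added to `User` leaks automatically.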

Tactic

Exfiltration

Protocols

REST · GraphQL

Severity Score

Dimension	Score (1–5)	Rationale
Exploitability	5	Just call the API directly and read the response
Prevalence	5	Extremely common — default ORM serialization pattern
Data sensitivity	4	Often exposes PII, tokens, hashed credentials
Business impact	4	Mass data exposure, regulatory penalties
Composite	4.5 / 5

Rating: Critical
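The composite value matches an unweighted mean of the four dimension scores. That weighting is an assumption; the scoring method is not stated here.

```latex
\text{Composite} = \frac{5 + 5 + 4 + 4}{4} = \frac{18}{4} = 4.5
```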

Attack Scenario

An attacker observes the API calls made by a mobile app (through an intercepting proxy) or a web app (through the browser's developer tools), then replays them directly and inspects the raw responses rather than what the app renders.

1. The attacker opens a ride-sharing app and loads their own profile page.
2. Using an intercepting proxy (Burp Suite or mitmproxy), the attacker captures the request GET /api/v1/users/me.
3. The app displays only the name, profile photo, and star rating.
4. The raw API response contains the name, email, phone number, password hash, internal user flags (is_banned: false, fraud_score: 12), exact GPS coordinates from the last 20 rides, payment method tokens, and admin notes.
5. The attacker extracts all the hidden fields: none of them were displayed in the app, but all of them were transmitted.

Scale amplification via enumeration

Combine with ASTRA-AUTHZ-001 (BOLA): if the same over-exposed record is also served by /users/{id} without object-level authorization checks, an attacker can enumerate user IDs and harvest the full record for every user.

Example Request / Payload

GET /api/v1/users/me HTTP/1.1
Host: target.example.com
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9...

What the app shows:

{
  "name": "Alice Smith",
  "avatar": "https://cdn.example.com/avatars/alice.jpg",
  "rating": 4.8
}

What the API actually returns:

{
  "id": 10482,
  "name": "Alice Smith",
  "email": "alice@example.com",
  "phone": "+44 7911 123456",
  "password_hash": "$2b$12$LQv3c1yqBWVHxkd...",
  "avatar": "https://cdn.example.com/avatars/alice.jpg",
  "rating": 4.8,
  "is_banned": false,
  "fraud_score": 12,
  "admin_notes": "VIP customer — escalate complaints",
  "stripe_customer_id": "cus_NffrFeUfNV2Hib",
  "last_location": {"lat": 51.5074, "lng": -0.1278},
  "created_at": "2021-03-14T09:22:11Z",
  "internal_flags": ["beta_tester", "legacy_account"]
}
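To reproduce the attacker's extraction of hidden fields in the scenario above, diff the keys of the raw response against the fields the app renders. The sketch below assumes both payloads have already been decoded into Python dictionaries (copied from the example above) and compares top-level keys only; `hidden_fields` is a hypothetical helper.

```python
from __future__ import annotations

# Both payloads are copied from the example above (the hash is truncated there as well).
displayed = {
    "name": "Alice Smith",
    "avatar": "https://cdn.example.com/avatars/alice.jpg",
    "rating": 4.8,
}

raw = {
    "id": 10482,
    "name": "Alice Smith",
    "email": "alice@example.com",
    "phone": "+44 7911 123456",
    "password_hash": "$2b$12$LQv3c1yqBWVHxkd...",
    "avatar": "https://cdn.example.com/avatars/alice.jpg",
    "rating": 4.8,
    "is_banned": False,
    "fraud_score": 12,
    "admin_notes": "VIP customer — escalate complaints",
    "stripe_customer_id": "cus_NffrFeUfNV2Hib",
    "last_location": {"lat": 51.5074, "lng": -0.1278},
    "created_at": "2021-03-14T09:22:11Z",
    "internal_flags": ["beta_tester", "legacy_account"],
}


def hidden_fields(raw_response: dict, rendered: dict) -> list[str]:
    """Return top-level keys that the API sent but the UI did not render."""
    return sorted(set(raw_response) - set(rendered))


print(hidden_fields(raw, displayed))
```

For this example it lists the eleven fields the app never displays, from `admin_notes` to `stripe_customer_id`.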

Real-World Breach Mapping

Field	Detail
Incident	Optus Australia data breach
Year	2022
Organisation	Optus (Singtel subsidiary)
What happened	An API endpoint that required no authentication returned full customer records, including names, dates of birth, phone numbers, email addresses, and identity document numbers. It exposed the complete customer object rather than a filtered response.
ASTRA technique	ASTRA-EXFIL-001
Source	https://www.itnews.com.au/news/optus-breach-an-api-with-no-authentication-591019

Detection

Sigma Rule

See detection-rules/sigma/ASTRA-EXFIL-001.yml

What to look for

API responses significantly larger than the UI would require (e.g. /users/me returning >2KB when the displayed profile is minimal)
Sensitive field names in response bodies: password, hash, token, secret, ssn, dob, internal, admin, fraud
High-frequency calls to user/profile endpoints from a single client, especially when they target many different user IDs (correlate with BOLA detection)
Requests from non-browser clients (missing or unusual user agents) to endpoints typically called by the frontend app
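The cross-user signal in the list above can be approximated by counting how many distinct user IDs each client requests within a sliding time window. This is a sketch under stated assumptions: the `/api/v1/users/{id}` path shape, the 60-second window and the limit of 20 IDs are illustrative, and `client_id` stands for whatever identifies a caller at your gateway (an API key or token subject, for example).

```python
from __future__ import annotations

import re
import time
from collections import defaultdict, deque

# Assumed path shape for the per-user endpoint; "/me" requests do not match.
USER_PATH = re.compile(r"^/api/v1/users/(\d+)$")


class ProfileEnumerationDetector:
    """Flag clients that request many distinct user IDs within a sliding time window."""

    def __init__(self, window: float = 60.0, limit: int = 20) -> None:
        self.window = window  # illustrative: seconds of history kept per client
        self.limit = limit    # illustrative: distinct IDs tolerated per window
        self._events: defaultdict[str, deque[tuple[float, str]]] = defaultdict(deque)

    def observe(self, client_id: str, path: str, now: float | None = None) -> bool:
        """Record one request and return True if the client exceeds the distinct-ID limit."""
        match = USER_PATH.match(path)
        if match is None:
            return False  # not a per-user lookup, nothing to track
        now = time.monotonic() if now is None else now
        events = self._events[client_id]
        events.append((now, match.group(1)))
        # Drop requests that have fallen out of the sliding window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        return len({user_id for _, user_id in events}) > self.limit
```

Counting distinct IDs rather than raw requests keeps a client that legitimately refreshes its own profile below the threshold; a production version would also evict idle clients so memory stays bounded.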

WAF / Gateway rule hint

Implement response scanning at the API gateway layer. Flag responses whose JSON keys contain keywords such as password_hash, ssn, tax_id, internal_, _secret, or admin_note. This applies the data loss prevention (DLP) pattern to API responses.
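A minimal sketch of the key-scanning rule, assuming the gateway (or a sidecar) can hand the response body to a Python hook. The keyword list is taken from the hint above; matching is a case-insensitive substring test on every key at any depth, which favours recall over precision, so expect some false positives. `scan_response` and `iter_keys` are hypothetical names.

```python
from __future__ import annotations

import json
from typing import Any, Iterator

# Keywords from the gateway rule hint; extend them for your own schema.
SENSITIVE_KEYWORDS = ("password_hash", "ssn", "tax_id", "internal_", "_secret", "admin_note")


def iter_keys(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, key) for every object key in a decoded JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            yield child, key
            yield from iter_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_keys(item, f"{path}[{index}]")


def scan_response(body: str) -> list[str]:
    """Return JSON paths of keys that contain a sensitive keyword (case-insensitive)."""
    try:
        document = json.loads(body)
    except json.JSONDecodeError:
        return []  # non-JSON bodies are out of scope for this sketch
    return [
        json_path
        for json_path, key in iter_keys(document)
        if any(word in key.lower() for word in SENSITIVE_KEYWORDS)
    ]
```

Run against the example response in this document, it reports `$.password_hash`, `$.admin_notes` and `$.internal_flags`.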

Remediation

Define explicit response schemas — never return a full ORM model object; define a dedicated response DTO/serializer for each endpoint that includes only required fields
Adopt an API specification-first approach — use OpenAPI to define exactly what each endpoint returns, then validate responses against that spec in tests
Implement response validation in CI — add automated tests that call each endpoint and assert the response does not contain sensitive field names
Use field-level access control for GraphQL — don't rely on the frontend to skip fields; implement resolver-level field permissions
Conduct regular API response audits — periodically spider your own API and analyse response payloads for unexpected sensitive data
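The first and third remediation items can be combined: build each response from an explicit DTO and assert its shape in a test. In the sketch below, `User`, `PublicProfile` and `FORBIDDEN_KEYS` are hypothetical; the test follows pytest's `test_` naming convention but uses only plain `assert`, so it also runs without pytest.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class User:
    # Hypothetical internal model: everything the database row holds.
    id: int
    name: str
    email: str
    password_hash: str
    avatar: str
    rating: float
    fraud_score: int
    admin_notes: str


@dataclass(frozen=True)
class PublicProfile:
    # Hypothetical response DTO for GET /api/v1/users/me: only what the profile screen renders.
    name: str
    avatar: str
    rating: float

    @classmethod
    def from_user(cls, user: User) -> PublicProfile:
        # Copy fields explicitly so new model attributes are never exposed by default.
        return cls(name=user.name, avatar=user.avatar, rating=user.rating)


FORBIDDEN_KEYS = {"password_hash", "fraud_score", "admin_notes"}


def test_profile_response_excludes_sensitive_fields() -> None:
    user = User(
        id=10482,
        name="Alice Smith",
        email="alice@example.com",
        password_hash="<redacted>",
        avatar="https://cdn.example.com/avatars/alice.jpg",
        rating=4.8,
        fraud_score=12,
        admin_notes="internal only",
    )
    body = asdict(PublicProfile.from_user(user))
    # Allowlist check: the response has exactly the documented fields.
    assert set(body) == {"name", "avatar", "rating"}
    # Denylist check: no known sensitive field name slipped through.
    assert not FORBIDDEN_KEYS & set(body)
```

Copying fields one by one makes exposure opt-in: a column added to the model later stays private until someone adds it to the DTO on purpose. The exact-keys assertion is the stricter of the two checks, because it also catches new fields whose names do not look sensitive.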