Scanning Reference
Error handling, performance considerations, and best practices for the Data Scanning API.
Error Handling
Common Error Codes
| HTTP Status | Cause | Solution |
|---|---|---|
| 400 | Invalid parameters or content type | Check namespace, verify content type header |
| 401 | Invalid/expired API key | Verify authentication |
| 403 | Missing Data Scanning permission | Generate key with Data Scanning permission |
| 404 | Invalid endpoint | Check URL path (/api/scan or /api/scan-dynamic) |
| 500 | Shield processing error | Check Shield logs, retry request |
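The table suggests that retrying only makes sense for 500 responses, while 4xx responses point at a request that has to be fixed first. A minimal sketch of that split, assuming a caller-supplied `send` function that returns an HTTP status code and body; the helper names and backoff values are illustrative and not part of the Shield API:

```python
import time

# Status codes from the table above; only 500 is treated as retryable here.
RETRYABLE_STATUSES = {500}


def is_retryable(status_code):
    """Return True when the table's advice for this status is to retry."""
    return status_code in RETRYABLE_STATUSES


def send_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status_code, body = send()
        if status_code < 400:
            return body
        if not is_retryable(status_code) or attempt == max_attempts:
            raise RuntimeError(f"Scan failed with HTTP {status_code}")
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Whether the final `RuntimeError` blocks the request or lets it through depends on which of the strategies below fits the application.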
Error Response Format
Production Error Handling Strategies
Fail Closed (Most Secure)
Block the request if scanning fails:
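import logging

import requests

logger = logging.getLogger(__name__)

# SHIELD_URL and HEADERS are assumed to be defined elsewhere in the application.
def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed: {e}")
        # Chain the original exception so the root cause stays in the traceback
        raise ValueError("Data protection service unavailable") from e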
Fail Open (Less Secure)
Allow the request to proceed with logging:
def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, allowing unprotected data: {e}")
        return data  # Return original data
Cached Obfuscation
Fall back to locally defined masking patterns when Shield is unavailable:
import re

def protect_data_with_fallback(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, using fallback masking: {e}")
        return apply_fallback_masking(data)

def apply_fallback_masking(data):
    """Simple regex-based masking as fallback (top-level string values only)"""
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, str):
                # Mask SSN-like patterns
                value = re.sub(r'\d{3}-\d{2}-\d{4}', '***-**-****', value)
                # Mask credit card-like patterns
                value = re.sub(r'\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}', '****-****-****-****', value)
                data[key] = value
    return data
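The fallback above only rewrites top-level string values of a dict. If payloads nest objects or lists, a recursive variant can apply the same two patterns at every level. This is a sketch; `mask_text` and `mask_nested` are illustrative names, and the patterns remain a coarse approximation of what Shield detects:

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
CARD_PATTERN = re.compile(r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}")


def mask_text(text):
    """Mask SSN-like and card-like digit runs in a single string."""
    text = SSN_PATTERN.sub("***-**-****", text)
    return CARD_PATTERN.sub("****-****-****-****", text)


def mask_nested(value):
    """Return a masked copy of value, descending into dicts and lists."""
    if isinstance(value, str):
        return mask_text(value)
    if isinstance(value, dict):
        return {key: mask_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_nested(item) for item in value]
    return value  # Numbers, booleans, and None pass through unchanged
```

Unlike the in-place version, this variant returns a new structure and leaves the input untouched, which makes it easier to log or compare both forms.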
Performance Considerations
Optimization Tips
Keep Payloads Small
# Bad: Scanning entire large object
large_object = {
    "metadata": {...},  # Non-sensitive
    "user": {...},      # Sensitive
    "logs": [...]       # Non-sensitive
}
protected = scan(large_object)

# Good: Scan only sensitive fields
sensitive_data = {"user": large_object["user"]}
protected_user = scan(sensitive_data)
large_object["user"] = protected_user
Use scan-dynamic for Targeted Scanning
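As a rough illustration of targeted scanning, the sketch below builds a `/api/scan-dynamic` request that names a single data type to obfuscate. The `obfuscatedDataTypes` query parameter and the `{"data": ...}` body mirror the curl example under Cache Reference Data; the base URL, helper name, and data type ID are placeholders, and only the standard library is used:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://shield:8080"  # Placeholder host


def build_dynamic_scan_request(text, datatype_id, api_key):
    """Prepare (but do not send) a scan-dynamic request for one data type."""
    query = urllib.parse.urlencode({"obfuscatedDataTypes": datatype_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/scan-dynamic?{query}",
        data=json.dumps({"data": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_dynamic_scan_request("123-45-6789", "SSN_ID_PLACEHOLDER", "YOUR_API_KEY")
# Send with: urllib.request.urlopen(request, timeout=5)
```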
Reuse HTTP Connections
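const https = require('https');
const axios = require('axios');

// Create axios instance that reuses connections.
// BASE_URL and API_KEY are assumed to be defined elsewhere.
const client = axios.create({
  baseURL: BASE_URL,
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  // Keep-alive is configured on the Node.js agent passed to axios
  httpsAgent: new https.Agent({ keepAlive: true })
});

function scanData(data, namespace) {
  return client.post('/api/scan', data, {
    params: { namespace }
  });
}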
Process Scans Asynchronously
import asyncio
import aiohttp

async def scan_async(session, data, namespace):
    async with session.post(
        f"{BASE_URL}/api/scan",
        params={"namespace": namespace},
        json=data
    ) as response:
        # Surface HTTP errors instead of parsing an error body as a result
        response.raise_for_status()
        return await response.json()

async def scan_batch(data_list):
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        tasks = [scan_async(session, data, "api") for data in data_list]
        return await asyncio.gather(*tasks)

# Scan multiple items concurrently
results = asyncio.run(scan_batch([data1, data2, data3]))
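`asyncio.gather` starts every scan at once, which may put heavy load on Shield for large batches. One option is to cap in-flight requests with a semaphore. The sketch below is independent of the HTTP client: `scan_one` stands in for any coroutine function such as `scan_async`, and the default limit of 10 is an arbitrary assumption:

```python
import asyncio


async def scan_bounded(items, scan_one, limit=10):
    """Run scan_one(item) for each item with at most `limit` running at once.

    Results come back in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await scan_one(item)

    return await asyncio.gather(*(guarded(item) for item in items))
```

A suitable limit depends on Shield's capacity and payload size, so it is worth measuring rather than guessing.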
Cache Reference Data
# Cache data type IDs in shell variables
# Fetch once and reuse
DATATYPES=$(curl -s "https://shield:8080/api/datatypes" \
  -H "Authorization: Bearer $API_KEY")
SSN_ID=$(echo "$DATATYPES" | jq -r '.items[] | select(.type=="US_SSN") | .id')
CC_ID=$(echo "$DATATYPES" | jq -r '.items[] | select(.type=="CREDIT_CARD") | .id')

# Use cached IDs in subsequent requests
curl -X POST "https://shield:8080/api/scan-dynamic?obfuscatedDataTypes=$SSN_ID" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "123-45-6789"}'
from functools import lru_cache

import requests

@lru_cache(maxsize=1)
def get_datatype_ids():
    """Cache data type IDs to avoid repeated API calls"""
    response = requests.get(f"{BASE_URL}/api/datatypes", headers=HEADERS)
    # A raised exception is not cached, so a failed fetch is retried on the next call
    response.raise_for_status()
    return {dt["type"]: dt["id"] for dt in response.json()["items"]}

# Use cached IDs
datatype_ids = get_datatype_ids()
ssn_id = datatype_ids["US_SSN"]
// Cache data type IDs in memory
let cachedDatatypes = null;

async function getDatatypeIds() {
  if (!cachedDatatypes) {
    const response = await axios.get(`${BASE_URL}/api/datatypes`, {
      headers: HEADERS
    });
    cachedDatatypes = response.data.items.reduce((acc, dt) => {
      acc[dt.type] = dt.id;
      return acc;
    }, {});
  }
  return cachedDatatypes;
}

// Use cached IDs (inside an async function or an ES module)
const datatypeIds = await getDatatypeIds();
const ssnId = datatypeIds['US_SSN'];
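The caches above never expire, so data types added or changed in Shield stay invisible until the process restarts. A time-bounded cache is one way around that. In this sketch the class name, the 300-second lifetime, and the injected `fetch` callable are all assumptions; `fetch` would wrap the `/api/datatypes` call shown above:

```python
import time


class TTLCache:
    """Cache the result of fetch() and refresh it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been fetched yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The injectable `clock` exists only to make expiry testable without sleeping; production code can rely on the default.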
Activity Logging
All scan operations generate activity records queryable via the Activities API.
Logged Fields
- namespace (in URL field)
- username and usergroup (if provided)
- Data types detected/obfuscated
- Rules applied (for /api/scan)
- Timestamp and duration
- icapMode="API"
Querying API Scans
import requests

BASE_URL = "https://shield:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Build query for API scans in last 7 days
query = {
    "simpleToAdvanced": {
        "types": ["API"],
        "timestamp": {
            "withinLast": {
                "days": 7,
                "hours": 0,
                "minutes": 0
            }
        }
    }
}

# Convert to advanced query
search_response = requests.post(
    f"{BASE_URL}/api/activities/convertsearch",
    headers=HEADERS,
    json=query
)
search_query = search_response.json()["simpleToAdvanced"]

# Get activities
activities_response = requests.get(
    f"{BASE_URL}/api/activities",
    headers=HEADERS,
    params={"search": search_query}
)
activities = activities_response.json()["items"]
print(f"Total API scans: {len(activities)}")
# Build query
QUERY=$(curl -s -X POST "https://shield:8080/api/activities/convertsearch" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "simpleToAdvanced": {
      "types": ["API"]
    }
  }' | jq -r '.simpleToAdvanced')

# Get activities (printf avoids URL-encoding a trailing newline)
curl "https://shield:8080/api/activities?search=$(printf '%s' "$QUERY" | jq -sRr @uri)" \
  -H "Authorization: Bearer $API_KEY"
const axios = require('axios');

const BASE_URL = 'https://shield:8080';
const HEADERS = { 'Authorization': 'Bearer YOUR_API_KEY' };

// Build query for API scans in last 7 days
const query = {
  simpleToAdvanced: {
    types: ['API'],
    timestamp: {
      withinLast: {
        days: 7,
        hours: 0,
        minutes: 0
      }
    }
  }
};

// CommonJS modules do not support top-level await, so wrap the calls
async function main() {
  // Convert to advanced query
  const searchResponse = await axios.post(
    `${BASE_URL}/api/activities/convertsearch`,
    query,
    { headers: HEADERS }
  );
  const searchQuery = searchResponse.data.simpleToAdvanced;

  // Get activities
  const activitiesResponse = await axios.get(
    `${BASE_URL}/api/activities`,
    {
      headers: HEADERS,
      params: { search: searchQuery }
    }
  );
  const activities = activitiesResponse.data.items;
  console.log(`Total API scans: ${activities.length}`);
}

main().catch(console.error);
Best Practices
Security
- Protect API keys - Never commit keys to version control, use environment variables
- Use HTTPS - Always use HTTPS for Shield communication
- Rotate keys regularly - Rotate API keys periodically
- Limit key permissions - Use keys with minimal required permissions
- Monitor usage - Review activity logs regularly
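To keep keys out of source control, the key can be read from the environment at startup. A minimal sketch, assuming a variable named `SHIELD_API_KEY` (the name is a placeholder, not something Shield defines):

```python
import os


def load_auth_headers(env=os.environ):
    """Build the Authorization header from the SHIELD_API_KEY variable.

    Fails fast with a clear message instead of sending an empty key.
    """
    api_key = env.get("SHIELD_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SHIELD_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dict can be used wherever the examples on this page pass `HEADERS`.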
Integration
- Start with /api/scan - Use policy-based scanning for most use cases
- Use /api/scan-dynamic for special cases - Testing, custom workflows, scenario-specific masking
- Provide username/usergroup - Enables user-based rules and better audit trails
- Handle errors gracefully - Implement appropriate error handling for your use case
- Test in staging - Validate policies in staging before production
Namespace Design
- Be descriptive - Use clear namespace identifiers (e.g., payment-api, hr-system)
- Environment-specific - Include environment in namespace (e.g., payment-api-prod)
- Group related endpoints - Use same namespace for related API endpoints
- Document namespaces - Keep a record of namespace to application mappings
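These conventions can be enforced in code so namespaces stay consistent across services. The sketch below composes `<application>-<environment>` names such as `payment-api-prod`; the allowed environment list and the lowercase-with-hyphens rule are assumptions for illustration, not Shield requirements:

```python
import re

ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}  # Assumed set
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def build_namespace(application, environment):
    """Return a namespace such as 'payment-api-prod'."""
    if not NAME_PATTERN.fullmatch(application):
        raise ValueError(f"Invalid application name: {application!r}")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {environment!r}")
    return f"{application}-{environment}"
```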
Performance
- Keep payloads small - Scan only sensitive fields when possible
- Use connection pooling - Reuse HTTP connections
- Process asynchronously - Don't block critical paths
- Set appropriate timeouts - Don't wait indefinitely for responses
- Cache reference data - Cache data type and mask format IDs
Monitoring
- Track scan failures - Log all scanning errors
- Monitor response times - Alert on slow scans
- Review activity logs - Regularly audit scanning activity
- Set up alerts - Configure alerts for unusual patterns
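Failure counts and response times can be collected by wrapping the scan call. A sketch using only the standard library; the threshold, logger name, and `scan` callable are assumptions, and a real deployment would likely forward these numbers to its own metrics system:

```python
import logging
import time

logger = logging.getLogger("shield.scan")
SLOW_SCAN_SECONDS = 1.0  # Assumed alert threshold


def monitored_scan(scan, data, namespace, clock=time.perf_counter):
    """Call scan(data, namespace), logging failures and slow responses."""
    started = clock()
    try:
        return scan(data, namespace)
    except Exception:
        # Log with traceback, then let the caller's error strategy decide
        logger.exception("Scan failed for namespace %s", namespace)
        raise
    finally:
        elapsed = clock() - started
        if elapsed > SLOW_SCAN_SECONDS:
            logger.warning("Slow scan for %s: %.2fs", namespace, elapsed)
```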
Related Topics
- Scanning Overview - Main scanning concepts
- Policy Scan - Policy-based scanning
- Dynamic Scan - Explicit control scanning
- Activities API - Query scan activity logs
- Authentication - Generate API keys