Scanning Reference
Error handling, performance considerations, and best practices for the Data Scanning API.
Error Handling
Common Error Codes
| HTTP Status | Cause | Solution |
|---|---|---|
| 400 | Invalid parameters or content type | Check namespace, verify content type header |
| 401 | Invalid/expired API key | Verify authentication |
| 403 | Missing Data Scanning permission | Generate key with Data Scanning permission |
| 404 | Invalid endpoint | Check URL path (/api/scan or /api/scan-dynamic) |
| 500 | Shield processing error | Check Shield logs, retry request |
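The table suggests retrying only on a 500, while the 4xx codes point to a configuration problem that a retry will not fix. A minimal sketch of that distinction follows. The helper names (`is_retryable`, `call_with_retry`), the `send` callable, and the backoff values are illustrative assumptions, not part of the Shield API.

```python
import time

# Assumption: only 500 is treated as transient, following the "retry request"
# advice in the table. 400/401/403/404 need a fix on the caller's side.
RETRYABLE_STATUSES = {500}


def is_retryable(status_code):
    """Return True when a retry is a reasonable response to this status."""
    return status_code in RETRYABLE_STATUSES


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.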
Error Response Format
Production Error Handling Strategies
Fail Closed (Most Secure)
Block the request if scanning fails:
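import logging
import requests

logger = logging.getLogger(__name__)

# SHIELD_URL and HEADERS are assumed to be defined elsewhere in your application
def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed: {e}")
        # Fail closed: surface the failure so the caller blocks the request
        raise ValueError("Data protection service unavailable") from e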
Fail Open (Less Secure)
Allow the request to proceed with logging:
def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, allowing unprotected data: {e}")
        return data  # Return original data
Cached Obfuscation
Use locally cached obfuscation patterns (here, simple regexes) as a fallback when Shield is unavailable:
import re

def protect_data_with_fallback(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, using fallback masking: {e}")
        return apply_fallback_masking(data)

def apply_fallback_masking(data):
    """Simple regex-based masking as fallback (top-level string values only, modified in place)"""
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, str):
                # Mask SSN-like patterns
                value = re.sub(r'\d{3}-\d{2}-\d{4}', '***-**-****', value)
                # Mask credit card-like patterns
                value = re.sub(r'\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}', '****-****-****-****', value)
                data[key] = value
    return data
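The fallback above only inspects top-level string values and modifies the input dictionary in place. If payloads contain nested objects or lists, a recursive variant that returns a masked copy may be a safer fallback. The sketch below reuses the same two illustrative patterns; `mask_string` and `mask_nested` are hypothetical names, and the word-boundary anchors are an added assumption to reduce matches inside longer digit runs.

```python
import re

# Same illustrative patterns as the fallback above, with word boundaries added.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b")


def mask_string(text):
    """Mask SSN-like and credit card-like patterns in a single string."""
    text = SSN_PATTERN.sub("***-**-****", text)
    return CARD_PATTERN.sub("****-****-****-****", text)


def mask_nested(value):
    """Return a masked copy of value, recursing into dicts and lists."""
    if isinstance(value, str):
        return mask_string(value)
    if isinstance(value, dict):
        return {key: mask_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_nested(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Regex masking of this kind is far less thorough than Shield's own detection, so it is best treated as a last resort rather than a substitute.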
Performance Considerations
Optimization Tips
Keep Payloads Small
# Bad: Scanning entire large object
large_object = {
    "metadata": {...},  # Non-sensitive
    "user": {...},      # Sensitive
    "logs": [...]       # Non-sensitive
}
protected = scan(large_object)

# Good: Scan only sensitive fields
sensitive_data = {"user": large_object["user"]}
protected_user = scan(sensitive_data)
large_object["user"] = protected_user
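The same idea can be wrapped in a small helper so that callers do not repeat the extract, scan, and merge steps. This is a sketch only: `scan_sensitive_fields` is a hypothetical name, and `scan` is assumed to be a callable that takes a dictionary and returns the protected dictionary with the same keys.

```python
def scan_sensitive_fields(record, sensitive_keys, scan):
    """Scan only the selected top-level keys and merge results into a copy.

    scan is a hypothetical callable: it receives a dict and is assumed to
    return the protected dict with the same keys.
    """
    subset = {key: record[key] for key in sensitive_keys if key in record}
    if not subset:
        # Nothing sensitive to send, so skip the network call entirely.
        return dict(record)
    protected = scan(subset)
    merged = dict(record)
    # Only overwrite keys that were actually sent and came back.
    merged.update({key: protected[key] for key in subset if key in protected})
    return merged
```

Returning a shallow copy leaves the caller's original record untouched, which makes it easier to compare payloads before and after scanning.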
Use scan-dynamic for Targeted Scanning
Reuse HTTP Connections
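const axios = require('axios');
const https = require('https');

// Create axios instance that reuses connections through a keep-alive agent
// (axios has no top-level keepAlive option; keep-alive is set on the agent)
const client = axios.create({
  baseURL: BASE_URL,
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  httpsAgent: new https.Agent({ keepAlive: true })
});

function scanData(data, namespace) {
  return client.post('/api/scan', data, {
    params: { namespace }
  });
}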
Process Scans Asynchronously
import asyncio
import aiohttp

async def scan_async(session, data, namespace):
    async with session.post(
        f"{BASE_URL}/api/scan",
        params={"namespace": namespace},
        json=data
    ) as response:
        return await response.json()

async def scan_batch(data_list):
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        tasks = [scan_async(session, data, "api") for data in data_list]
        return await asyncio.gather(*tasks)

# Scan multiple items concurrently
results = asyncio.run(scan_batch([data1, data2, data3]))
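With large batches, launching every request at once can overload Shield or the client. One common way to cap the number of in-flight scans is a semaphore. The sketch below is independent of any HTTP library: `scan_batch_bounded` is a hypothetical name, `scan_one` stands in for a coroutine function such as a partially applied `scan_async`, and the default limit of 5 is an arbitrary assumption.

```python
import asyncio


async def scan_batch_bounded(items, scan_one, max_concurrency=5):
    """Run scan_one over items with at most max_concurrency calls in flight.

    scan_one is a hypothetical coroutine function that takes one item.
    Results come back in input order; a failed item is returned as its
    exception object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            return await scan_one(item)

    return await asyncio.gather(
        *(guarded(item) for item in items),
        return_exceptions=True,
    )
```

Because `return_exceptions=True` is set, callers should check each result with `isinstance(result, Exception)` and apply their fail-open or fail-closed policy per item.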
Using Built-in Data Types
# For built-in data types (US_SSN, CREDIT_CARD, etc.),
# use the type name directly in scan-dynamic - no need to fetch IDs
curl -X POST "https://shield:8080/api/scan-dynamic?obfuscatedDataTypes=US_SSN,CREDIT_CARD" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"ssn": "123-45-6789", "card": "4111111111111111"}'
# For built-in data types, use the type name directly
# No need to fetch from the API
builtin_types = {
    "ssn": "US_SSN",
    "credit_card": "CREDIT_CARD",
    "email": "EMAIL_ADDRESS",
    "phone": "PHONE_NUMBER"
}

# Use type names directly
response = requests.post(
    f"{BASE_URL}/api/scan-dynamic",
    headers=HEADERS,
    params={"obfuscatedDataTypes": builtin_types["ssn"]},
    json={"data": "123-45-6789"}
)
// For built-in data types, use the type name directly
// No need to fetch from the API
const builtinTypes = {
  ssn: 'US_SSN',
  creditCard: 'CREDIT_CARD',
  email: 'EMAIL_ADDRESS',
  phone: 'PHONE_NUMBER'
};

// Use type names directly
const response = await axios.post(
  `${BASE_URL}/api/scan-dynamic`,
  { data: '123-45-6789' },
  {
    headers: HEADERS,
    params: { obfuscatedDataTypes: builtinTypes.ssn }
  }
);
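The cURL example passes several types as one comma-separated obfuscatedDataTypes value. When the list of types is built at runtime, a small helper can produce that parameter consistently. This is a sketch under that assumption; `build_dynamic_params` is a hypothetical name, and the helper does not check names against Shield's catalog.

```python
def build_dynamic_params(type_names):
    """Build the query parameters for /api/scan-dynamic from type names.

    Duplicates are dropped while keeping the first-seen order. The
    comma-separated format mirrors the cURL example in this section.
    """
    names = list(dict.fromkeys(name.strip() for name in type_names))
    if not names or any(not name for name in names):
        raise ValueError("at least one non-empty data type name is required")
    return {"obfuscatedDataTypes": ",".join(names)}
```

The result can be passed as `params=build_dynamic_params(["US_SSN", "CREDIT_CARD"])` to `requests.post`.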
Activity Logging
All scan operations generate activity records queryable via the Activities API.
Logged Fields
- namespace (in URL field)
- username and usergroup (if provided)
- Data types detected/obfuscated
- Rules applied (for /api/scan)
- Timestamp and duration
- icapMode="API"
Querying API Scans
import requests

BASE_URL = "https://shield:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Build query for API scans in last 7 days
query = {
    "simpleToAdvanced": {
        "types": ["API"],
        "timestamp": {
            "withinLast": {
                "days": 7,
                "hours": 0,
                "minutes": 0
            }
        }
    }
}

# Convert to advanced query
search_response = requests.post(
    f"{BASE_URL}/api/activities/convertsearch",
    headers=HEADERS,
    json=query
)
search_query = search_response.json()["simpleToAdvanced"]

# Get activities
activities_response = requests.get(
    f"{BASE_URL}/api/activities",
    headers=HEADERS,
    params={"search": search_query}
)
activities = activities_response.json()["items"]
print(f"Total API scans: {len(activities)}")
# Build query
QUERY=$(curl -s -X POST "https://shield:8080/api/activities/convertsearch" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "simpleToAdvanced": {
      "types": ["API"]
    }
  }' | jq -r '.simpleToAdvanced')

# Get activities (printf avoids URL-encoding the trailing newline that echo adds)
curl "https://shield:8080/api/activities?search=$(printf '%s' "$QUERY" | jq -sRr @uri)" \
  -H "Authorization: Bearer $API_KEY"
const axios = require('axios');

const BASE_URL = 'https://shield:8080';
const HEADERS = { 'Authorization': 'Bearer YOUR_API_KEY' };

// await is not allowed at the top level of a CommonJS module, so wrap the calls
async function main() {
  // Build query for API scans in last 7 days
  const query = {
    simpleToAdvanced: {
      types: ['API'],
      timestamp: {
        withinLast: {
          days: 7,
          hours: 0,
          minutes: 0
        }
      }
    }
  };

  // Convert to advanced query
  const searchResponse = await axios.post(
    `${BASE_URL}/api/activities/convertsearch`,
    query,
    { headers: HEADERS }
  );
  const searchQuery = searchResponse.data.simpleToAdvanced;

  // Get activities
  const activitiesResponse = await axios.get(
    `${BASE_URL}/api/activities`,
    {
      headers: HEADERS,
      params: { search: searchQuery }
    }
  );
  const activities = activitiesResponse.data.items;
  console.log(`Total API scans: ${activities.length}`);
}

main().catch(console.error);
Best Practices
Security
- Protect API keys - Never commit keys to version control, use environment variables
- Use HTTPS - Always use HTTPS for Shield communication
- Rotate keys regularly - Replace API keys on a schedule and whenever exposure is suspected
- Limit key permissions - Use keys with minimal required permissions
- Monitor usage - Review activity logs regularly
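As an illustration of the first point, the API key can be read from the environment at startup instead of being written into source files. This is a minimal sketch: the variable name `SHIELD_API_KEY` and the helper `load_api_headers` are assumptions, not names defined by Shield.

```python
import os


def load_api_headers(env_var="SHIELD_API_KEY", environ=None):
    """Build the Authorization header from an environment variable.

    SHIELD_API_KEY is a hypothetical variable name; use whatever your
    deployment defines. Failing fast here avoids sending requests that
    would be rejected with a 401.
    """
    env = os.environ if environ is None else environ
    api_key = env.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can stand in for the `HEADERS` constant used in the examples on this page.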
Integration
- Start with /api/scan - Use policy-based scanning for most use cases
- Use /api/scan-dynamic for special cases - Testing, custom workflows, scenario-specific masking
- Provide username/usergroup - Enables user-based rules and better audit trails
- Handle errors gracefully - Implement appropriate error handling for your use case
- Test in staging - Validate policies in staging before production
Namespace Design
- Be descriptive - Use clear namespace identifiers (e.g., payment-api, hr-system)
- Environment-specific - Include environment in namespace (e.g., payment-api-prod)
- Group related endpoints - Use same namespace for related API endpoints
- Document namespaces - Keep a record of namespace to application mappings
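To keep namespaces consistent across services, the naming convention can live in one helper rather than in scattered string literals. The sketch below follows the payment-api-prod example from the list. The lowercase, hyphen-separated rule and the name `build_namespace` are assumptions; Shield may accept other formats.

```python
import re

# Assumption: segments are lowercase letters and digits joined by hyphens,
# matching examples such as "payment-api" and "hr-system".
_SEGMENT = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def build_namespace(application, environment):
    """Combine an application identifier and an environment into a namespace."""
    for label, part in (("application", application), ("environment", environment)):
        if not _SEGMENT.match(part):
            raise ValueError(f"invalid {label} segment: {part!r}")
    return f"{application}-{environment}"
```

For example, `build_namespace("payment-api", "prod")` yields the namespace shown in the list above.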
Performance
- Keep payloads small - Scan only sensitive fields when possible
- Use connection pooling - Reuse HTTP connections
- Process asynchronously - Don't block critical paths
- Set appropriate timeouts - Don't wait indefinitely for responses
- Cache reference data - Cache data type and mask format IDs (built-in data types are referenced by name and need no lookup)
Monitoring
- Track scan failures - Log all scanning errors
- Monitor response times - Alert on slow scans
- Review activity logs - Regularly audit scanning activity
- Set up alerts - Configure alerts for unusual patterns
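The first two points can be covered on the client side by timing each scan call and logging failures and slow responses. This is a sketch rather than a prescribed integration: `timed_scan`, the logger name, and the one-second threshold are all assumptions, and `scan` stands in for any function that performs the request.

```python
import logging
import time

logger = logging.getLogger("shield.scan")  # hypothetical logger name


def timed_scan(scan, data, slow_threshold_s=1.0, clock=time.perf_counter):
    """Call scan(data), log failures and slow calls, and return (result, elapsed).

    scan is a hypothetical callable that performs the request.
    slow_threshold_s is an arbitrary example value; tune it to your latency budget.
    """
    start = clock()
    try:
        result = scan(data)
    except Exception:
        # Record the failure with its duration, then let the caller decide
        # whether to fail open or fail closed.
        logger.exception("Shield scan failed after %.3fs", clock() - start)
        raise
    elapsed = clock() - start
    if elapsed > slow_threshold_s:
        logger.warning(
            "Slow Shield scan: %.3fs (threshold %.3fs)", elapsed, slow_threshold_s
        )
    return result, elapsed
```

The log records can then feed whatever alerting system is already in place, alongside the server-side activity logs.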
Related Topics
- Scanning Overview - Main scanning concepts
- Policy Scan - Policy-based scanning
- Dynamic Scan - Explicit control scanning
- Activities API - Query scan activity logs
- Authentication - Generate API keys