Scanning Reference

Error handling, performance considerations, and best practices for the Data Scanning API.

Error Handling

Common Error Codes

HTTP Status	Cause	Solution
`400`	Invalid parameters or content type	Check namespace, verify content type header
`401`	Invalid/expired API key	Verify authentication
`403`	Missing Data Scanning permission	Generate key with Data Scanning permission
`404`	Invalid endpoint	Check URL path (`/api/scan` or `/api/scan-dynamic`)
`500`	Shield processing error	Check Shield logs, retry request

Error Response Format

{
  "errorCode": 400,
  "message": "Missing required parameter: namespace"
}
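A client can turn this body into an exception and use the status code to decide whether a retry is worthwhile. The sketch below assumes the `errorCode`/`message` shape shown above and, following the table, treats only 5xx statuses as retryable. `ShieldError` and `parse_error` are illustrative names, not part of a Shield SDK.

```python
import json


class ShieldError(Exception):
    """Client-side error type for Data Scanning API failures (hypothetical)."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message

    @property
    def retryable(self) -> bool:
        # Per the error table, only server-side processing errors merit a retry;
        # 4xx errors need a fix to the request, the API key, or the URL.
        return self.status >= 500


def parse_error(status: int, body: str) -> ShieldError:
    """Build a ShieldError from an HTTP status code and a response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Non-JSON body (for example, a proxy error page): keep the raw text
        return ShieldError(status, body.strip() or "unknown error")
    if not isinstance(payload, dict):
        return ShieldError(status, str(payload))
    # Prefer the errorCode field when present, else fall back to the HTTP status
    code = payload.get("errorCode", status)
    return ShieldError(int(code), str(payload.get("message", "unknown error")))
```

Retrying a 4xx response unchanged would only repeat the same failure, which is why `retryable` is limited to 5xx.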

Production Error Handling Strategies

Fail Closed (Most Secure)

Block the request if scanning fails:

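import logging

import requests

logger = logging.getLogger(__name__)
# SHIELD_URL and HEADERS are assumed to be defined elsewhere

def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed: {e}")
        # Chain the original error so the root cause stays in the traceback
        raise RuntimeError("Data protection service unavailable") from e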

Fail Open (Less Secure)

Log the failure and let the request continue with the original, unscanned data:

def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, allowing unprotected data: {e}")
        return data  # Return original data

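Fallback Masking

Apply simple regex-based masking locally when the scan fails. It catches only a few well-known patterns and is no substitute for Shield's detection:

import logging
import re

import requests

logger = logging.getLogger(__name__)
# SHIELD_URL and HEADERS are assumed to be defined elsewhere

# Compiled once at import time; best-effort patterns only
CARD_PATTERN = re.compile(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b')
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

def protect_data_with_fallback(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, using fallback masking: {e}")
        return apply_fallback_masking(data)

def apply_fallback_masking(data):
    """Regex-based masking fallback; walks nested dicts/lists and returns a copy."""
    if isinstance(data, str):
        # Mask card-like numbers first, then SSN-like patterns
        data = CARD_PATTERN.sub('****-****-****-****', data)
        return SSN_PATTERN.sub('***-**-****', data)
    if isinstance(data, dict):
        return {key: apply_fallback_masking(value) for key, value in data.items()}
    if isinstance(data, list):
        return [apply_fallback_masking(item) for item in data]
    return data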
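The three strategies differ only in what happens once scanning has failed, so they can share one wrapper that also retries transient errors. A minimal sketch, assuming a caller-supplied `scan_fn(data, namespace)` that raises on failure (for example, a function like `protect_data` above without its own exception handling). `FailureMode`, `ScanUnavailable`, and `protect` are hypothetical names, and the retry count and backoff delays are illustrative, not Shield recommendations.

```python
import enum
import logging
import time
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class FailureMode(enum.Enum):
    FAIL_CLOSED = "closed"      # refuse to continue without a successful scan
    FAIL_OPEN = "open"          # continue with the original, unscanned data
    FALLBACK_MASK = "fallback"  # continue with locally masked data


class ScanUnavailable(RuntimeError):
    """Raised in FAIL_CLOSED mode when no scan attempt succeeded."""


def protect(data: Any, namespace: str,
            scan_fn: Callable[[Any, str], Any],
            mode: FailureMode = FailureMode.FAIL_CLOSED,
            fallback: Optional[Callable[[Any], Any]] = None,
            attempts: int = 3,
            base_delay: float = 0.2,
            is_retryable: Callable[[Exception], bool] = lambda exc: True) -> Any:
    """Call scan_fn with retries, then apply the chosen failure mode."""
    if mode is FailureMode.FALLBACK_MASK and fallback is None:
        raise ValueError("FALLBACK_MASK mode needs a fallback function")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return scan_fn(data, namespace)
        except Exception as exc:  # narrow to your HTTP client's exception types
            last_error = exc
            logger.warning("Scan attempt %d/%d failed: %s", attempt + 1, attempts, exc)
            if not is_retryable(exc) or attempt == attempts - 1:
                break
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    if mode is FailureMode.FAIL_OPEN:
        logger.error("Scanning unavailable; passing data through unprotected")
        return data
    if mode is FailureMode.FALLBACK_MASK:
        logger.error("Scanning unavailable; applying fallback masking")
        return fallback(data)
    raise ScanUnavailable("Data protection service unavailable") from last_error
```

Pass `is_retryable` to stop early on errors such as `401` or `403`, which will not succeed on a second attempt, and `fallback=apply_fallback_masking` to reuse the masking function above.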

Performance Considerations

Optimization Tips

Keep Payloads Small

# scan() stands for a helper that posts to /api/scan, such as protect_data above

# Bad: scanning the entire large object
large_object = {
    "metadata": {...},  # Non-sensitive
    "user": {...},      # Sensitive
    "logs": [...]       # Non-sensitive
}
protected = scan(large_object)

# Good: scan only the sensitive fields
sensitive_data = {"user": large_object["user"]}
protected_user = scan(sensitive_data)["user"]  # unwrap the scanned field
large_object["user"] = protected_user
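The same pattern generalizes to any set of top-level fields. A sketch, assuming `scan_fn` sends a JSON object to `/api/scan` and returns an object with the same keys; `scan_fields` is a hypothetical helper. If the scanner's response lacks a field, the lookup raises `KeyError` rather than silently restoring the unscanned value.

```python
from typing import Any, Callable, Dict, Iterable


def scan_fields(obj: Dict[str, Any], fields: Iterable[str],
                scan_fn: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    """Send only the named top-level fields to the scanner and merge the result back."""
    subset = {name: obj[name] for name in fields if name in obj}
    merged = dict(obj)  # shallow copy: the caller's object is left untouched
    if not subset:
        return merged  # nothing sensitive present; skip the network call
    scanned = scan_fn(subset)
    for name in subset:
        # KeyError if the scanner dropped a field: fail loudly, never leak the original
        merged[name] = scanned[name]
    return merged
```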

Use scan-dynamic for Targeted Scanning

# Scan only specific data types instead of all configured types
curl -X POST "https://shield:8080/api/scan-dynamic?namespace=api&obfuscatedDataTypes=$SSN_ID" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"ssn": "123-45-6789", "name": "John Doe"}'

# Scan only specific data types instead of all configured types
protected = requests.post(
    f"{BASE_URL}/api/scan-dynamic",
    headers=HEADERS,
    params={
        "namespace": "api",
        "obfuscatedDataTypes": ssn_id  # Only scan SSN
    },
    json=data
).json()

// Scan only specific data types instead of all configured types
const result = await axios.post(
  `${BASE_URL}/api/scan-dynamic`,
  data,
  {
    headers: HEADERS,
    params: {
      namespace: 'api',
      obfuscatedDataTypes: ssnId  // Only scan SSN
    }
  }
);
console.log(result.data);

Reuse HTTP Connections

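# A shell loop starts a new curl process, and so a new connection, per request.
# Passing several URLs to one curl invocation lets curl reuse the connection.
# -d and the headers apply to every URL in the same invocation; use --next
# between URLs to send a different body to each.
curl -X POST \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "sensitive info"}' \
  "https://shield:8080/api/scan?namespace=api" \
  "https://shield:8080/api/scan?namespace=api" \
  "https://shield:8080/api/scan?namespace=api"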

import requests

# Create a session so requests reuse pooled connections
session = requests.Session()
session.headers.update({"Authorization": f"Bearer {API_KEY}"})

def scan_data(data, namespace):
    response = session.post(
        f"{BASE_URL}/api/scan",
        params={"namespace": namespace},
        json=data,
        timeout=5
    )
    response.raise_for_status()
    return response.json()

const axios = require('axios');
const https = require('https');

// Axios has no keepAlive option; connection reuse is configured on the agent
const client = axios.create({
  baseURL: BASE_URL,
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  httpsAgent: new https.Agent({ keepAlive: true }),
  timeout: 5000
});

function scanData(data, namespace) {
  return client.post('/api/scan', data, {
    params: { namespace }
  });
}

Process Scans Asynchronously

import asyncio
import aiohttp

async def scan_async(session, data, namespace):
    async with session.post(
        f"{BASE_URL}/api/scan",
        params={"namespace": namespace},
        json=data
    ) as response:
        return await response.json()

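async def scan_batch(data_list):
    timeout = aiohttp.ClientTimeout(total=5)
    # raise_for_status=True turns 4xx/5xx responses into exceptions
    async with aiohttp.ClientSession(
        headers=HEADERS, timeout=timeout, raise_for_status=True
    ) as session:
        tasks = [scan_async(session, data, "api") for data in data_list]
        return await asyncio.gather(*tasks)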

# Scan multiple items concurrently
results = asyncio.run(scan_batch([data1, data2, data3]))
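`asyncio.gather` starts every request at once, which can flood Shield when batches are large. A common refinement caps the number of scans in flight with `asyncio.Semaphore`. In this sketch `fake_scan` stands in for `scan_async` so it runs without a server; `scan_bounded` and the limit values are illustrative assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def scan_bounded(items: List[Any],
                       scan_one: Callable[[Any], Awaitable[Any]],
                       limit: int = 10) -> List[Any]:
    """Scan items concurrently with at most `limit` scans in flight.

    Results come back in input order, as with asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: Any) -> Any:
        async with semaphore:  # waits here once `limit` scans are running
            return await scan_one(item)

    return await asyncio.gather(*(guarded(item) for item in items))


# Stand-in for scan_async(session, data, "api"); records peak concurrency
in_flight = 0
peak = 0


async def fake_scan(item: Any) -> Any:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return {"scanned": item}


results = asyncio.run(scan_bounded(list(range(25)), fake_scan, limit=5))
```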

Using Built-in Data Types

# For built-in data types (US_SSN, CREDIT_CARD, etc.),
# use the type name directly - no need to fetch IDs

# Use type names directly in scan-dynamic
curl -X POST "https://shield:8080/api/scan-dynamic?namespace=api&obfuscatedDataTypes=US_SSN,CREDIT_CARD" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"ssn": "123-45-6789", "card": "4111111111111111"}'

# For built-in data types, use the type name directly
# No need to fetch from the API

builtin_types = {
    "ssn": "US_SSN",
    "credit_card": "CREDIT_CARD",
    "email": "EMAIL_ADDRESS",
    "phone": "PHONE_NUMBER"
}

# Use type names directly
response = requests.post(
    f"{BASE_URL}/api/scan-dynamic",
    headers=HEADERS,
    params={
        "namespace": "api",
        "obfuscatedDataTypes": builtin_types["ssn"]
    },
    json={"data": "123-45-6789"}
)

// For built-in data types, use the type name directly
// No need to fetch from the API

const builtinTypes = {
  ssn: 'US_SSN',
  creditCard: 'CREDIT_CARD',
  email: 'EMAIL_ADDRESS',
  phone: 'PHONE_NUMBER'
};

// Use type names directly
const response = await axios.post(
  `${BASE_URL}/api/scan-dynamic`,
  { data: '123-45-6789' },
  {
    headers: HEADERS,
    params: { namespace: 'api', obfuscatedDataTypes: builtinTypes.ssn }
  }
);
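When several types are requested, the cURL example passes them as one comma-separated value. `requests` encodes a Python list differently (it repeats the key: `obfuscatedDataTypes=US_SSN&obfuscatedDataTypes=CREDIT_CARD`), so joining the names explicitly keeps Python callers consistent with the cURL form. A standard-library sketch; `scan_dynamic_url` is a hypothetical helper. The comma is percent-encoded as `%2C`, which standard query-string parsing decodes back to a comma.

```python
from typing import Iterable
from urllib.parse import urlencode


def scan_dynamic_url(base_url: str, namespace: str, data_types: Iterable[str]) -> str:
    """Build a /api/scan-dynamic URL with a comma-separated list of type names."""
    types = ",".join(data_types)
    if not types:
        raise ValueError("at least one data type is required")
    query = urlencode({"namespace": namespace, "obfuscatedDataTypes": types})
    return f"{base_url.rstrip('/')}/api/scan-dynamic?{query}"
```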

Activity Logging

All scan operations generate activity records queryable via the Activities API.

Logged Fields

namespace (recorded in the URL field)
username and usergroup (if provided)
Data types detected/obfuscated
Rules applied (for /api/scan)
Timestamp and duration
icapMode = "API"

Querying API Scans

import requests

BASE_URL = "https://shield:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Build query for API scans in last 7 days
query = {
    "simpleToAdvanced": {
        "types": ["API"],
        "timestamp": {
            "withinLast": {
                "days": 7,
                "hours": 0,
                "minutes": 0
            }
        }
    }
}

# Convert to advanced query
search_response = requests.post(
    f"{BASE_URL}/api/activities/convertsearch",
    headers=HEADERS,
    json=query
)

search_query = search_response.json()["simpleToAdvanced"]

# Get activities
activities_response = requests.get(
    f"{BASE_URL}/api/activities",
    headers=HEADERS,
    params={"search": search_query}
)

activities = activities_response.json()["items"]
print(f"Total API scans: {len(activities)}")

# Build query
QUERY=$(curl -s -X POST "https://shield:8080/api/activities/convertsearch" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "simpleToAdvanced": {
      "types": ["API"]
    }
  }' | jq -r '.simpleToAdvanced')

# Get activities (-G sends the URL-encoded search value as a query parameter)
curl -G "https://shield:8080/api/activities" \
  --data-urlencode "search=$QUERY" \
  -H "Authorization: Bearer $API_KEY"

const axios = require('axios');

const BASE_URL = 'https://shield:8080';
const HEADERS = { 'Authorization': 'Bearer YOUR_API_KEY' };

// CommonJS scripts cannot use top-level await, so run the calls inside an async function
async function main() {
  // Build query for API scans in last 7 days
  const query = {
    simpleToAdvanced: {
      types: ['API'],
      timestamp: {
        withinLast: {
          days: 7,
          hours: 0,
          minutes: 0
        }
      }
    }
  };

  // Convert to advanced query
  const searchResponse = await axios.post(
    `${BASE_URL}/api/activities/convertsearch`,
    query,
    { headers: HEADERS }
  );

  const searchQuery = searchResponse.data.simpleToAdvanced;

  // Get activities
  const activitiesResponse = await axios.get(
    `${BASE_URL}/api/activities`,
    {
      headers: HEADERS,
      params: { search: searchQuery }
    }
  );

  const activities = activitiesResponse.data.items;
  console.log(`Total API scans: ${activities.length}`);
}

main().catch(console.error);
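The `withinLast` filter splits the time window into days, hours, and minutes. A sketch of a helper that derives those fields from a `datetime.timedelta`, so windows such as 36 hours are expressed correctly; it assumes the request shape shown above, `api_scan_query` is a hypothetical name, and leftover seconds are dropped.

```python
from datetime import timedelta
from typing import Any, Dict


def api_scan_query(window: timedelta) -> Dict[str, Any]:
    """Build the simpleToAdvanced request body for API scans within `window`."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    total_minutes = int(window.total_seconds() // 60)  # drop leftover seconds
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    return {
        "simpleToAdvanced": {
            "types": ["API"],
            "timestamp": {
                "withinLast": {"days": days, "hours": hours, "minutes": minutes}
            },
        }
    }
```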

Best Practices

Security

Protect API keys - Never commit keys to version control; load them from environment variables or a secrets manager
Use HTTPS - Always use HTTPS for Shield communication
Rotate keys regularly - Replace API keys on a fixed schedule, and immediately if a key may have been exposed
Limit key permissions - Grant only the permissions the integration needs, such as Data Scanning
Monitor usage - Review activity logs regularly
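A minimal sketch of loading the key from the environment and failing fast when it is missing. The variable name `SHIELD_API_KEY` and the `shield_headers` helper are assumptions for illustration.

```python
import os
from typing import Dict, Mapping, Optional


def shield_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers from the (assumed) SHIELD_API_KEY environment variable."""
    env = os.environ if env is None else env
    api_key = env.get("SHIELD_API_KEY", "").strip()
    if not api_key:
        # Fail at startup instead of sending requests that will be rejected with 401
        raise RuntimeError("SHIELD_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}
```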

Integration

Start with /api/scan - Use policy-based scanning for most use cases
Use /api/scan-dynamic for special cases - Testing, custom workflows, scenario-specific masking
Provide username/usergroup - Enables user-based rules and better audit trails
Handle errors gracefully - Choose fail-closed, fail-open, or fallback masking according to how sensitive the data is
Test in staging - Validate policies in staging before production

Namespace Design

Be descriptive - Use clear namespace identifiers (e.g., payment-api, hr-system)
Environment-specific - Include environment in namespace (e.g., payment-api-prod)
Group related endpoints - Use same namespace for related API endpoints
Document namespaces - Keep a record of namespace to application mappings
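A sketch of enforcing a consistent `<application>-<environment>` naming scheme in code. The allowed characters and environment names below are assumptions, not Shield's validation rules; `namespace_for` is a hypothetical helper.

```python
import re

# Assumed scheme: lowercase words joined by hyphens, e.g. "payment-api"
_APP_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
ENVIRONMENTS = {"dev", "staging", "prod"}  # illustrative set


def namespace_for(app: str, env: str) -> str:
    """Return an environment-specific namespace such as "payment-api-prod"."""
    if not _APP_NAME.fullmatch(app):
        raise ValueError(f"invalid application name: {app!r}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{app}-{env}"
```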

Performance

Keep payloads small - Scan only sensitive fields when possible
Use connection pooling - Reuse HTTP connections
Process asynchronously - Don't block critical paths
Set appropriate timeouts - Don't wait indefinitely for responses
Cache reference data - Cache data type and mask format IDs
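Data type and mask format IDs change rarely, so fetching them before every scan adds an avoidable round trip. A sketch of a small time-based cache, assuming a caller-supplied `fetch_ids()` that returns a name-to-ID mapping (the endpoint that serves these IDs is not covered on this page). `IdCache` and the 10-minute default TTL are illustrative.

```python
import threading
import time
from typing import Callable, Dict, Optional


class IdCache:
    """Caches a name -> ID mapping and refreshes it after `ttl` seconds."""

    def __init__(self, fetch_ids: Callable[[], Dict[str, str]], ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_ids = fetch_ids
        self._ttl = ttl
        self._clock = clock  # injectable for testing
        self._lock = threading.Lock()
        self._ids: Optional[Dict[str, str]] = None
        self._loaded_at = 0.0

    def get(self, name: str) -> str:
        with self._lock:
            now = self._clock()
            if self._ids is None or now - self._loaded_at >= self._ttl:
                self._ids = dict(self._fetch_ids())  # one call refreshes every entry
                self._loaded_at = now
            return self._ids[name]  # KeyError for unknown names
```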

Monitoring

Track scan failures - Log all scanning errors
Monitor response times - Alert on slow scans
Review activity logs - Regularly audit scanning activity
Set up alerts - Configure alerts for unusual patterns
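A sketch of recording failures and latency around each scan call using only the standard library. `ScanStats` and the 1-second slow threshold are illustrative assumptions; in production these counters would typically feed an existing metrics or alerting system.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List

logger = logging.getLogger("shield.monitoring")


@dataclass
class ScanStats:
    """In-process counters for scan calls."""
    slow_threshold: float = 1.0  # seconds; tune to your latency budget
    durations: List[float] = field(default_factory=list)
    failures: int = 0
    slow: int = 0

    def timed(self, scan_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Run scan_fn, counting failures and slow calls; re-raises errors."""
        start = time.perf_counter()
        try:
            return scan_fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            logger.exception("Scan failed")
            raise
        finally:
            elapsed = time.perf_counter() - start
            self.durations.append(elapsed)
            if elapsed > self.slow_threshold:
                self.slow += 1
                logger.warning("Slow scan: %.3fs", elapsed)
```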

Related Topics

Scanning Overview - Main scanning concepts
Policy Scan - Policy-based scanning
Dynamic Scan - Explicit control scanning
Activities API - Query scan activity logs
Authentication - Generate API keys
