Scanning Reference

Error handling, performance considerations, and best practices for the Data Scanning API.


Error Handling

Common Error Codes

HTTP Status | Cause                                 | Solution
400         | Invalid parameters or content type    | Check namespace, verify content type header
401         | Invalid/expired API key               | Verify authentication
403         | Missing Data Scanning permission      | Generate key with Data Scanning permission
404         | Invalid endpoint                      | Check URL path (/api/scan or /api/scan-dynamic)
500         | Shield processing error               | Check Shield logs, retry request
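
The table suggests retrying after a 500 but not after the 4xx errors, which need a configuration fix instead. A minimal sketch of that policy follows. The names `call_with_retry` and `send`, the attempt count, and the backoff delays are illustrative assumptions, not part of the Data Scanning API.

```python
import time

# Per the table above, only a 500 is worth retrying (assumption: other
# statuses indicate a problem that a retry will not fix).
RETRYABLE_STATUSES = {500}


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

In practice `send` would wrap the HTTP call to /api/scan and return the response status and body; passing it in as a callable keeps the retry logic independent of the HTTP client.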

Error Response Format

{
  "errorCode": 400,
  "message": "Missing required parameter: namespace"
}
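
As a rough illustration, a client could combine the `errorCode` and `message` fields with the solutions from the Common Error Codes table to produce a readable log line. This is a sketch only: `describe_error` is a hypothetical helper, and it assumes the body may occasionally be missing or not valid JSON.

```python
import json

# Suggested actions, copied from the Common Error Codes table.
SOLUTIONS = {
    400: "Check namespace, verify content type header",
    401: "Verify authentication",
    403: "Generate key with Data Scanning permission",
    404: "Check URL path (/api/scan or /api/scan-dynamic)",
    500: "Check Shield logs, retry request",
}


def describe_error(status_code, body_text):
    """Return a one-line description of a failed scan response."""
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        payload = {}  # Body was empty or not JSON
    if not isinstance(payload, dict):
        payload = {}
    code = payload.get("errorCode", status_code)
    message = payload.get("message", "no message in response body")
    hint = SOLUTIONS.get(code, "no documented solution")
    return f"{code}: {message} ({hint})"
```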

Production Error Handling Strategies

Fail Closed (Most Secure)

Block the request if scanning fails:

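import logging

import requests

logger = logging.getLogger(__name__)
SHIELD_URL = "https://shield:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def protect_data(data, namespace):
    """Scan data with Shield; raise if the scan cannot be completed."""
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5  # seconds
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed: {e}")
        # Fail closed: the caller must not continue with unscanned data
        raise ValueError("Data protection service unavailable") from e

Fail Open (Less Secure)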

Allow the request to proceed with logging:

def protect_data(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, allowing unprotected data: {e}")
        return data  # Fail open: return the original, unscanned data

Cached Obfuscation

Fall back to locally stored masking patterns when Shield is unavailable. This example uses simple regex masking:

import re

def protect_data_with_fallback(data, namespace):
    try:
        response = requests.post(
            f"{SHIELD_URL}/api/scan",
            headers=HEADERS,
            params={"namespace": namespace},
            json=data,
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Shield scan failed, using fallback masking: {e}")
        return apply_fallback_masking(data)

def apply_fallback_masking(data):
    """Simple regex-based masking as fallback.

    Masks top-level string values of a dict in place; nested values
    and non-dict inputs are returned unchanged.
    """
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, str):
                # Mask SSN-like patterns
                value = re.sub(r'\d{3}-\d{2}-\d{4}', '***-**-****', value)
                # Mask credit card-like patterns
                value = re.sub(r'\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}', '****-****-****-****', value)
                data[key] = value
    return data
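
The three strategies above differ only in what happens after a failed scan, so one way to keep the choice explicit is a single wrapper with a configurable failure mode. The sketch below is an assumption about how an application might organise this, not part of the Shield API: `protect`, `on_failure`, and the injected `scan` and `fallback` callables are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)

FAILURE_MODES = ("closed", "open", "fallback")


def protect(data, scan, on_failure="closed", fallback=None):
    """Run scan(data) and apply the chosen strategy if it raises.

    scan       -- callable returning protected data, raising on failure
    on_failure -- "closed": raise; "open": return data unchanged;
                  "fallback": return fallback(data)
    """
    if on_failure not in FAILURE_MODES:
        raise ValueError(f"on_failure must be one of {FAILURE_MODES}")
    if on_failure == "fallback" and fallback is None:
        raise ValueError("fallback mode requires a fallback callable")
    try:
        return scan(data)
    except Exception as exc:
        logger.error("Shield scan failed (mode=%s): %s", on_failure, exc)
        if on_failure == "open":
            return data
        if on_failure == "fallback":
            return fallback(data)
        # Fail closed: never hand unscanned data back to the caller
        raise RuntimeError("Data protection service unavailable") from exc
```

Defaulting to "closed" means that a missing or mistyped configuration value cannot silently let unprotected data through.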

Performance Considerations

Optimization Tips

Keep Payloads Small

# Avoid: scanning the entire large object
large_object = {
    "metadata": {...},  # Non-sensitive
    "user": {...},      # Sensitive
    "logs": [...]       # Non-sensitive
}
protected = scan(large_object)

# Prefer: scan only the sensitive part and write the result back
# (scan() stands for your own wrapper around /api/scan)
protected_user = scan(large_object["user"])
large_object["user"] = protected_user
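
The same idea can be generalised to several sensitive keys. The following is a sketch under two assumptions: that the scan response mirrors the shape of the request, and that `scan` is your own wrapper around /api/scan. `scan_fields` is a hypothetical helper name.

```python
def scan_fields(record, sensitive_keys, scan):
    """Scan only the listed keys of record and merge the results back.

    Returns a new dict; the original record is left untouched.
    """
    subset = {key: record[key] for key in sensitive_keys if key in record}
    merged = dict(record)
    if not subset:
        return merged  # Nothing sensitive to send, so skip the request
    protected = scan(subset)
    for key in subset:
        # Assumption: the response contains the same keys as the request
        if key in protected:
            merged[key] = protected[key]
    return merged
```

Only the sensitive subset crosses the network, and non-sensitive fields such as logs or metadata never leave the application.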

Use scan-dynamic for Targeted Scanning

# Scan only specific data types instead of all configured types
curl -X POST "https://shield:8080/api/scan-dynamic?namespace=api&obfuscatedDataTypes=$SSN_ID" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"ssn": "123-45-6789", "name": "John Doe"}'

# Scan only specific data types instead of all configured types
protected = requests.post(
    f"{BASE_URL}/api/scan-dynamic",
    headers=HEADERS,
    params={
        "namespace": "api",
        "obfuscatedDataTypes": ssn_id  # Only scan SSN
    },
    json=data,
    timeout=5
).json()

// Scan only specific data types instead of all configured types
// ("protected" is a reserved word in strict-mode JavaScript, so the
// result is named "response" here)
const response = await axios.post(
  `${BASE_URL}/api/scan-dynamic`,
  data,
  {
    headers: HEADERS,
    params: {
      namespace: 'api',
      obfuscatedDataTypes: ssnId  // Only scan SSN
    }
  }
);
console.log(response.data);

Reuse HTTP Connections

# curl reuses a connection only within a single invocation, so send
# several requests from one curl process, separated by --next
curl -X POST "https://shield:8080/api/scan?namespace=api" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "sensitive info"}' \
  --next \
  -X POST "https://shield:8080/api/scan?namespace=api" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "sensitive info"}'

import requests

# Create session for connection pooling
session = requests.Session()
session.headers.update({"Authorization": f"Bearer {API_KEY}"})

def scan_data(data, namespace):
    return session.post(
        f"{BASE_URL}/api/scan",
        params={"namespace": namespace},
        json=data,
        timeout=5
    ).json()

const axios = require('axios');
const https = require('https');

// Create an axios instance whose HTTPS agent keeps connections alive
const client = axios.create({
  baseURL: BASE_URL,
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  httpsAgent: new https.Agent({ keepAlive: true })
});

function scanData(data, namespace) {
  return client.post('/api/scan', data, {
    params: { namespace }
  });
}

Process Scans Asynchronously

import asyncio
import aiohttp

async def scan_async(session, data, namespace):
    async with session.post(
        f"{BASE_URL}/api/scan",
        params={"namespace": namespace},
        json=data
    ) as response:
        response.raise_for_status()  # Surface 4xx/5xx instead of parsing an error body
        return await response.json()

async def scan_batch(data_list):
    # One shared session so all scans reuse the same connection pool
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        tasks = [scan_async(session, data, "api") for data in data_list]
        return await asyncio.gather(*tasks)

# Scan multiple items concurrently
results = asyncio.run(scan_batch([data1, data2, data3]))
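
With a large batch, launching every scan at once may put more load on Shield than intended. One common way to cap this is an `asyncio.Semaphore`. The sketch below is client-agnostic: `gather_bounded` and its `limit` default are illustrative assumptions, and `coro_fn` stands for any coroutine function that scans one item.

```python
import asyncio


async def gather_bounded(coro_fn, items, limit=5):
    """Run coro_fn(item) for every item with at most `limit` in flight.

    Results are returned in the same order as items.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        # Wait for a free slot before starting the scan
        async with semaphore:
            return await coro_fn(item)

    return await asyncio.gather(*(run_one(item) for item in items))
```

A suitable limit depends on Shield's capacity and your latency budget, so treat the default as a placeholder to tune.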

Cache Reference Data

# Cache data type IDs in shell variables
# Fetch once and reuse
DATATYPES=$(curl -s "https://shield:8080/api/datatypes" \
  -H "Authorization: Bearer $API_KEY")

SSN_ID=$(echo "$DATATYPES" | jq -r '.items[] | select(.type=="US_SSN") | .id')
CC_ID=$(echo "$DATATYPES" | jq -r '.items[] | select(.type=="CREDIT_CARD") | .id')

# Use cached IDs in subsequent requests
# (namespace is a required parameter; the body is sent as JSON)
curl -X POST "https://shield:8080/api/scan-dynamic?namespace=api&obfuscatedDataTypes=$SSN_ID" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "123-45-6789"}'

from functools import lru_cache

import requests

@lru_cache(maxsize=1)
def get_datatype_ids():
    """Cache data type IDs to avoid repeated API calls"""
    response = requests.get(f"{BASE_URL}/api/datatypes", headers=HEADERS, timeout=5)
    response.raise_for_status()
    return {dt["type"]: dt["id"] for dt in response.json()["items"]}

# Use cached IDs
datatype_ids = get_datatype_ids()
ssn_id = datatype_ids["US_SSN"]

// Cache data type IDs in memory
let cachedDatatypes = null;

async function getDatatypeIds() {
  if (!cachedDatatypes) {
    const response = await axios.get(`${BASE_URL}/api/datatypes`, {
      headers: HEADERS
    });
    cachedDatatypes = response.data.items.reduce((acc, dt) => {
      acc[dt.type] = dt.id;
      return acc;
    }, {});
  }
  return cachedDatatypes;
}

// Use cached IDs (run inside an async function or an ES module,
// where await is allowed)
const datatypeIds = await getDatatypeIds();
const ssnId = datatypeIds['US_SSN'];
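
The caches above never expire, so a data type added or changed in Shield would not be picked up until the process restarts. If that matters, a small time-based cache is one option. This is a sketch only: `TTLCache`, the one-hour default, and the injected `loader` are assumptions rather than anything Shield prescribes.

```python
import time


class TTLCache:
    """Cache one loaded value and reload it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=3600, clock=time.monotonic):
        self._loader = loader        # Zero-argument callable that fetches fresh data
        self._ttl = ttl_seconds
        self._clock = clock          # Injectable so expiry can be tested
        self._value = None
        self._expires_at = None      # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

Here `loader` would be a function that calls /api/datatypes and builds the type-to-ID mapping, for example an uncached variant of `get_datatype_ids`.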

Activity Logging

All scan operations generate activity records queryable via the Activities API.

Logged Fields

  • namespace (recorded in the URL field)
  • username and usergroup (if provided)
  • Data types detected/obfuscated
  • Rules applied (for /api/scan)
  • Timestamp and duration
  • icapMode = "API"

Querying API Scans

import requests

BASE_URL = "https://shield:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Build query for API scans in last 7 days
query = {
    "simpleToAdvanced": {
        "types": ["API"],
        "timestamp": {
            "withinLast": {
                "days": 7,
                "hours": 0,
                "minutes": 0
            }
        }
    }
}

# Convert to advanced query
search_response = requests.post(
    f"{BASE_URL}/api/activities/convertsearch",
    headers=HEADERS,
    json=query
)

search_query = search_response.json()["simpleToAdvanced"]

# Get activities
activities_response = requests.get(
    f"{BASE_URL}/api/activities",
    headers=HEADERS,
    params={"search": search_query}
)

activities = activities_response.json()["items"]
print(f"Total API scans: {len(activities)}")

# Build query
QUERY=$(curl -s -X POST "https://shield:8080/api/activities/convertsearch" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "simpleToAdvanced": {
      "types": ["API"]
    }
  }' | jq -r '.simpleToAdvanced')

# Get activities (-G sends the URL-encoded value as a query string)
curl -G "https://shield:8080/api/activities" \
  --data-urlencode "search=$QUERY" \
  -H "Authorization: Bearer $API_KEY"

const axios = require('axios');

const BASE_URL = 'https://shield:8080';
const HEADERS = { 'Authorization': 'Bearer YOUR_API_KEY' };

// Build query for API scans in last 7 days
const query = {
  simpleToAdvanced: {
    types: ['API'],
    timestamp: {
      withinLast: {
        days: 7,
        hours: 0,
        minutes: 0
      }
    }
  }
};

// await is not allowed at the top level of a CommonJS module,
// so the requests run inside an async function
async function main() {
  // Convert to advanced query
  const searchResponse = await axios.post(
    `${BASE_URL}/api/activities/convertsearch`,
    query,
    { headers: HEADERS }
  );

  const searchQuery = searchResponse.data.simpleToAdvanced;

  // Get activities
  const activitiesResponse = await axios.get(
    `${BASE_URL}/api/activities`,
    {
      headers: HEADERS,
      params: { search: searchQuery }
    }
  );

  const activities = activitiesResponse.data.items;
  console.log(`Total API scans: ${activities.length}`);
}

main().catch(console.error);

Best Practices

Security

  • Protect API keys - Never commit keys to version control, use environment variables
  • Use HTTPS - Always use HTTPS for Shield communication
  • Rotate keys regularly - Replace API keys on a schedule and whenever a key may have been exposed
  • Limit key permissions - Use keys with minimal required permissions
  • Monitor usage - Review activity logs regularly
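
As an example of the first point, the key can be read from the environment at startup instead of being written into source code. This is a minimal sketch: the variable name `SHIELD_API_KEY` and the helper `build_headers` are assumptions chosen for illustration.

```python
import os


def build_headers(env_var="SHIELD_API_KEY", environ=os.environ):
    """Build the Authorization header from an environment variable.

    Raises at startup if the key is missing, rather than sending
    unauthenticated requests later.
    """
    api_key = environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

The result can be passed wherever the examples in this reference use `HEADERS`.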

Integration

  • Start with /api/scan - Use policy-based scanning for most use cases
  • Use /api/scan-dynamic for special cases - Testing, custom workflows, scenario-specific masking
  • Provide username/usergroup - Enables user-based rules and better audit trails
  • Handle errors gracefully - Implement appropriate error handling for your use case
  • Test in staging - Validate policies in staging before production

Namespace Design

  • Be descriptive - Use clear namespace identifiers (e.g., payment-api, hr-system)
  • Environment-specific - Include environment in namespace (e.g., payment-api-prod)
  • Group related endpoints - Use same namespace for related API endpoints
  • Document namespaces - Keep a record of namespace to application mappings
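
One way to keep namespaces consistent across services is to build them from a helper rather than typing them by hand. The sketch below assumes a lowercase, hyphen-separated convention inferred from the examples above (payment-api-prod); Shield itself may accept other formats, and `build_namespace` is a hypothetical name.

```python
import re

# Assumed convention: lowercase letters and digits, separated by single hyphens
_PART = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_namespace(application, environment):
    """Return '<application>-<environment>', e.g. 'payment-api-prod'."""
    for label, part in (("application", application), ("environment", environment)):
        if not _PART.fullmatch(part):
            raise ValueError(f"Invalid {label} name: {part!r}")
    return f"{application}-{environment}"
```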

Performance

  • Keep payloads small - Scan only sensitive fields when possible
  • Use connection pooling - Reuse HTTP connections
  • Process asynchronously - Don't block critical paths
  • Set appropriate timeouts - Don't wait indefinitely for responses
  • Cache reference data - Cache data type and mask format IDs instead of fetching them on every request

Monitoring

  • Track scan failures - Log all scanning errors
  • Monitor response times - Alert on slow scans
  • Review activity logs - Regularly audit scanning activity
  • Set up alerts - Configure alerts for unusual patterns
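
The first two points can be covered by wrapping each scan call in a small recorder that counts failures and flags slow responses. This is a sketch under stated assumptions: `ScanMetrics`, the one-second threshold, and the logger name are illustrative, and in production the counters would usually feed an existing metrics system.

```python
import logging
import time

logger = logging.getLogger("shield.scan")


class ScanMetrics:
    """Count scans, failures, and slow responses around any scan callable."""

    def __init__(self, slow_threshold_seconds=1.0, clock=time.perf_counter):
        self._slow_threshold = slow_threshold_seconds
        self._clock = clock  # Injectable so timing can be tested
        self.total = 0
        self.failures = 0
        self.slow = 0

    def observe(self, scan, *args, **kwargs):
        """Call scan(*args, **kwargs), recording its outcome and duration."""
        start = self._clock()
        self.total += 1
        try:
            return scan(*args, **kwargs)
        except Exception:
            self.failures += 1
            logger.exception("Shield scan failed")
            raise  # Leave the fail-open or fail-closed decision to the caller
        finally:
            elapsed = self._clock() - start
            if elapsed > self._slow_threshold:
                self.slow += 1
                logger.warning("Slow Shield scan: %.2fs", elapsed)
```

Usage would look like `metrics.observe(protect_data, data, "payment-api-prod")`, with alerts driven by `metrics.failures` and `metrics.slow`.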