Data Types Reference

Detailed reference information for the Data Types API.


Field Validation

Type Identifier

| Field | Constraints |
| --- | --- |
| Length | 1-100 characters |
| Format | Uppercase letters, numbers, and underscores only |
| Uniqueness | Must be unique across all data types |
| Examples | ROUTING_NUMBER, API_KEY, CUSTOM_DATE_OF_BIRTH |

Name

| Field | Constraints |
| --- | --- |
| Length | 1-100 characters |
| Format | Any printable characters |

Description

| Field | Constraints |
| --- | --- |
| Length | 0-128 characters |
| Format | Any printable characters |
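
The constraints above can be checked client-side before sending a request. The following is a minimal sketch, not Shield's own validation: the function name `validate_fields` is hypothetical, "printable" is approximated with Python's `str.isprintable()`, and uniqueness cannot be verified locally.

```python
import re

# Assumed reading of "uppercase letters, numbers, and underscores only", 1-100 chars.
TYPE_ID_RE = re.compile(r"[A-Z0-9_]{1,100}")


def validate_fields(type_id: str, name: str, description: str = "") -> list[str]:
    """Return a list of constraint violations (empty when all fields pass)."""
    errors = []
    if not TYPE_ID_RE.fullmatch(type_id):
        errors.append("type: must be 1-100 characters of A-Z, 0-9, or _")
    if not (1 <= len(name) <= 100 and name.isprintable()):
        errors.append("name: must be 1-100 printable characters")
    if not (len(description) <= 128 and description.isprintable()):
        errors.append("description: must be 0-128 printable characters")
    return errors


print(validate_fields("ROUTING_NUMBER", "Routing Number"))  # []
print(validate_fields("routing-number", "", "x" * 129))     # three violations
```

The server remains the source of truth; a local check like this only catches obvious mistakes early.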

Regex Guidelines

When creating custom data types:

Best Practices

  • Be specific - Avoid overly broad patterns that cause false positives
  • Test thoroughly - Use regex testing tools (regex101.com) with sample data
  • Use anchors - Consider word boundaries (\b) to avoid partial matches
  • Capture groups - Use valueGroupIndex to extract the sensitive portion
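
To illustrate why specificity and word boundaries matter, the sketch below compares an unanchored pattern with a bounded one on made-up sample text. It uses Python's `re` module, which may behave differently from the regex engine Shield uses.

```python
import re

text = "card 4111111111111111, callback 5551234567"

broad = re.compile(r"[0-9]{10}")        # no boundaries: matches inside longer numbers
bounded = re.compile(r"\b[0-9]{10}\b")  # matches only standalone 10-digit tokens

broad_hits = broad.findall(text)
bounded_hits = bounded.findall(text)

print(broad_hits)    # ['4111111111', '5551234567']
print(bounded_hits)  # ['5551234567']
```

The unanchored pattern reports the first ten digits of the 16-digit card number as a match, which is the kind of false positive the bounded pattern avoids.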

Common Patterns

Fixed-length numeric:

\b[0-9]{10}\b

Prefix with letters:

ABC-[A-Z0-9]{8}

Delimited format:

\d{3}-\d{2}-\d{4}

With word boundaries:

\b(?:EMP|STF)-\d{6}\b
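
A quick way to sanity-check the patterns above is to pair each with one sample that should match and one that should not. This is a sketch using Python's `re` module with invented sample strings; results in Shield's engine may differ.

```python
import re

# (label, pattern, sample that should match, sample that should not match)
CASES = [
    ("fixed-length numeric", r"\b[0-9]{10}\b", "acct 0123456789", "acct 01234567890"),
    ("prefix with letters", r"ABC-[A-Z0-9]{8}", "ref ABC-12AB34CD", "ref ABC-12ab"),
    ("delimited format", r"\d{3}-\d{2}-\d{4}", "id 123-45-6789", "id 123456789"),
    ("word boundaries", r"\b(?:EMP|STF)-\d{6}\b", "badge EMP-123456", "badge TEMP-123456"),
]


def check(pattern: str, positive: str, negative: str) -> bool:
    """True when the pattern hits the positive sample and misses the negative one."""
    rx = re.compile(pattern)
    return rx.search(positive) is not None and rx.search(negative) is None


results = {label: check(p, pos, neg) for label, p, pos, neg in CASES}
for label, passed in results.items():
    print(f"{label}: {'ok' if passed else 'FAILED'}")
```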

Content-Type Specific Patterns

JSON-Specific Pattern (JSON Query)

Detect SSN fields in JSON using a JSON query definition. The json field uses a custom query language, not JSONPath.
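
Because the json field carries the query as a JSON-encoded string inside a JSON request body, regex backslashes end up escaped twice. The sketch below shows where the four backslashes in a raw request body come from; the query structure is copied from this section, and nothing here calls Shield.

```python
import json

query = {
    "Search": {
        "Key": {"Regex": "(?i)ssn|social"},
        "Value": {"String": {"Regex": r"\d{3}-\d{2}-\d{4}"}},
    }
}

# First encoding: the query becomes the string value of the "json" field.
json_field = json.dumps(query, separators=(",", ":"))

# Second encoding: the whole request body is serialized as JSON.
body = json.dumps({"json": json_field, "valueGroupIndex": 0})

print(json_field)  # regex backslashes appear doubled here
print(body)        # and doubled again here

# Decoding twice recovers the original query with single backslashes.
decoded = json.loads(json.loads(body)["json"])
```

HTTP clients that accept a dictionary or object as the request body perform the second encoding automatically, so only the first `json.dumps` (or `JSON.stringify`) call is written by hand.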

curl -X POST "https://your-shield-host:8080/api/datatypes" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "SSN_JSON",
    "name": "SSN in JSON",
    "description": "Detect SSN fields in JSON content",
    "isGroupDataType": false,
    "regexes": [
      {
        "json": "{\"Search\":{\"Key\":{\"Regex\":\"(?i)ssn|social\"},\"Value\":{\"String\":{\"Regex\":\"\\\\d{3}-\\\\d{2}-\\\\d{4}\"}}}}",
        "valueGroupIndex": 0
      }
    ]
  }'

Python:

import requests
import json

BASE_URL = "https://your-shield-host:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# JSON query definition: search keys like "ssn" or "social" whose values look like an SSN
json_query = {
    "Search": {
        "Key": {"Regex": "(?i)ssn|social"},
        "Value": {"String": {"Regex": r"\d{3}-\d{2}-\d{4}"}},
    }
}

datatype = {
    "type": "SSN_JSON",
    "name": "SSN in JSON",
    "description": "Detect SSN fields in JSON content",
    "isGroupDataType": False,
    "regexes": [
        {
            "json": json.dumps(json_query),
            "valueGroupIndex": 0
        }
    ]
}

response = requests.post(f"{BASE_URL}/api/datatypes", headers=HEADERS, json=datatype)
response.raise_for_status()  # fail loudly on a non-2xx response
print(f"Created data type: {response.json()['id']}")

JavaScript:

const axios = require('axios');

const BASE_URL = 'https://your-shield-host:8080';
const HEADERS = { 'Authorization': 'Bearer YOUR_API_KEY' };

// JSON query definition: search keys like "ssn" or "social" whose values look like an SSN
const jsonQuery = {
  Search: {
    Key: { Regex: '(?i)ssn|social' },
    Value: { String: { Regex: '\\d{3}-\\d{2}-\\d{4}' } },
  }
};

const datatype = {
  type: 'SSN_JSON',
  name: 'SSN in JSON',
  description: 'Detect SSN fields in JSON content',
  isGroupDataType: false,
  regexes: [
    {
      json: JSON.stringify(jsonQuery),
      valueGroupIndex: 0
    }
  ]
};

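// Top-level await is not available in CommonJS, so wrap the call in an async function.
(async () => {
  const response = await axios.post(`${BASE_URL}/api/datatypes`, datatype, { headers: HEADERS });
  console.log(`Created data type: ${response.data.id}`);
})();

HTML-Specific Pattern (XPath)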

Detect email addresses in HTML anchor href attributes using XPath. The html field is an XPath expression, not a regex. Note that //a/@href selects every anchor href, not only those that contain an email address.

curl -X POST "https://your-shield-host:8080/api/datatypes" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "EMAIL_HTML",
    "name": "Email in HTML Links",
    "description": "Detect email addresses in href attributes",
    "isGroupDataType": false,
    "regexes": [
      {
        "html": "//a/@href",
        "valueGroupIndex": 0
      }
    ]
  }'

Python:

import requests

BASE_URL = "https://your-shield-host:8080"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

datatype = {
    "type": "EMAIL_HTML",
    "name": "Email in HTML Links",
    "description": "Detect email addresses in href attributes",
    "isGroupDataType": False,
    "regexes": [
        {
            "html": "//a/@href",  # XPath: all href attributes in anchor tags
            "valueGroupIndex": 0
        }
    ]
}

response = requests.post(f"{BASE_URL}/api/datatypes", headers=HEADERS, json=datatype)
response.raise_for_status()  # fail loudly on a non-2xx response
print(f"Created data type: {response.json()['id']}")

JavaScript:

const axios = require('axios');

const BASE_URL = 'https://your-shield-host:8080';
const HEADERS = { 'Authorization': 'Bearer YOUR_API_KEY' };

const datatype = {
  type: 'EMAIL_HTML',
  name: 'Email in HTML Links',
  description: 'Detect email addresses in href attributes',
  isGroupDataType: false,
  regexes: [
    {
      html: '//a/@href',  // XPath: all href attributes in anchor tags
      valueGroupIndex: 0
    }
  ]
};

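// Top-level await is not available in CommonJS, so wrap the call in an async function.
(async () => {
  const response = await axios.post(`${BASE_URL}/api/datatypes`, datatype, { headers: HEADERS });
  console.log(`Created data type: ${response.data.id}`);
})();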
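
To see what an expression like //a/@href selects, the sketch below evaluates an equivalent query locally on an invented snippet. It is only an approximation: Python's standard `xml.etree.ElementTree` supports a limited XPath subset and cannot select attributes directly, so `.//a[@href]` plus `.get("href")` stands in for `//a/@href`. How Shield evaluates the expression, and whether it applies further filtering to the selected values, is not specified here.

```python
import xml.etree.ElementTree as ET

html = """<div>
  <a href="mailto:jane@example.com">Email Jane</a>
  <a href="https://example.com/docs">Docs</a>
  <a>No href</a>
</div>"""

# Parsing as XML works only because this sample snippet is well-formed.
root = ET.fromstring(html)

# Every anchor that has an href attribute, regardless of what the href contains.
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]

# Narrowing to mailto: links is an extra, purely illustrative step.
emails = [h[len("mailto:"):] for h in hrefs if h.startswith("mailto:")]

print(hrefs)   # ['mailto:jane@example.com', 'https://example.com/docs']
print(emails)  # ['jane@example.com']
```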

Built-In Data Types

Shield includes 48 built-in data types that cannot be modified or deleted. These types are automatically available and cover common sensitive data patterns.

Identity & Government IDs

| Type Identifier | Display Name | Description |
| --- | --- | --- |
| US_SSN | US SSN | US Social Security Number (9 digits) |
| CANADIAN_SIN | Canadian SIN | Canadian Social Insurance Number |
| US_ITIN | US ITIN | Individual Taxpayer Identification Number |
| US_ATIN | US ATIN | Adoption Taxpayer Identification Number |
| US_EIN | US EIN | Employer Identification Number |
| US_DRIVERS_LICENSE | US Drivers License | US state driver's license numbers |
| PASSPORT | Passport | US Passport Number (8-9 digits) |
| VIN | VIN | Vehicle Identification Number (ISO 3779:2009) |

Financial

| Type Identifier | Display Name | Description |
| --- | --- | --- |
| CREDIT_CARD | Credit Card | Credit card numbers (Visa, MasterCard, Amex, Discover, etc.) |
| IBAN | IBAN | International Bank Account Number |
| SWIFT_CODE | SWIFT Code | SWIFT/BIC codes for banks |

Contact & Network

| Type Identifier | Display Name | Description |
| --- | --- | --- |
| EMAIL_ADDRESS | Email Address | RFC 5322 compliant email addresses |
| PHONE_NUMBER | Phone Number US | US-formatted phone numbers |
| URL | URL | RFC 1630 compliant URLs |
| IP | IP | IPv4 addresses (RFC 3849) |
| MAC_ADDRESS | Mac Address | Media access control addresses |
| DOMAIN | Domain Name | DNS domain names with valid TLDs |

Cloud Credentials

| Type Identifier | Display Name | Description |
| --- | --- | --- |
| AWS_SECRET | Secrets AWS | Amazon Web Services Access Keys and Secrets |
| AZURE_SECRET | Secrets Azure | Azure Access Keys and Secrets |
| GOOGLE_CLOUD_SECRET | Secrets GCP | Google Cloud Platform credentials and secrets |

Latin American and Spanish Phone Numbers

| Type Identifier | Display Name | Description |
| --- | --- | --- |
| MEXICO_PHONE_NUMBER | Phone Number Mexico | Mexico phone numbers |
| ARGENTINA_PHONE_NUMBER | Phone Number Argentina | Argentina phone numbers |
| BRAZIL_PHONE_NUMBER | Phone Number Brazil | Brazil phone numbers |
| CHILE_PHONE_NUMBER | Phone Number Chile | Chile phone numbers |
| COLOMBIA_PHONE_NUMBER | Phone Number Colombia | Colombia phone numbers |
| VENEZUELA_PHONE_NUMBER | Phone Number Venezuela | Venezuela phone numbers |
| BOLIVIA_PHONE_NUMBER | Phone Number Bolivia | Bolivia phone numbers |
| ECUADOR_PHONE_NUMBER | Phone Number Ecuador | Ecuador phone numbers |
| PARAGUAY_PHONE_NUMBER | Phone Number Paraguay | Paraguay phone numbers |
| URUGUAY_PHONE_NUMBER | Phone Number Uruguay | Uruguay phone numbers |
| PERU_PHONE_NUMBER | Phone Number Peru | Peru phone numbers |
| SPAIN_PHONE_NUMBER | Phone Number Spain | Spain phone numbers |

CRM-Specific Types

| Type Identifier | Display Name | Description |
| --- | --- | --- |
| HUBSPOT_NAME | Hubspot Name | HubSpot contact names |
| HUBSPOT_ADDRESS | Hubspot Address | HubSpot addresses |
| HUBSPOT_DATE_OF_BIRTH | Hubspot Date of Birth | HubSpot date of birth fields |
| HUBSPOT_EMAIL_MESSAGE | Hubspot Email | HubSpot email message content |
| HUBSPOT_EMAIL_TO | Hubspot Email To | HubSpot email recipients |
| HUBSPOT_EMAIL_CC | Hubspot Email CC | HubSpot email CC recipients |
| HUBSPOT_EMAIL_BCC | Hubspot Email BCC | HubSpot email BCC recipients |
| HUBSPOT_EMAIL_FROM | Hubspot Email From | HubSpot email sender |
| HUBSPOT_EMAIL_SENDER | Hubspot Email Sender | HubSpot email sender info |
| HUBSPOT_EMAIL_BODY | Hubspot Email Body | HubSpot email body content |
| HUBSPOT_EMAIL_SUBJECT | Hubspot Email Subject | HubSpot email subjects |
| HUBSPOT_EMAIL_MESSAGE_ID | Hubspot Email Message ID | HubSpot email message IDs |
| HUBSPOT_OBJECT_TIMESTAMP | Hubspot Object Timestamp | HubSpot object timestamps |
| SALESFORCE_NAME | Salesforce Name | Salesforce contact names |
| SALESFORCE_ADDRESS | Salesforce Address | Salesforce addresses |
| SALESFORCE_DATE_OF_BIRTH | Salesforce Date of Birth | Salesforce date of birth fields |

Note: All built-in data types use the Type Identifier in API requests. For example, use "US_SSN" when referencing the US Social Security Number type.


Group Data Types

Group data types combine multiple types into a single logical unit.

Creating Groups

{
  "type": "ALL_PII",
  "name": "All PII",
  "description": "All personally identifiable information",
  "isGroupDataType": true,
  "dataTypes": [
    "US_SSN",
    "CREDIT_CARD",
    "PHONE_NUMBER",
    "EMAIL_ADDRESS"
  ]
}

Behavior

  • Cascade disable - Disabling a group disables all member types
  • Detection - Any member type match triggers the group
  • Obfuscation - Can apply different masks to each member type
  • Nesting - Groups can contain other groups (avoid circular references)
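
Since nested groups can accidentally reference each other, it may help to check group definitions for cycles before creating them. The following is a minimal local sketch using a depth-first search; `find_cycle` and the `CONTACT` group identifier are hypothetical, and how Shield itself handles circular references is not stated here.

```python
from typing import Optional


def find_cycle(groups: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one circular chain of group identifiers, or None if there is none.

    `groups` maps a group's type identifier to its member identifiers.
    Identifiers that are not keys are treated as plain (non-group) types.
    """
    done: set[str] = set()  # groups already fully explored without finding a cycle

    def visit(node: str, path: list[str]) -> Optional[list[str]]:
        if node in path:
            return path[path.index(node):] + [node]
        if node in done or node not in groups:
            return None
        for member in groups[node]:
            cycle = visit(member, path + [node])
            if cycle:
                return cycle
        done.add(node)
        return None

    for name in groups:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None


ok = {"ALL_PII": ["US_SSN", "CONTACT"], "CONTACT": ["EMAIL_ADDRESS", "PHONE_NUMBER"]}
bad = {"ALL_PII": ["CONTACT"], "CONTACT": ["EMAIL_ADDRESS", "ALL_PII"]}

print(find_cycle(ok))   # None
print(find_cycle(bad))  # ['ALL_PII', 'CONTACT', 'ALL_PII']
```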

Value Group Index

The valueGroupIndex field specifies which regex capture group contains the sensitive value:

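| Index | Meaning | Example |
| --- | --- | --- |
| 0 | Full match | EMP-\d{6} returns EMP-123456 (entire match) |
| 1 | First group | EMP-(\d{6}) extracts 123456 from EMP-123456 |
| 2 | Second group | (EMP)-(\d{6}) extracts the digits (123456) from EMP-123456 |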

Example Usage

{
  "regex": "SSN:\\s*(\\d{3}-\\d{2}-\\d{4})",
  "valueGroupIndex": 1
}

This extracts only the SSN value (for example, 123-45-6789), not the "SSN:" prefix.
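
The effect of the index can be reproduced locally with the same pattern. This sketch assumes valueGroupIndex corresponds to capture-group numbering as in Python's `re` module (0 for the full match, 1 for the first group), which is consistent with the table above; the sample text is invented.

```python
import re

pattern = re.compile(r"SSN:\s*(\d{3}-\d{2}-\d{4})")
match = pattern.search("Applicant SSN: 123-45-6789 on file")

full_match = match.group(0)  # what valueGroupIndex 0 would select
value_only = match.group(1)  # what valueGroupIndex 1 would select

print(full_match)  # SSN: 123-45-6789
print(value_only)  # 123-45-6789
```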


Best Practices

  • Name clearly - Use descriptive names that indicate the data type
  • Document patterns - Explain the format in the description field
  • Test with real data - Verify patterns match expected values
  • Group related types - Use group data types for easier management
  • Version control patterns - Keep regex patterns in version control
  • Avoid over-matching - Be as specific as possible to reduce false positives
  • Use word boundaries - Prevent matching partial strings
  • Escape special characters - Remember to escape regex metacharacters

Regex Testing

Before deploying custom data types:

  1. Test with regex101.com - Verify patterns match expected formats
  2. Test with Shield scanning API - Use the Data Scanning API to test detection
  3. Monitor false positives - Check Activities for unexpected matches
  4. Iterate and refine - Adjust patterns based on real-world results
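
Before moving on to the scanning API, the first step can also be automated locally with labeled samples. The sketch below is one possible approach: the helper `evaluate` and the sample strings are hypothetical, and it runs Python's `re` engine rather than Shield's.

```python
import re


def evaluate(pattern: str, samples: list[tuple[str, bool]]) -> dict[str, list[str]]:
    """Sort labeled samples into false positives and false negatives."""
    rx = re.compile(pattern)
    report: dict[str, list[str]] = {"false_positives": [], "false_negatives": []}
    for text, should_match in samples:
        matched = rx.search(text) is not None
        if matched and not should_match:
            report["false_positives"].append(text)
        elif should_match and not matched:
            report["false_negatives"].append(text)
    return report


SAMPLES = [
    ("Employee EMP-123456 badge issued", True),
    ("Contractor STF-000042 onboarded", True),
    ("Order TEMP-123456 shipped", False),
    ("EMP-1234567 is too long", False),
]

loose = evaluate(r"(?:EMP|STF)-\d{6}", SAMPLES)
strict = evaluate(r"\b(?:EMP|STF)-\d{6}\b", SAMPLES)

print(loose["false_positives"])   # both negative samples are matched
print(strict["false_positives"])  # []
```

Keeping the labeled samples next to the pattern makes it easy to re-run the check after each refinement.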

Performance Considerations

  • Simple patterns are faster - Avoid complex lookaheads/lookbehinds when possible
  • Limit backtracking - Use atomic groups or possessive quantifiers where the regex engine supports them
  • Anchor patterns - Use ^, $, or \b to reduce scan scope
  • Test at scale - Validate performance with representative data volumes
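
The cost of excessive backtracking can be measured directly. The sketch below times a nested-quantifier pattern against a simpler pattern that accepts the same strings, using Python's backtracking `re` engine on a deliberately non-matching input. Shield's regex engine is not specified here, and some engines avoid this behavior entirely, so treat the timings as illustrative only.

```python
import re
import time


def time_search(pattern: str, text: str) -> float:
    """Return the seconds taken by a single search of `text`."""
    rx = re.compile(pattern)
    start = time.perf_counter()
    rx.search(text)
    return time.perf_counter() - start


# A run of "a" followed by "b" forces both patterns to fail at the very end.
text = "a" * 20 + "b"

nested = time_search(r"^(a+)+$", text)  # nested quantifiers: many ways to split the run
simple = time_search(r"^a+$", text)     # same accepted strings, only one way to match

print(f"nested: {nested:.4f}s  simple: {simple:.6f}s")
```

Each extra character in the input roughly doubles the work for the nested pattern, which is why testing with representative input sizes matters.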