Technical Architecture

System Architecture

Shards employs a multi-layered architecture designed for high performance, reliability, and scalability. The system is built on microservices principles with clear separation of concerns between data ingestion, processing, storage, and serving layers.

Core Components

1. Data Ingestion Layer

Oracle Connectors

The oracle connector subsystem maintains persistent connections to multiple oracle networks:

Oracle Connectors
├── Pyth Network Connector
│   ├── Price Feed Subscriber
│   ├── Confidence Interval Monitor
│   └── Update Frequency Tracker
├── Chainlink Connector
│   ├── Aggregator Contract Reader
│   ├── Round Data Processor
│   └── Heartbeat Monitor
└── Switchboard Connector
    ├── Feed Account Monitor
    ├── Aggregator Result Parser
    └── Oracle Queue Manager

Technical Implementation:

  • WebSocket connections for real-time updates

  • Fallback to RPC polling for reliability

  • Automatic reconnection with exponential backoff

  • Data validation and sanitization at ingestion
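
The reconnection behaviour listed above can be sketched as follows. This is a minimal illustration, not the Shards implementation: the function names, the default delays, and the use of `ConnectionError` are all assumptions, and `connect` stands in for whatever opens the WebSocket.

```python
import random


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6, jitter=0.0, rng=None):
    """Yield the wait (in seconds) to apply after each failed attempt."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        # Optional jitter spreads reconnects so clients do not retry in lockstep.
        yield delay + rng.uniform(0, jitter * delay)


def connect_with_backoff(connect, sleep, **kwargs):
    """Call `connect` until it succeeds or the attempts are exhausted."""
    last_error = None
    for delay in backoff_delays(**kwargs):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)
    raise ConnectionError("all reconnection attempts failed") from last_error
```

Passing `time.sleep` as `sleep` gives blocking behaviour; once the attempts are exhausted, a caller could switch to RPC polling as the fallback path.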

RPC Interface

Direct integration with Solana RPC nodes:

interface RPCManager {
  primary: SolanaRPCConnection;
  fallback: SolanaRPCConnection[];
  loadBalancer: RPCLoadBalancer;
  
  methods: {
    getAccountInfo(pubkey: PublicKey): Promise<AccountInfo>;
    getProgramAccounts(programId: PublicKey): Promise<Account[]>;
    subscribeToAccount(pubkey: PublicKey): EventEmitter;
  }
}
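
One way the primary and fallback connections might cooperate is shown below. The class name `RPCFailover` is hypothetical, and each endpoint is modelled as a plain callable rather than a real Solana client, so this only illustrates the failover order.

```python
class RPCFailover:
    """Try the primary endpoint first, then each fallback in order."""

    def __init__(self, primary, fallbacks):
        self.endpoints = [primary, *fallbacks]

    def call(self, method, *args):
        errors = []
        for endpoint in self.endpoints:
            try:
                # Each endpoint is a stand-in callable: endpoint(method, *args).
                return endpoint(method, *args)
            except ConnectionError as exc:
                errors.append(exc)
        raise ConnectionError(f"all {len(errors)} RPC endpoints failed") from errors[-1]
```

A load balancer could reorder `endpoints` by observed latency before each call; that policy is left out of this sketch.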

2. Data Processing Pipeline

Normalization Engine

Transforms heterogeneous oracle data into a standardized format:

interface NormalizedOracleData {
  source: OracleProvider;
  assetType: AssetClass;
  symbol: string;
  price: BigNumber;
  confidence: BigNumber;
  timestamp: number;
  slot: number;
  metadata: {
    updateFrequency: number;
    publishers: string[];
    aggregationMethod: AggregationType;
  }
}
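
As an illustration of the normalization step, the sketch below converts two differently scaled integer prices into one decimal representation. Pyth reports a price with an exponent and Chainlink reports an answer with a decimals count; the function names, the reduced field set, and the choice to record zero confidence for Chainlink are assumptions made for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class NormalizedPrice:
    source: str
    symbol: str
    price: Decimal
    confidence: Decimal
    timestamp: int


def scale(value: int, exponent: int) -> Decimal:
    """Return value * 10**exponent without floating-point rounding."""
    return Decimal(value).scaleb(exponent)


def from_pyth_like(symbol, price, conf, expo, publish_time):
    return NormalizedPrice("pyth", symbol, scale(price, expo), scale(conf, expo), publish_time)


def from_chainlink_like(symbol, answer, decimals, updated_at):
    # Assumption: no confidence interval is available, so zero is recorded.
    return NormalizedPrice("chainlink", symbol, scale(answer, -decimals), Decimal(0), updated_at)
```

Using `Decimal` rather than `float` keeps the scaled values exact, which matters when prices from several sources are later compared or averaged.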

Aggregation Service

Combines multiple oracle sources for enhanced reliability:

  • Weighted Average Calculation: Weights each source by oracle reputation and update frequency

  • Outlier Detection: Statistical analysis to identify and exclude anomalous data

  • Confidence Scoring: Composite confidence metrics based on source agreement
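
The three steps above can be sketched together. The source does not state which statistical test or scoring formula is used, so the median-distance filter, the `max_deviation` threshold, and the agreement-ratio confidence below are stand-in assumptions.

```python
from statistics import median


def aggregate(quotes, max_deviation=0.02):
    """quotes: list of (price, weight) pairs. Returns (price, confidence, kept)."""
    if not quotes:
        raise ValueError("no quotes to aggregate")
    mid = median(price for price, _ in quotes)
    # Outlier rule (assumed): drop quotes further than max_deviation from the median.
    kept = [(p, w) for p, w in quotes if abs(p - mid) <= max_deviation * mid]
    total_weight = sum(w for _, w in kept)
    if total_weight <= 0:
        raise ValueError("no usable quotes after outlier filtering")
    price = sum(p * w for p, w in kept) / total_weight
    # Confidence (assumed): the share of sources that agreed with the median.
    confidence = len(kept) / len(quotes)
    return price, confidence, kept
```

In this sketch the weight would be derived from oracle reputation and update frequency before the call; how those two factors combine is not specified here.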

3. Storage Layer

Time-Series Database

Optimized for high-throughput oracle data:

Storage Schema
├── Real-time Cache (Redis)
│   ├── Latest prices (TTL: 1s)
│   ├── Active subscriptions
│   └── Rate limiting counters
├── Hot Storage (PostgreSQL + TimescaleDB)
│   ├── 24-hour price history
│   ├── Per-minute aggregates
│   └── Active oracle metadata
└── Cold Storage (S3-compatible)
    ├── Historical data archives
    ├── Compressed hourly aggregates
    └── Audit logs

Indexing Strategy

  • Primary index on (symbol, timestamp) for time-series queries

  • Secondary indices on oracle source and asset type

  • Bloom filters for efficient existence checks

  • Partitioning by time for optimal query performance
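
To show what an existence check with a Bloom filter looks like, here is a small self-contained version. The bit-array size, the number of hashes, and the SHA-256 based hashing are illustrative choices, not parameters taken from the system.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

A negative answer lets a query skip the database entirely, while a positive answer still needs a real lookup because false positives are possible.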

4. API Gateway

Request Router

Routes each request to a handler based on its query type:

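class RequestRouter {
  // Handlers are injected so the router itself stays stateless.
  constructor(
    private readonly websocketHandler: Handler,
    private readonly timeseriesHandler: Handler,
    private readonly aggregationHandler: Handler,
    private readonly defaultHandler: Handler,
  ) {}

  // Checks run in order, so the first matching query type wins.
  route(request: APIRequest): Handler {
    if (request.isRealtime()) {
      return this.websocketHandler;
    }
    if (request.isHistorical()) {
      return this.timeseriesHandler;
    }
    if (request.isAggregated()) {
      return this.aggregationHandler;
    }
    return this.defaultHandler;
  }
}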

Rate Limiting & Authentication

  • Token bucket algorithm for rate limiting

  • JWT-based authentication with refresh tokens

  • API key management with usage tracking

  • Role-based access control (RBAC)
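
The token bucket algorithm named above works roughly as follows. The class name, the parameters, and the injectable `clock` are assumptions for this sketch; a production limiter would typically keep its counters in Redis rather than in process memory.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity sets the burst size a client may use at once, and the refill rate sets the sustained request rate.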

5. LLM Integration Layer

Semantic Indexing

Vector embeddings for natural language queries:

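import faiss
from sentence_transformers import SentenceTransformer

class SemanticIndexer:
    def __init__(self):
        # all-MiniLM-L6-v2 produces 384-dimensional embeddings
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.index = faiss.IndexFlatL2(384)
        self.descriptions = []  # row i of the index maps to descriptions[i]

    def index_oracle_metadata(self, metadata):
        embeddings = self.encoder.encode(metadata.descriptions)
        self.index.add(embeddings)
        self.descriptions.extend(metadata.descriptions)

    def search(self, query: str, k: int = 5):
        query_embedding = self.encoder.encode([query])
        distances, indices = self.index.search(query_embedding, k)
        return self.retrieve_data(indices)

    def retrieve_data(self, indices):
        # FAISS pads results with -1 when fewer than k vectors are indexed
        return [self.descriptions[i] for i in indices[0] if i != -1]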

Query Translation

Natural language to structured query conversion:

  • NLP parsing for intent recognition

  • Entity extraction for asset identification

  • Temporal expression parsing for time ranges

  • Context preservation across conversation turns
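
A deliberately simple, rule-based sketch of the translation steps is shown below. It covers entity extraction and temporal parsing only, and the symbol list, the `translate` function, and the output fields are hypothetical. A real pipeline would likely use an NLP model rather than regular expressions.

```python
import re
from datetime import datetime, timedelta, timezone

KNOWN_SYMBOLS = {"SOL", "BTC", "ETH"}  # hypothetical asset list
UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def translate(query: str, now: datetime) -> dict:
    # Entity extraction: keep words that match a known asset symbol.
    words = re.findall(r"[A-Za-z]+", query)
    symbols = [w.upper() for w in words if w.upper() in KNOWN_SYMBOLS]
    # Temporal parsing: recognise phrases such as "last 24 hours".
    match = re.search(r"last (\d+) (minute|hour|day)s?", query.lower())
    if match:
        span = timedelta(**{UNITS[match.group(2)]: int(match.group(1))})
        intent, start = "historical", now - span
    else:
        intent, start = "latest", None
    return {"intent": intent, "symbols": symbols, "start": start, "end": now}
```

For example, `translate("SOL price over the last 24 hours", datetime.now(timezone.utc))` yields a historical query for SOL covering the previous day.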

Infrastructure

Deployment Architecture

Kubernetes Cluster
├── Ingestion Pods (3 replicas)
│   ├── Oracle Connectors
│   └── RPC Interfaces
├── Processing Pods (5 replicas)
│   ├── Normalization Engine
│   └── Aggregation Service
├── API Pods (10 replicas)
│   ├── REST API
│   ├── WebSocket Server
│   └── GraphQL Endpoint
└── Storage Pods
    ├── Redis Cluster (3 nodes)
    ├── PostgreSQL (Primary + 2 Replicas)
    └── S3 Gateway

High Availability

  • Geographic Distribution: Multi-region deployment spanning three or more availability zones

  • Automatic Failover: Health checks with automated pod restart

  • Circuit Breakers: Prevent cascade failures in oracle connections

  • Data Replication: Cross-region replication for disaster recovery
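
The circuit breaker pattern mentioned above can be illustrated with a small state machine. The thresholds, the class name, and the choice of `RuntimeError` for rejected calls are assumptions in this sketch.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call skipped")
            # Half-open: the cool-down has passed, so allow one trial call.
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

While the circuit is open, calls to a failing oracle return immediately instead of tying up workers, which is what stops one bad source from cascading into the rest of the pipeline.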

Performance Optimization

Caching Strategy

  • L1 Cache: In-memory cache for hot data (sub-millisecond access)

  • L2 Cache: Redis for frequently accessed data (1-5ms access)

  • L3 Cache: PostgreSQL for recent historical data (5-50ms access)
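
A read path through the three tiers might look like the sketch below. Each tier is modelled as a plain dictionary and the `TieredCache` name is hypothetical; promotion of a hit into the faster tiers is an assumed policy, not one stated above.

```python
class TieredCache:
    def __init__(self, tiers):
        # tiers: dict-like stores ordered from fastest to slowest.
        self.tiers = tiers

    def get(self, key):
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote the value so the next read hits a faster tier.
                for faster in self.tiers[:level]:
                    faster[key] = value
                return value
        return None  # miss in every tier
```

Expiry is omitted here; in practice each tier would apply its own TTL so promoted values do not outlive their freshness window.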

Query Optimization

  • Query plan caching for repeated patterns

  • Parallel query execution for multi-source aggregation

  • Batch processing for bulk data requests

  • Connection pooling for database efficiency

Security Architecture

Data Security

  • Encryption for all data in transit (TLS 1.3)

  • Encryption at rest for sensitive data (AES-256)

  • Key rotation every 30 days

  • Hardware security module (HSM) for key management

Access Control

  • OAuth 2.0 for third-party integrations

  • Multi-factor authentication for admin access

  • IP whitelisting for production environments

  • Audit logging for all data access

Monitoring & Observability

Metrics Collection

Prometheus Metrics
├── System Metrics
│   ├── CPU/Memory utilization
│   ├── Network I/O
│   └── Disk usage
├── Application Metrics
│   ├── Request latency (p50, p95, p99)
│   ├── Throughput (req/s)
│   ├── Error rates
│   └── Cache hit rates
└── Business Metrics
    ├── Active API users
    ├── Data points processed/day
    └── Oracle update frequency

Distributed Tracing

  • OpenTelemetry integration for request tracing

  • Correlation IDs for cross-service tracking

  • Latency breakdown by component

  • Error propagation analysis
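
Correlation IDs are usually carried in request-scoped context and copied onto outgoing calls. The sketch below uses only the standard library; the header name `X-Correlation-ID` and both function names are assumptions, and an OpenTelemetry setup would normally propagate its own trace context instead.

```python
import contextvars
import uuid

# Holds the ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


def start_request(incoming_id=None) -> str:
    """Reuse the caller's ID when present, otherwise mint a new one."""
    cid = incoming_id or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def outgoing_headers() -> dict:
    """Headers to attach to calls made to downstream services."""
    return {"X-Correlation-ID": correlation_id.get()}
```

Logging the same ID in every service lets a single request be followed across the ingestion, processing, and API layers.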

Scalability Considerations

Horizontal Scaling

  • Stateless service design for easy scaling

  • Auto-scaling based on CPU and memory metrics

  • Load balancing with consistent hashing

  • Database sharding by asset symbol
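
Consistent hashing, applied to sharding by asset symbol, can be sketched as a hash ring. The shard names, the virtual-node count, and the SHA-256 based hash are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node is placed at many points on the ring to even out load.
        self.ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring point after the key's hash.
        index = bisect.bisect_right(self.keys, _hash(key)) % len(self.ring)
        return self.ring[index][1]
```

The useful property is that removing a shard only remaps the symbols that lived on it; symbols on the remaining shards keep their placement.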

Vertical Scaling

  • Resource limits optimization based on profiling

  • Runtime tuning for garbage collection efficiency (for example, JVM heap and collector settings on JVM-based services)

  • Database query optimization

  • Network stack tuning for high throughput

Future Architecture Enhancements

  1. Edge Computing: Deploy edge nodes closer to oracle sources

  2. Machine Learning Pipeline: Predictive analytics for oracle data

  3. Blockchain Integration: Direct on-chain data publishing

  4. Multi-chain Support: Extend beyond Solana to other networks
