Technical Architecture

System Architecture

Shards employs a multi-layered architecture designed for high performance, reliability, and scalability. The system follows microservices principles, with a clear separation of concerns between the data ingestion, processing, storage, and serving layers.

Core Components

1. Data Ingestion Layer

Oracle Connectors

The oracle connector subsystem maintains persistent connections to multiple oracle networks:

Oracle Connectors
├── Pyth Network Connector
│   ├── Price Feed Subscriber
│   ├── Confidence Interval Monitor
│   └── Update Frequency Tracker
├── Chainlink Connector
│   ├── Aggregator Contract Reader
│   ├── Round Data Processor
│   └── Heartbeat Monitor
└── Switchboard Connector
    ├── Feed Account Monitor
    ├── Aggregator Result Parser
    └── Oracle Queue Manager

Technical Implementation:

WebSocket connections for real-time updates
Fallback to RPC polling for reliability
Automatic reconnection with exponential backoff
Data validation and sanitization at ingestion
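
The reconnection behaviour listed above can be sketched as follows. This is a minimal illustration, not the Shards implementation: the names `backoffDelayMs` and `connectWithBackoff`, the 500 ms base delay, and the 30 s cap are all assumptions chosen for the example.

```typescript
// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
// The delay doubles on each attempt and is capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retries `connect` until it succeeds or `maxAttempts` is exhausted.
// `sleep` is injectable so the loop can be exercised without real waiting.
async function connectWithBackoff<T>(
  connect: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

In practice a small random jitter is often added to each delay so that many connectors do not reconnect at the same instant.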

RPC Interface

Direct integration with Solana RPC nodes:

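import { EventEmitter } from "events";
import { AccountInfo, PublicKey } from "@solana/web3.js";

interface RPCManager {
  primary: SolanaRPCConnection;     // preferred endpoint
  fallback: SolanaRPCConnection[];  // used when the primary is unavailable
  loadBalancer: RPCLoadBalancer;

  methods: {
    // Resolves to null when the account does not exist.
    getAccountInfo(pubkey: PublicKey): Promise<AccountInfo<Buffer> | null>;
    getProgramAccounts(programId: PublicKey): Promise<Account[]>;
    subscribeToAccount(pubkey: PublicKey): EventEmitter;
  };
}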

2. Data Processing Pipeline

Normalization Engine

Transforms heterogeneous oracle data into a standardized format:

interface NormalizedOracleData {
  source: OracleProvider;
  assetType: AssetClass;
  symbol: string;
  price: BigNumber;
  confidence: BigNumber;
  timestamp: number;
  slot: number;
  metadata: {
    updateFrequency: number;
    publishers: string[];
    aggregationMethod: AggregationType;
  }
}
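
One normalization step can be illustrated with a short sketch. Oracle networks commonly publish prices as an integer plus a scale: Pyth exposes a price with a base-10 exponent, and Chainlink aggregators expose an answer with a `decimals` value. The helper below, `scaleFixedPoint`, is a hypothetical name and not part of Shards; it converts such a pair into an exact decimal string that could then feed a `BigNumber` field like `price` or `confidence`.

```typescript
// Hypothetical helper: converts an integer mantissa (as a decimal string)
// and a base-10 exponent into an exact decimal string, avoiding floats.
function scaleFixedPoint(mantissa: string, exponent: number): string {
  const negative = mantissa.startsWith("-");
  const digits = negative ? mantissa.slice(1) : mantissa;
  if (!/^\d+$/.test(digits)) {
    throw new Error(`not an integer: ${mantissa}`);
  }
  const sign = negative ? "-" : "";
  if (exponent >= 0) {
    // Non-negative exponent: append zeros.
    return sign + digits + "0".repeat(exponent);
  }
  const places = -exponent;
  // Left-pad so there is at least one digit before the decimal point.
  const padded = digits.padStart(places + 1, "0");
  const split = padded.length - places;
  return `${sign}${padded.slice(0, split)}.${padded.slice(split)}`;
}

// A Pyth-style reading with exponent -8 (values are made up).
console.log(scaleFixedPoint("6543210000000", -8)); // 65432.10000000
```

For a Chainlink-style reading, the same function would be called with the exponent set to the negated `decimals` value.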

Aggregation Service

Combines multiple oracle sources for enhanced reliability:

Weighted Average Calculation: Considers oracle reputation and update frequency
Outlier Detection: Statistical analysis to identify and exclude anomalous data
Confidence Scoring: Composite confidence metrics based on source agreement
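
The three steps above can be sketched together. This is an illustrative simplification under stated assumptions, not the production algorithm: outliers are defined as quotes more than 2% from the median, `weight` stands in for whatever reputation and freshness score Shards assigns, and confidence is taken as the share of total weight that survived the outlier filter. All names are hypothetical.

```typescript
interface SourceQuote {
  source: string;
  price: number;
  weight: number; // hypothetical reputation/freshness weight, > 0
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function aggregate(quotes: SourceQuote[], maxRelativeDeviation = 0.02) {
  if (quotes.length === 0) throw new Error("no quotes to aggregate");

  // Outlier detection: drop quotes too far from the median price.
  const center = median(quotes.map((q) => q.price));
  const kept = quotes.filter(
    (q) => Math.abs(q.price - center) <= maxRelativeDeviation * Math.abs(center),
  );

  const keptWeight = kept.reduce((sum, q) => sum + q.weight, 0);
  const totalWeight = quotes.reduce((sum, q) => sum + q.weight, 0);
  if (keptWeight === 0) throw new Error("sources disagree too widely");

  // Weighted average over the surviving quotes.
  const price = kept.reduce((sum, q) => sum + q.price * q.weight, 0) / keptWeight;

  return {
    price,
    // Confidence scoring: fraction of total weight that agreed.
    confidence: keptWeight / totalWeight,
    excluded: quotes.filter((q) => kept.indexOf(q) === -1).map((q) => q.source),
  };
}
```

With quotes of 100 (weight 2), 101 (weight 1), and 150 (weight 1), the 150 quote is excluded, the aggregated price is 301/3, and the confidence is 0.75.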

3. Storage Layer

Time-Series Database

Optimized for high-throughput oracle data:

Storage Schema
├── Real-time Cache (Redis)
│   ├── Latest prices (TTL: 1s)
│   ├── Active subscriptions
│   └── Rate limiting counters
├── Hot Storage (PostgreSQL + TimescaleDB)
│   ├── 24-hour price history
│   ├── Per-minute aggregates
│   └── Active oracle metadata
└── Cold Storage (S3-compatible)
    ├── Historical data archives
    ├── Compressed hourly aggregates
    └── Audit logs

Indexing Strategy

Primary index on (symbol, timestamp) for time-series queries
Secondary indices on oracle source and asset type
Bloom filters for efficient existence checks
Partitioning by time for optimal query performance

4. API Gateway

Request Router

Intelligent routing based on query patterns:

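class RequestRouter {
  constructor(
    private readonly websocketHandler: Handler,
    private readonly timeseriesHandler: Handler,
    private readonly aggregationHandler: Handler,
    private readonly defaultHandler: Handler,
  ) {}

  // Checks run in priority order; the first matching predicate wins.
  route(request: APIRequest): Handler {
    if (request.isRealtime()) {
      return this.websocketHandler;
    }
    if (request.isHistorical()) {
      return this.timeseriesHandler;
    }
    if (request.isAggregated()) {
      return this.aggregationHandler;
    }
    return this.defaultHandler;
  }
}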

Rate Limiting & Authentication

Token bucket algorithm for rate limiting
JWT-based authentication with refresh tokens
API key management with usage tracking
Role-based access control (RBAC)
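
The token bucket algorithm named above can be sketched as follows. The class name, the in-process state, and the injectable clock are assumptions made for the example; per the storage schema, the real rate limiting counters live in Redis.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and deducts `cost` tokens if the request is allowed.
  tryConsume(cost = 1, nowMs: number = Date.now()): boolean {
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);

    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

A bucket created with `new TokenBucket(2, 1)` permits a burst of two requests and then sustains one request per second.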

5. LLM Integration Layer

Semantic Indexing

Vector embeddings for natural language queries:

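import faiss
from sentence_transformers import SentenceTransformer

class SemanticIndexer:
    def __init__(self):
        # all-MiniLM-L6-v2 produces 384-dimensional embeddings
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.index = faiss.IndexFlatL2(384)
        self.descriptions = []  # row i of the index maps to descriptions[i]

    def index_oracle_metadata(self, metadata):
        embeddings = self.encoder.encode(metadata.descriptions)
        self.index.add(embeddings)
        self.descriptions.extend(metadata.descriptions)

    def search(self, query: str, k: int = 5):
        query_embedding = self.encoder.encode([query])
        distances, indices = self.index.search(query_embedding, k)
        return self.retrieve_data(indices[0])

    def retrieve_data(self, indices):
        # FAISS pads results with -1 when fewer than k vectors are indexed
        return [self.descriptions[i] for i in indices if i != -1]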
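
To make the search step concrete: a flat L2 index performs an exact, brute-force scan, comparing the query vector against every stored vector by squared Euclidean distance. The sketch below reproduces that idea with toy 2-dimensional vectors instead of real 384-dimensional embeddings. The function names are hypothetical and this is not how FAISS is implemented internally.

```typescript
// Squared Euclidean distance between two equal-length vectors.
function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Exact k-nearest-neighbour search: score every vector, sort, take the top k.
function nearest(
  vectors: number[][],
  query: number[],
  k: number,
): { id: number; distance: number }[] {
  return vectors
    .map((vector, id) => ({ id, distance: squaredL2(vector, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const toyIndex = [
  [0, 0],
  [1, 0],
  [3, 4],
];
console.log(nearest(toyIndex, [1, 1], 2));
// [ { id: 1, distance: 1 }, { id: 0, distance: 2 } ]
```

The returned `id` values play the role of the `indices` array in the Python class: they are row positions that must be mapped back to the stored metadata.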

Query Translation

Natural language to structured query conversion:

NLP parsing for intent recognition
Entity extraction for asset identification
Temporal expression parsing for time ranges
Context preservation across conversation turns
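
As one example of the steps above, temporal expression parsing can be sketched for a narrow class of phrases such as "last 24 hours" or "past week". The function `parseRelativeRange` and the supported vocabulary are assumptions for illustration; a production parser would cover far more expressions.

```typescript
interface TimeRange {
  fromMs: number;
  toMs: number;
}

const UNIT_MS: Record<string, number> = {
  minute: 60000,
  hour: 3600000,
  day: 86400000,
  week: 604800000,
};

// Recognizes "last|past [N] minute(s)|hour(s)|day(s)|week(s)".
// Returns null when the text contains no such expression.
function parseRelativeRange(text: string, nowMs: number): TimeRange | null {
  const match = /\b(?:last|past)\s+(\d+)?\s*(minute|hour|day|week)s?\b/i.exec(text);
  if (!match) return null;
  // A missing count ("past week") means one unit.
  const count = match[1] ? parseInt(match[1], 10) : 1;
  const unitMs = UNIT_MS[match[2].toLowerCase()];
  return { fromMs: nowMs - count * unitMs, toMs: nowMs };
}
```

Passing `nowMs` explicitly keeps the function deterministic, which also makes it straightforward to resolve follow-up questions against the timestamp of the original conversation turn.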

Infrastructure

Deployment Architecture

Kubernetes Cluster
├── Ingestion Pods (3 replicas)
│   ├── Oracle Connectors
│   └── RPC Interfaces
├── Processing Pods (5 replicas)
│   ├── Normalization Engine
│   └── Aggregation Service
├── API Pods (10 replicas)
│   ├── REST API
│   ├── WebSocket Server
│   └── GraphQL Endpoint
└── Storage Pods
    ├── Redis Cluster (3 nodes)
    ├── PostgreSQL (Primary + 2 Replicas)
    └── S3 Gateway

High Availability

Geographic Distribution: Multi-region deployment across three or more availability zones
Automatic Failover: Health checks with automated pod restart
Circuit Breakers: Prevent cascade failures in oracle connections
Data Replication: Cross-region replication for disaster recovery
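
The circuit breaker pattern mentioned above can be sketched as a small state machine. The class name, the threshold of three consecutive failures, and the 10 s cooldown are assumptions for the example rather than documented Shards settings.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 10000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Call before contacting the oracle; false means "fail fast".
  canRequest(nowMs: number = Date.now()): boolean {
    if (this.state === "open" && nowMs - this.openedAtMs >= this.cooldownMs) {
      this.state = "half-open"; // cooldown elapsed: allow a trial request
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(nowMs: number = Date.now()): void {
    this.consecutiveFailures += 1;
    // A failed trial request, or too many failures in a row, opens the circuit.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAtMs = nowMs;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

Giving each oracle connector its own breaker keeps a failing source from tying up resources that healthy sources need.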

Performance Optimization

Caching Strategy

L1 Cache: In-memory cache for hot data (sub-millisecond access)
L2 Cache: Redis for frequently accessed data (1-5ms access)
L3 Cache: PostgreSQL for recent historical data (5-50ms access)
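
A read path through these tiers might look like the sketch below: check the fastest tier first, fall through on a miss, and copy the value into the faster tiers on a hit. The `CacheTier` interface and the Map-backed tiers are stand-ins invented for the example; real Redis and PostgreSQL lookups would be asynchronous.

```typescript
interface CacheTier<V> {
  name: string;
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// In-memory stand-in used here for every tier.
function mapTier<V>(name: string): CacheTier<V> {
  const store = new Map<string, V>();
  return {
    name,
    get: (key) => store.get(key),
    set: (key, value) => {
      store.set(key, value);
    },
  };
}

// Tiers are ordered fastest first. Returns the value and the tier that served it.
function tieredGet<V>(
  tiers: CacheTier<V>[],
  key: string,
): { value: V; hitTier: string } | undefined {
  for (let i = 0; i < tiers.length; i++) {
    const value = tiers[i].get(key);
    if (value !== undefined) {
      // Promote the value into every faster tier that missed.
      for (let j = 0; j < i; j++) {
        tiers[j].set(key, value);
      }
      return { value, hitTier: tiers[i].name };
    }
  }
  return undefined;
}
```

After a value is served from the slowest tier once, the next read for the same key is answered by the fastest tier.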

Query Optimization

Query plan caching for repeated patterns
Parallel query execution for multi-source aggregation
Batch processing for bulk data requests
Connection pooling for database efficiency

Security Architecture

Data Security

Encryption of data in transit with TLS 1.3
Encryption at rest for sensitive data (AES-256)
Key rotation every 30 days
Hardware security module (HSM) for key management

Access Control

OAuth 2.0 for third-party integrations
Multi-factor authentication for admin access
IP whitelisting for production environments
Audit logging for all data access

Monitoring & Observability

Metrics Collection

Prometheus Metrics
├── System Metrics
│   ├── CPU/Memory utilization
│   ├── Network I/O
│   └── Disk usage
├── Application Metrics
│   ├── Request latency (p50, p95, p99)
│   ├── Throughput (req/s)
│   ├── Error rates
│   └── Cache hit rates
└── Business Metrics
    ├── Active API users
    ├── Data points processed/day
    └── Oracle update frequency
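
For readers unfamiliar with the p50, p95, and p99 latency figures, the sketch below computes them from raw samples using the nearest-rank method. Prometheus itself usually estimates quantiles from histogram buckets rather than raw samples, so this hypothetical `percentile` function only illustrates what the numbers mean.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Latencies of 1..100 ms, one sample each.
const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(latenciesMs, 50)); // 50
console.log(percentile(latenciesMs, 95)); // 95
console.log(percentile(latenciesMs, 99)); // 99
```

A p99 of 99 ms here means that 99% of requests finished in 99 ms or less, which is why tail percentiles reveal slow outliers that an average hides.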

Distributed Tracing

OpenTelemetry integration for request tracing
Correlation IDs for cross-service tracking
Latency breakdown by component
Error propagation analysis

Scalability Considerations

Horizontal Scaling

Stateless service design for easy scaling
Auto-scaling based on CPU and memory metrics
Load balancing with consistent hashing
Database sharding by asset symbol
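
Consistent hashing can be sketched with a hash ring. The example maps asset symbols to shard names; combining the two list items this way is an assumption, as are the `HashRing` class, the FNV-1a hash, and the choice of 64 virtual nodes per shard.

```typescript
// FNV-1a 32-bit hash; any stable, well-distributed hash would do.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class HashRing {
  private ring: { point: number; node: string }[] = [];
  private readonly virtualNodes: number;

  constructor(nodes: string[], virtualNodes = 64) {
    this.virtualNodes = virtualNodes;
    for (const node of nodes) this.add(node);
  }

  // Each node is placed on the ring many times to even out the load.
  add(node: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: fnv1a(`${node}#${v}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(node: string): void {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }

  // A key belongs to the first ring point at or after its hash, wrapping around.
  lookup(key: string): string {
    if (this.ring.length === 0) throw new Error("empty ring");
    const h = fnv1a(key);
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0];
    return entry.node;
  }
}
```

The useful property is that removing a shard only reassigns the keys that lived on it; every other key keeps its placement, which limits data movement during scaling events.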

Vertical Scaling

Resource limits optimization based on profiling
JVM tuning for garbage collection efficiency
Database query optimization
Network stack tuning for high throughput

Future Architecture Enhancements

Edge Computing: Deploy edge nodes closer to oracle sources
Machine Learning Pipeline: Predictive analytics for oracle data
Blockchain Integration: Direct on-chain data publishing
Multi-chain Support: Extend beyond Solana to other networks

