Technical Architecture
System Architecture
Shards employs a multi-layered architecture designed for high performance, reliability, and scalability. The system is built on microservices principles with clear separation of concerns between data ingestion, processing, storage, and serving layers.
Core Components
1. Data Ingestion Layer
Oracle Connectors
The oracle connector subsystem maintains persistent connections to multiple oracle networks:
Oracle Connectors
├── Pyth Network Connector
│   ├── Price Feed Subscriber
│   ├── Confidence Interval Monitor
│   └── Update Frequency Tracker
├── Chainlink Connector
│   ├── Aggregator Contract Reader
│   ├── Round Data Processor
│   └── Heartbeat Monitor
└── Switchboard Connector
    ├── Feed Account Monitor
    ├── Aggregator Result Parser
    └── Oracle Queue Manager
Technical Implementation:
WebSocket connections for real-time updates
Fallback to RPC polling for reliability
Automatic reconnection with exponential backoff
Data validation and sanitization at ingestion
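The reconnection behavior above can be sketched as a delay schedule. This is a minimal illustration, assuming full jitter and a delay cap (neither value is specified in this document):

```python
import random

def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 6):
    """Yield reconnect delays: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        # Full jitter spreads simultaneous reconnects across clients,
        # avoiding a thundering herd against the RPC endpoint.
        yield random.uniform(0, delay)
```

A connector would sleep for each yielded delay between reconnect attempts, resetting the schedule once a connection succeeds.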
RPC Interface
Direct integration with Solana RPC nodes:
interface RPCManager {
  primary: SolanaRPCConnection;
  fallback: SolanaRPCConnection[];
  loadBalancer: RPCLoadBalancer;
  methods: {
    getAccountInfo(pubkey: PublicKey): Promise<AccountInfo>;
    getProgramAccounts(programId: PublicKey): Promise<Account[]>;
    subscribeToAccount(pubkey: PublicKey): EventEmitter;
  };
}
2. Data Processing Pipeline
Normalization Engine
Transforms heterogeneous oracle data into a standardized format:
interface NormalizedOracleData {
  source: OracleProvider;
  assetType: AssetClass;
  symbol: string;
  price: BigNumber;
  confidence: BigNumber;
  timestamp: number;
  slot: number;
  metadata: {
    updateFrequency: number;
    publishers: string[];
    aggregationMethod: AggregationType;
  };
}
Aggregation Service
Combines multiple oracle sources for enhanced reliability:
Weighted Average Calculation: Considers oracle reputation and update frequency
Outlier Detection: Statistical analysis to identify and exclude anomalous data
Confidence Scoring: Composite confidence metrics based on source agreement
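A minimal sketch of the three steps above, assuming median absolute deviation (MAD) as the outlier test and source agreement as the confidence score (the document names the techniques but not the exact statistics):

```python
from statistics import median

def aggregate_price(quotes, mad_k=3.0):
    """Combine (price, weight) quotes from multiple oracles.

    `weight` stands in for oracle reputation and update frequency.
    Outliers beyond mad_k * MAD from the median are excluded, then a
    weighted average is taken over the remaining quotes.
    """
    prices = [p for p, _ in quotes]
    med = median(prices)
    mad = median(abs(p - med) for p in prices) or 1e-9
    kept = [(p, w) for p, w in quotes if abs(p - med) <= mad_k * mad]
    total_w = sum(w for _, w in kept)
    price = sum(p * w for p, w in kept) / total_w
    # Composite confidence: fraction of sources that survive filtering.
    confidence = len(kept) / len(quotes)
    return price, confidence
```

For example, four quotes of 100, 101, 99, and 150 would drop the 150 outlier and report a confidence of 0.75.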
3. Storage Layer
Time-Series Database
Optimized for high-throughput oracle data:
Storage Schema
├── Real-time Cache (Redis)
│   ├── Latest prices (TTL: 1s)
│   ├── Active subscriptions
│   └── Rate-limiting counters
├── Hot Storage (PostgreSQL + TimescaleDB)
│   ├── 24-hour price history
│   ├── Minute-level aggregates
│   └── Active oracle metadata
└── Cold Storage (S3-compatible)
    ├── Historical data archives
    ├── Compressed hourly aggregates
    └── Audit logs
Indexing Strategy
Primary index on (symbol, timestamp) for time-series queries
Secondary indices on oracle source and asset type
Bloom filters for efficient existence checks
Partitioning by time for optimal query performance
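To make the Bloom-filter point concrete, here is a minimal self-contained filter for symbol-existence checks (the real system would use a tuned library implementation; sizes and hash counts here are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives,
    so a negative answer lets a query skip the storage lookup entirely."""

    def __init__(self, size_bits: int = 1 << 16, hashes: int = 4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```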
4. API Gateway
Request Router
Intelligent routing based on query patterns:
class RequestRouter {
  route(request: APIRequest): Handler {
    if (request.isRealtime()) {
      return this.websocketHandler;
    }
    if (request.isHistorical()) {
      return this.timeseriesHandler;
    }
    if (request.isAggregated()) {
      return this.aggregationHandler;
    }
    return this.defaultHandler;
  }
}
Rate Limiting & Authentication
Token bucket algorithm for rate limiting
JWT-based authentication with refresh tokens
API key management with usage tracking
Role-based access control (RBAC)
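The token bucket mentioned above can be sketched in a few lines; the rate and capacity values are placeholders, not production settings:

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/s up to `capacity`;
    a request is allowed only if enough tokens remain to pay its cost."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Lazily add tokens accrued since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because capacity bounds the burst size while rate bounds the sustained throughput, short bursts are tolerated without letting a client exceed its long-run quota.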
5. LLM Integration Layer
Semantic Indexing
Vector embeddings for natural language queries:
import faiss
from sentence_transformers import SentenceTransformer

class SemanticIndexer:
    def __init__(self):
        # all-MiniLM-L6-v2 produces 384-dimensional embeddings,
        # matching the FAISS index dimensionality below.
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.index = faiss.IndexFlatL2(384)

    def index_oracle_metadata(self, metadata):
        embeddings = self.encoder.encode(metadata.descriptions)
        self.index.add(embeddings)

    def search(self, query: str, k: int = 5):
        query_embedding = self.encoder.encode([query])
        distances, indices = self.index.search(query_embedding, k)
        return self.retrieve_data(indices)
Query Translation
Natural language to structured query conversion:
NLP parsing for intent recognition
Entity extraction for asset identification
Temporal expression parsing for time ranges
Context preservation across conversation turns
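A toy sketch of the translation steps above, using a hypothetical symbol table and a regex for temporal expressions (a real deployment would use the NLP intent and entity models the text describes):

```python
import re

# Hypothetical symbol table; production would use the entity extractor.
KNOWN_ASSETS = {"sol": "SOL/USD", "btc": "BTC/USD", "eth": "ETH/USD"}

def translate(query: str) -> dict:
    """Turn a natural-language price question into a structured query."""
    q = query.lower()
    # Entity extraction: match a known asset name in the query.
    symbol = next((s for word, s in KNOWN_ASSETS.items() if word in q), None)
    # Temporal expression parsing: "last N hours/days" -> window in seconds.
    m = re.search(r"last (\d+) (hour|day)s?", q)
    window = None
    if m:
        n, unit = int(m.group(1)), m.group(2)
        window = n * (3600 if unit == "hour" else 86400)
    # Intent recognition: a time range implies a historical query.
    intent = "history" if window else "spot"
    return {"intent": intent, "symbol": symbol, "window_s": window}
```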
Infrastructure
Deployment Architecture
Kubernetes Cluster
├── Ingestion Pods (3 replicas)
│   ├── Oracle Connectors
│   └── RPC Interfaces
├── Processing Pods (5 replicas)
│   ├── Normalization Engine
│   └── Aggregation Service
├── API Pods (10 replicas)
│   ├── REST API
│   ├── WebSocket Server
│   └── GraphQL Endpoint
└── Storage Pods
    ├── Redis Cluster (3 nodes)
    ├── PostgreSQL (Primary + 2 Replicas)
    └── S3 Gateway
High Availability
Geographic Distribution: Multi-region deployment across 3+ zones
Automatic Failover: Health checks with automated pod restart
Circuit Breakers: Prevent cascade failures in oracle connections
Data Replication: Cross-region replication for disaster recovery
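The circuit-breaker behavior for oracle connections can be sketched as follows; the failure threshold and reset interval are illustrative defaults, not documented values:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures, reject calls while open,
    and allow a single probe again after `reset_after` seconds (half-open)."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Fail fast instead of piling requests onto a sick oracle.
                raise RuntimeError("circuit open: oracle connection suspended")
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

By failing fast while open, a broken oracle connection cannot tie up the worker pools that healthy connections depend on, which is how cascade failures are prevented.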
Performance Optimization
Caching Strategy
L1 Cache: In-memory cache for hot data (sub-millisecond access)
L2 Cache: Redis for frequently accessed data (1-5ms access)
L3 Cache: PostgreSQL for recent historical data (5-50ms access)
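A read-through lookup across the three tiers might look like the sketch below, where the L2 and L3 accessors stand in for hypothetical Redis and PostgreSQL clients that return None on a miss:

```python
def get_price(symbol, l1, l2_get, l3_query):
    """Read-through lookup: L1 in-process dict, then L2, then L3."""
    if symbol in l1:                  # L1: sub-millisecond, in-memory
        return l1[symbol]
    value = l2_get(symbol)            # L2: Redis, 1-5ms
    if value is None:
        value = l3_query(symbol)      # L3: recent history in PostgreSQL
    if value is not None:
        l1[symbol] = value            # populate the hot tier on the way out
    return value
```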
Query Optimization
Query plan caching for repeated patterns
Parallel query execution for multi-source aggregation
Batch processing for bulk data requests
Connection pooling for database efficiency
Security Architecture
Data Security
End-to-end encryption for data in transit (TLS 1.3)
Encryption at rest for sensitive data (AES-256)
Key rotation every 30 days
Hardware security module (HSM) for key management
Access Control
OAuth 2.0 for third-party integrations
Multi-factor authentication for admin access
IP whitelisting for production environments
Audit logging for all data access
Monitoring & Observability
Metrics Collection
Prometheus Metrics
├── System Metrics
│   ├── CPU/Memory utilization
│   ├── Network I/O
│   └── Disk usage
├── Application Metrics
│   ├── Request latency (p50, p95, p99)
│   ├── Throughput (req/s)
│   ├── Error rates
│   └── Cache hit rates
└── Business Metrics
    ├── Active API users
    ├── Data points processed/day
    └── Oracle update frequency
Distributed Tracing
OpenTelemetry integration for request tracing
Correlation IDs for cross-service tracking
Latency breakdown by component
Error propagation analysis
Scalability Considerations
Horizontal Scaling
Stateless service design for easy scaling
Auto-scaling based on CPU and memory metrics
Load balancing with consistent hashing
Database sharding by asset symbol
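Consistent hashing for load balancing and symbol-based sharding can be sketched with a virtual-node ring; node names and the vnode count here are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys (e.g. asset symbols) to nodes so that adding or removing a
    node remaps only a small fraction of keys; virtual nodes even the load."""

    def __init__(self, nodes, vnodes: int = 64):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Route to the first virtual node clockwise from the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```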
Vertical Scaling
Resource limits optimization based on profiling
JVM tuning for garbage collection efficiency
Database query optimization
Network stack tuning for high throughput
Future Architecture Enhancements
Edge Computing: Deploy edge nodes closer to oracle sources
Machine Learning Pipeline: Predictive analytics for oracle data
Blockchain Integration: Direct on-chain data publishing
Multi-chain Support: Extend beyond Solana to other networks