January 8, 2025
Horbit Labs Research Team

Introducing DocFusionDB: Where Document Flexibility Meets SQL Performance

Exploring our experimental journey into high-performance document databases - combining PostgreSQL's JSONB storage with the Apache DataFusion query engine (built on Apache Arrow) for lightning-fast analytics.

Tags: rust, database, experimental, postgresql, datafusion, performance

At Horbit Labs, we believe the future of data storage lies at the intersection of flexibility and performance. Today, we're excited to share our latest experimental project: DocFusionDB - a high-performance document database that combines the schema flexibility of PostgreSQL's JSONB with the analytical power of Apache DataFusion, the Arrow-native query engine.

The Problem We're Solving

Modern applications generate increasingly complex, semi-structured data. Traditional relational databases struggle with evolving schemas, while document databases often sacrifice query performance for flexibility. We asked ourselves: What if we could have both?

Enter DocFusionDB - our answer to this fundamental challenge.

Architecture: The Best of Both Worlds

DocFusionDB isn't just another document database. It's a carefully engineered fusion of proven technologies:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   HTTP API      │    │   DataFusion    │    │   PostgreSQL    │
│   (Axum)        │───▶│   Query Engine  │───▶│   JSONB Storage │
│   + Query Cache │    │   + Custom UDFs │    │   + GIN Indexes │
└─────────────────┘    └─────────────────┘    └─────────────────┘

At its core, DocFusionDB leverages:

  • PostgreSQL's JSONB: Battle-tested document storage with sophisticated indexing
  • Apache DataFusion: Vectorized query execution over Apache Arrow columnar data for analytical workloads
  • Custom UDFs: Bridging SQL queries to JSONB operations seamlessly
  • Intelligent Caching: LRU-based query cache for sub-millisecond responses on cache hits
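
To make the caching layer concrete, here is a minimal sketch of an LRU cache keyed by query text. It is an illustration only, not DocFusionDB's actual implementation: the `QueryCache` type, its method names, and the choice to store results as plain strings are all assumptions made for this example.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache keyed by SQL text (illustrative, not the real type).
pub struct QueryCache {
    capacity: usize,
    entries: HashMap<String, String>, // query text -> serialized result
    order: VecDeque<String>,          // front = least recently used
}

impl QueryCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Move `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }

    /// Return the cached result, if any, and mark the query as recently used.
    pub fn get(&mut self, query: &str) -> Option<String> {
        let hit = self.entries.get(query).cloned();
        if hit.is_some() {
            self.touch(query);
        }
        hit
    }

    /// Store a result, evicting the least recently used entry when full.
    pub fn put(&mut self, query: &str, result: String) {
        if !self.entries.contains_key(query) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(query.to_string(), result);
        self.touch(query);
    }

    /// Number of cached queries.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The `VecDeque` scan makes each access linear in the cache size, which keeps the sketch short; a production cache would more likely pair the map with a linked list to get constant-time updates.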

Performance That Speaks Volumes

Through zero-copy JSON processing and vectorized execution, DocFusionDB achieves:

  • Ultra-low latency: Sub-millisecond query responses from cache
  • High throughput: Bulk operations supporting up to 1,000 documents per request
  • Efficient indexing: PostgreSQL's GIN indexes accelerate complex JSON queries
  • Smart caching: Frequently accessed queries served instantly from memory
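
Because bulk requests are capped at 1,000 documents, a client loading a larger dataset has to split it into batches first. The sketch below shows one way to do that on the client side. The constant and function names are hypothetical, and documents are represented as plain strings purely for illustration.

```rust
/// Per-request limit taken from the post; the constant name is hypothetical.
pub const MAX_DOCS_PER_REQUEST: usize = 1_000;

/// Split documents into request-sized batches, preserving their order.
/// Each returned slice holds at most `MAX_DOCS_PER_REQUEST` documents.
pub fn batch_documents(docs: &[String]) -> Vec<&[String]> {
    docs.chunks(MAX_DOCS_PER_REQUEST).collect()
}
```

Each batch would then be sent as its own bulk request; an empty input yields no batches at all.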

Built for Real-World Applications

DocFusionDB shines in scenarios where traditional databases struggle:

Content Management Systems

Store and query rich content with evolving structures - blog posts, product catalogs, user-generated content - all while maintaining lightning-fast search capabilities.

Analytics Platforms

Transform log data, user activities, and event streams into actionable insights using familiar SQL syntax over flexible JSON documents.

Application Backends

Support dynamic data structures that evolve with your product requirements, without the overhead of schema migrations.

The Rust Advantage

Building DocFusionDB in Rust wasn't just a technical choice - it was a strategic one. Rust's memory safety guarantees and zero-cost abstractions let us push performance boundaries while maintaining reliability. The rich ecosystem, from the Axum web framework to the DataFusion query engine, provides a solid foundation for high-performance data systems.

Features That Matter

Developer Experience

  • RESTful HTTP API: Intuitive endpoints for document operations and custom queries
  • CLI Interface: Streamlined tools for development, testing, and database operations
  • Flexible Configuration: YAML files, environment variables, or command-line arguments
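
When the same setting can come from three sources, something has to decide which one wins. The sketch below assumes the common convention that command-line arguments override environment variables, which in turn override the YAML file. That precedence order, the field names, and the default values are all assumptions for illustration, not documented DocFusionDB behavior.

```rust
/// Settings gathered from one source; `None` means "not set there".
/// Field names are hypothetical.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct PartialConfig {
    pub port: Option<u16>,
    pub database_url: Option<String>,
}

/// Fully resolved settings after merging all sources.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub port: u16,
    pub database_url: String,
}

impl PartialConfig {
    /// Values in `self` win; gaps are filled from `lower`.
    pub fn or(self, lower: PartialConfig) -> PartialConfig {
        PartialConfig {
            port: self.port.or(lower.port),
            database_url: self.database_url.or(lower.database_url),
        }
    }
}

/// Assumed precedence: CLI > environment > file > built-in defaults.
pub fn resolve(cli: PartialConfig, env: PartialConfig, file: PartialConfig) -> Config {
    let merged = cli.or(env).or(file);
    Config {
        // Placeholder defaults chosen for this example only.
        port: merged.port.unwrap_or(8080),
        database_url: merged
            .database_url
            .unwrap_or_else(|| "postgres://localhost/docfusion".to_string()),
    }
}
```

Parsing the YAML file, environment, and arguments into `PartialConfig` values is left out; only the layering step is shown.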

Production Readiness

  • Connection Pooling: Efficient database connection management
  • Structured Logging: JSON-formatted logs with performance metrics
  • System Metrics: Built-in monitoring for performance insights
  • API Authentication: Optional security layer for production deployments

Data Operations

  • Backup & Restore: Essential data protection capabilities
  • Bulk Operations: Efficient batch processing for large datasets
  • Query Optimization: Custom UDFs optimize JSONB operations
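
One way to picture how a SQL-facing function might map onto JSONB is to translate a dotted field path into PostgreSQL's `->` and `->>` operators. The helper below is a hypothetical illustration of that idea, not DocFusionDB's UDF code: the function name is invented, and it assumes the column name is a trusted identifier.

```rust
/// Build a PostgreSQL JSONB text-extraction expression from a dotted path.
/// `->` walks into nested objects and `->>` returns the final value as text.
/// Returns `None` when the path is empty or contains an empty segment.
pub fn jsonb_text_expr(column: &str, path: &str) -> Option<String> {
    let keys: Vec<&str> = path.split('.').collect();
    if keys.iter().any(|k| k.is_empty()) {
        return None;
    }
    let (last, parents) = keys.split_last()?;

    // `column` is assumed to be a trusted identifier and is not escaped.
    let mut expr = column.to_string();
    for key in parents {
        // Double single quotes so each key stays a valid SQL string literal.
        expr.push_str(&format!("->'{}'", key.replace('\'', "''")));
    }
    expr.push_str(&format!("->>'{}'", last.replace('\'', "''")));
    Some(expr)
}
```

For a column named `doc` and the path `user.address.city`, this produces `doc->'user'->'address'->>'city'`.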

A Word on Experimentation

DocFusionDB represents our commitment to pushing technological boundaries. While currently experimental, it demonstrates the potential of combining document flexibility with analytical performance. We're actively exploring advanced features like transactions and schema validation for future iterations.

The Road Ahead

This experiment teaches us valuable lessons about modern data architecture. The fusion of document storage and analytical engines opens new possibilities for application design. As we continue refining DocFusionDB, we're excited about its potential to influence how we think about data storage and retrieval.

Open Source and Open Minds

DocFusionDB is available on GitHub, embodying our belief in open innovation. We invite developers, researchers, and data enthusiasts to explore, experiment, and contribute to this journey.

The future of data systems lies not in choosing between flexibility and performance, but in thoughtfully combining them. DocFusionDB is our step toward that future.


Want to explore DocFusionDB? Check out the repository and join us in reimagining what's possible when document databases meet analytical engines.

Have thoughts on this experiment? We'd love to hear from you at lab@horbit.dev.
