VAST DataBase

An Exabyte Scale Transactional and Analytical DataBase

The VAST DataBase has broken fundamental database tradeoffs, combining the transactional performance of a database and the query performance of an exabyte-scale data warehouse at the cost of a data lake.

VAST has resolved the tradeoffs between transactions (to capture and catalog natural data in real-time) and deep analytics (to analyze and correlate data in real-time), applying structure to unstructured data at any scale.

 

The VAST DataBase is designed for rapid data capture and fast queries at any scale, breaking the barriers of real-time analytics from the event stream to the archive.

In addition, the VAST DataBase simultaneously addresses the challenges of streaming data and an analytical store.

 

With the high concurrency and low latency needed to handle the volume and velocity of streaming data, the VAST DataBase analyzes events in real-time without impacting ingestion or other downstream consumers.

 

The VAST Catalog is a fully integrated and consistent index of all metadata, enabling users to run SQL queries to search and find data easily - in a fraction of the time that traditional methods would take.


Breaking the Tradeoffs Between Transactions and Queries

The VAST DataBase is structured to support high performance for both transactional updates and analytical queries at scale.

Scalable ACID Transactions

The VAST DataBase provides support for unlimited ACID transactions and atomic updates within and across tables in the system. Storage-Class Memory serves as a write buffer, enabling every ACID transaction to be stored instantaneously.
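As a generic illustration of what an atomic update across tables means, the sketch below uses Python's built-in sqlite3 module rather than the VAST DataBase itself. The table names (events, counters) and the helper functions are hypothetical and chosen only for the example; nothing here reflects VAST's API or internals.

```python
import sqlite3


def make_db():
    # In-memory stand-in database with two hypothetical tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")
    conn.execute("INSERT INTO counters VALUES ('events', 0)")
    conn.commit()
    return conn


def record_event(conn, event_id, payload, fail=False):
    """Insert an event and bump a counter as one atomic unit."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("INSERT INTO events VALUES (?, ?)", (event_id, payload))
            if fail:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute("UPDATE counters SET n = n + 1 WHERE name = 'events'")
        return True
    except RuntimeError:
        return False


def snapshot(conn):
    # Returns (number of event rows, value of the counter).
    events = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    counter = conn.execute(
        "SELECT n FROM counters WHERE name = 'events'"
    ).fetchone()[0]
    return events, counter
```

If the simulated failure fires after the first write, neither table changes; on success, both change together. That all-or-nothing behavior across tables is the property the paragraph above describes.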

Columnar Queries

When newly written database rows are moved out of the Storage-Class Memory write buffer, they are converted into small columnar objects (32KB), optimizing query performance.
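The row-to-column conversion can be pictured with a minimal sketch. It assumes rows arrive as Python dictionaries with identical keys and cuts chunks by row count rather than by a 32KB byte budget; the function name and the rows_per_chunk parameter are hypothetical, and this is not VAST's on-disk format.

```python
def rows_to_columnar_chunks(rows, rows_per_chunk):
    """Pivot a buffer of row dicts into a list of small column-oriented chunks."""
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        batch = rows[start:start + rows_per_chunk]
        # One list per column, so a query can read only the columns it needs.
        columns = {name: [row[name] for row in batch] for name in batch[0]}
        chunks.append(columns)
    return chunks


# Hypothetical write buffer holding three newly written rows.
buffered_rows = [
    {"sensor": "a", "reading": 1.5},
    {"sensor": "b", "reading": 2.5},
    {"sensor": "a", "reading": 3.5},
]
chunks = rows_to_columnar_chunks(buffered_rows, rows_per_chunk=2)
```

Each resulting chunk stores the values of one column contiguously, which is what lets an analytical query skip columns it does not reference.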

A Columnar Data Format That Accelerates Queries

Systems that use Parquet make inefficient use of column store infrastructure.

At 32KB, VAST’s DataBase chunk size is 16,000x smaller than your average Parquet row group, enabling incredible levels of query filtration and reducing the number of records that query engines sift through, thus optimizing query performance.
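To make the arithmetic concrete, and to show why smaller chunks can reduce the number of records a query touches, here is a rough sketch. It assumes "32KB" means 32 KiB, and it assumes chunks are skipped using per-chunk minimum and maximum values, a common pruning technique that the text above does not confirm as VAST's mechanism. The function rows_scanned and the sample data are hypothetical.

```python
KIB = 1024

# 32 KiB chunks that are 16,000x smaller imply a row group of about 500 MiB.
chunk_bytes = 32 * KIB
implied_row_group_bytes = chunk_bytes * 16_000
implied_row_group_mib = implied_row_group_bytes / (KIB * KIB)


def rows_scanned(values, chunk_rows, lo, hi):
    """Count rows in chunks whose [min, max] range overlaps the filter [lo, hi]."""
    scanned = 0
    for start in range(0, len(values), chunk_rows):
        chunk = values[start:start + chunk_rows]
        if max(chunk) >= lo and min(chunk) <= hi:
            scanned += len(chunk)  # chunk cannot be skipped
    return scanned


# Hypothetical column with locality, such as increasing timestamps.
values = list(range(100_000))

# The same 10-row filter, evaluated over small and large chunks.
small_chunk_rows = rows_scanned(values, 10, 50_000, 50_009)
large_chunk_rows = rows_scanned(values, 10_000, 50_000, 50_009)
```

With the small chunks only the one matching chunk is read, while the large chunks force the engine to read a full 10,000-row chunk to find the same 10 rows. The benefit depends on the data having some locality; values scattered at random would overlap most chunk ranges regardless of chunk size.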

Compressing Columnar Data

The VAST DataBase leverages VAST’s Similarity-Based Data Reduction, a next-generation approach to data reduction which compresses columnar chunks globally against each other, eliminating the data engineering hassle of sizing files in your data lake and delivering an All-Flash datastore at an Archive price point.

Integrated File System

VAST is the only DataBase to integrate with a parallel, POSIX file namespace and S3 namespace, enabling content to be merged with the context layer.

A Partition-Less Architecture

Each DataBase server sees the same volume of NVMe SSDs, making it possible to scale linearly without any diminishing returns.

Massive Scale

VAST clusters can be built to support well over an exabyte of data capacity. Today, several customers run clusters over 100PB in size.

The VAST Data Catalog - Merging Content and Context

VAST has created a built-in metadata index called the VAST Catalog. Now you can search and find data easily – in a fraction of the time that traditional methods would take.

The “loosely coupled” dual-datastore model is a common pattern: a file/object store for content and a relational database for context (metadata, indexes, etc.).

[…] to be held by the VAST cluster doing the transaction. With fine-grained, decentralized locks implemented with innovative read and write leasing mechanisms, the Dataspace delivers local write performance.

PDF – Blog Post - VAST Catalog: Treat Your File System Like a Database

In this blog post by Andy Pernsteiner, Field CTO at VAST Data, he discusses one of the many Use Cases for the VAST DataBase – the VAST Catalog, a fully integrated and consistent index of all metadata that enables users to run SQL queries across an exabyte-scale filesystem and object store.

 

Users can interface with the Catalog via a WebUI, a CLI, or query engines including Apache Spark and Trino.
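The idea of treating file metadata as a queryable table can be sketched with Python's built-in sqlite3 module standing in for a real query engine. The table name catalog, its columns (path, owner, size_bytes), and the sample rows are all hypothetical; the actual VAST Catalog schema and connection details are not given here and are not assumed.

```python
import sqlite3

GIB = 1024 ** 3

# Hypothetical metadata rows: one per file or object.
sample_rows = [
    ("/projects/a/model.bin", "alice", 3 * GIB),
    ("/projects/a/notes.txt", "alice", 2048),
    ("/projects/b/video.mp4", "bob", 5 * GIB),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (path TEXT, owner TEXT, size_bytes INTEGER)")
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)", sample_rows)
conn.commit()

# "Find every file larger than 1 GiB owned by alice" as a SQL query,
# instead of walking the directory tree and calling stat on each file.
large_files = conn.execute(
    "SELECT path, size_bytes FROM catalog "
    "WHERE owner = ? AND size_bytes > ? ORDER BY path",
    ("alice", GIB),
).fetchall()
```

The point of the sketch is the access pattern: a metadata search becomes a filtered SELECT over an index rather than a tree walk.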


PDF – Analyst Report – Revolutionizing AI Infrastructure

This in-depth report explores the VAST DataBase, a next-generation database within the VAST Data Platform optimized for transactional, analytical and real-time data workloads.

 

Traditional data warehouses and MPP databases hit networking and compute bottlenecks as they scale and cannot operate at modern scale.

 

Data lakes scale well but are inflexible and can’t be transacted without expensive workarounds and rewrites.

 

Data lakehouses bring flexibility to data lakes with transactions and time-travel but incur write amplification, adding fragility. The more the data lake concept is extended, the more it appears a complex patchwork attempting to conceal fundamental flaws.

 

VAST Data solves these AI-driven challenges with a globally distributed, consistent system with high-performance and low-cost access to files, objects, and tables supported with a data catalog and a semantic layer.


Breaking the Tradeoffs Between Event-Driven and Data-Driven Architectures

The VAST Data Platform accomplishes what has never been done before - a single platform that can handle the high ingestion rates and real-time analysis of streaming data and also allow for analytical queries across the entire dataset.

The VAST DataBase, as a component of the VAST Data Platform, addresses the challenges of streaming data:

  • High concurrency and low latency to handle the high volume and velocity of events.

  • Persists data immediately to a resilient store.

  • Analyzes events in real-time without impacting ingestion or other downstream consumers.

  • Scales both ingestion rates and storage capacity as needs increase.

The VAST DataBase simultaneously addresses the challenges of an analytical store:

  • Consistent performance for ad-hoc queries.

  • Scalable performance for deep analytical queries.

  • Independent scaling of capacity and performance as needs change.

  • Support for multiple ingest and access methods and applications.

  • Cost-effective at scale.

PDF – Blog Post – Blurring the Lines Between Event-Driven and Data Driven Architectures

Simplifying the design, deployment and maintenance of a combined streaming and analytical platform.

 

In this follow-up blog post by Andy Pernsteiner, Field CTO at VAST Data, he discusses another Use Case for the VAST DataBase – fusing the dynamic nature of stream processing with the analytical depth of data-driven decision making.

 

Andy explains the unique attributes of the VAST DataBase and how it interweaves the realms of event-driven and data-driven architectures.


Linearly scale consistent database services across 1000s of CPUs

VAST’s new Disaggregated and Shared-Everything Architecture (DASE) is designed to break the conventional scaling limits of distributed systems.

 

The parallelism of the DASE architecture makes it possible to build a system that can transact millions of records per second and query an exabyte-scale volume of flash with near-infinite query performance.

 

Machines that run database logic are stateless and have been disaggregated from the flash where data is stored, making it possible for each CPU to write into the namespace without having to coordinate with other CPUs, hence the term “Shared Everything”.

The VAST DataBase Explained | Build Beyond

An exabyte-scale database system enabling real-time data capture, cataloging, analyzing, and correlating, by combining the benefits of a database, data warehouse, and data lake.

PDF - The VAST Data Platform White Paper

This comprehensive White Paper describes The VAST Data Platform in detail.

 

Pages 74 through 81 describe the VAST DataBase in depth.

 

The VAST Data Platform is a breakthrough approach to data-intensive computing that serves as the comprehensive software infrastructure required to capture, catalog, refine, enrich, and preserve data through real-time deep data analysis and deep learning.

 

It is designed to provide seamless and universal data access and computing from edge-to-cloud, all from a platform that is designed for enterprises and cloud service providers to deploy on the infrastructure of their choosing.


PDF – The VAST Data Platform – Short Overview

The Data Platform for The AI Era

 

Intelligent storage for all of your unstructured and structured data.

Automatic structuring of unstructured data upon ingestion using standard protocols, facilitating instant analysis of all data types.

 

Accelerate AI workloads on all of your data.

Optimized for AI, the VAST Data Platform supports all data and major protocols, ensuring AI workloads access data natively and meet the demands of GPU-intensive tasks without additional systems.

 

High performance ingestion and analysis of structured data.

Process millions of data rows per second, accelerating query speeds over 20X, and enabling fast, data-driven decisions by combining streaming with analytics.
