The VAST DataEngine is a global function execution engine that consolidates data centers and cloud regions into one global computational framework.
The VAST DataEngine brings data to life in a thinking machine that can continuously process and learn on data from the natural world.
No more batch. No more silos of data processing. Just continuous, recursive computing.
Shipping in 2024, the VAST DataEngine will redefine the data computing paradigm by introducing serverless functions and real-time triggers into the VAST Data Platform. Once logic and state are merged, files, objects, and tables come to life from edge to cloud.
The VAST Data Platform breaks the tradeoff between data streaming and global insight by engineering data processing and event notifications natively into the system.
Simplifying AI Pipeline Management - Adding Functions and Triggers to Data
The VAST DataEngine provides the execution and orchestration intelligence to manage and execute function pipelines. These pipelines let data scientists and deep learning practitioners scrape, transform, train, infer, and otherwise derive value from the files, objects, and tables that the VAST Data Platform holds, without worrying about where, how, or even when those functions are executed.
The VAST DataEngine automatically optimizes these pipelines to minimize cost, execution time, and/or system utilization, delivering a serverless execution environment across multiple on-premises and cloud locations.
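The page does not say how the DataEngine weighs cost, execution time, and utilization against one another. As a hypothetical illustration only, a scheduler could score each candidate location with a weighted sum of the three and run the function wherever the score is lowest. The `Site` class, the `choose_site` function, the weights, and the numbers below are invented for this sketch and are not part of any VAST API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A candidate execution location (hypothetical model, not a VAST type)."""
    name: str
    cost_per_hour: float      # price of compute at this site
    est_runtime_hours: float  # estimated run time of the function at this site
    utilization: float        # current load, 0.0 (idle) to 1.0 (saturated)


def score(site: Site, w_cost: float, w_time: float, w_util: float) -> float:
    """Weighted sum of total cost, run time, and utilization; lower is better."""
    total_cost = site.cost_per_hour * site.est_runtime_hours
    return w_cost * total_cost + w_time * site.est_runtime_hours + w_util * site.utilization


def choose_site(sites: list[Site], w_cost: float = 1.0, w_time: float = 0.0,
                w_util: float = 0.0) -> Site:
    """Return the site with the lowest weighted score."""
    return min(sites, key=lambda s: score(s, w_cost, w_time, w_util))


sites = [
    Site("on-prem-gpu", cost_per_hour=1.0, est_runtime_hours=3.0, utilization=0.9),
    Site("cloud-region-a", cost_per_hour=4.0, est_runtime_hours=1.0, utilization=0.2),
]

print(choose_site(sites).name)                          # cost only -> on-prem-gpu
print(choose_site(sites, w_cost=0.0, w_time=1.0).name)  # run time only -> cloud-region-a
```

Changing the weights changes the placement: with cost alone the cheaper on-premises site wins, while weighting run time or utilization moves the work to the idle cloud region. A real optimizer would also have to account for data locality and transfer cost, which this sketch ignores.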
With the VAST DataEngine, data and changes to data trigger actions, those actions are performed on the data, and the system keeps processing recursively, without end. It turns all of your data centers, and the public cloud resources you give it access to, into an integrated thinking machine that takes data in and delivers valuable insights.
The DataEngine is the foundation for the perpetual AI training and inference behind the AI-powered discoveries of the future.
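To make the "data triggers action, action changes data" loop concrete, here is a minimal in-memory sketch in Python. It assumes a toy key-value store in which every write queues a change event and registered functions fire on keys that match a prefix. `TriggeredStore`, `on_write`, and `run` are hypothetical names, not the DataEngine SDK.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class TriggeredStore:
    """Toy model (hypothetical): every write emits a change event, and
    functions registered on a key prefix run when a matching key changes."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.triggers: list[tuple[str, Callable[[TriggeredStore, str], None]]] = []
        self.pending: deque[str] = deque()  # changed keys not yet handled

    def on_write(self, prefix: str, fn: Callable[[TriggeredStore, str], None]) -> None:
        """Register fn to run whenever a key starting with prefix is written."""
        self.triggers.append((prefix, fn))

    def write(self, key: str, value: str) -> None:
        """Store the value and queue the change as an event."""
        self.data[key] = value
        self.pending.append(key)

    def run(self, max_events: int = 1000) -> int:
        """Handle queued events until none remain or max_events is reached.

        Functions may call write(), which queues further events; that is the
        recursive loop. Returns the number of events handled."""
        handled = 0
        while self.pending and handled < max_events:
            key = self.pending.popleft()
            for prefix, fn in self.triggers:
                if key.startswith(prefix):
                    fn(self, key)
            handled += 1
        return handled


store = TriggeredStore()
# Stage 1: a raw document triggers a "transform" that writes a lowercased copy.
store.on_write("raw/", lambda s, k: s.write("clean/" + k[len("raw/"):], s.data[k].lower()))
# Stage 2: the cleaned copy triggers an "inference" step that writes a word count.
store.on_write("clean/", lambda s, k: s.write("stats/" + k[len("clean/"):], str(len(s.data[k].split()))))

store.write("raw/doc1", "Hello Thinking Machine")
handled = store.run()
print(store.data["stats/doc1"])  # 3
```

Because each function's output is itself a write, one raw document flows through both stages without any batch job being scheduled. The `max_events` cap exists only so the toy loop cannot spin forever if two functions keep re-triggering each other.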

One Global Computational Framework
Global Compute Execution and Orchestration Across On-Prem, Cloud and the Edge
The VAST DataEngine is built on a container framework that allows for services to be globally executed across the VAST DataSpace, consolidating data centers and cloud regions into one global computational framework.
Built on a High-Performance Global Namespace
Runs On-Prem and in the Cloud
Your HW, Your Data

A Programmable Computing Engine in Software
The DataEngine is a containerized computing environment that customers deploy on their choice of CPUs, GPUs and DPUs – from edge to cloud.
By embedding logic directly into the VAST Data Platform, the system can schedule processing events in real time, triggered by data activities.

DataEngine Programmable Environment via a Simple Python SDK
The VAST DataEngine is a serverless platform, programmed in Python, that integrates stateful functions into an exabyte-scale datastore and provides a programmable environment for developers.
By integrating streaming and data processing with an exabyte-scale datastore and database, the VAST Data Platform enables comprehensive function calling with minimal code.

Next-Generation Event Streaming Infrastructure
The VAST DataEngine features a new data streaming interface designed to write events natively into the VAST DataBase.
For the first time, it is possible to analyze all data by ingesting streaming data in real time into VAST’s exabyte-scale transactional and analytical database.
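The streaming interface itself is not documented on this page. The pattern it describes, writing each event into a transactional table as it arrives and then querying the accumulated data analytically, can be sketched with Python's built-in `sqlite3` module standing in for the VAST DataBase. The table layout, the `ingest` helper, and the sensor events are assumptions made for illustration.

```python
import sqlite3

# Stand-in database (assumption): an in-memory SQLite table plays the role of
# the table that streaming events are written into. This is not the VAST API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts REAL, source TEXT, temperature REAL)")


def ingest(stream) -> None:
    """Write each event as it arrives, committing per event so it is queryable at once."""
    for event in stream:
        conn.execute("INSERT INTO events VALUES (:ts, :source, :temperature)", event)
        conn.commit()


def sensor_stream():
    """A made-up event source yielding one dict per reading."""
    yield {"ts": 1.0, "source": "sensor-a", "temperature": 20.5}
    yield {"ts": 2.0, "source": "sensor-b", "temperature": 22.0}
    yield {"ts": 3.0, "source": "sensor-a", "temperature": 21.5}


ingest(sensor_stream())

# Analytical query over everything ingested so far.
rows = conn.execute(
    "SELECT source, COUNT(*), AVG(temperature) FROM events "
    "GROUP BY source ORDER BY source"
).fetchall()
print(rows)  # [('sensor-a', 2, 21.0), ('sensor-b', 1, 22.0)]
```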

A Real-Time Event Router
The VAST Event Router unifies unstructured and structured data event management into a common platform, providing event consumers simple tools to trigger action.
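One way to picture a router that treats file/object events and table events uniformly is a single event shape plus predicate-based subscriptions. The sketch below is hypothetical: the `Event` fields, the `EventRouter` class, and its `subscribe`/`publish` methods are illustrative names, not the VAST Event Router interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    """A common (hypothetical) event shape for unstructured and structured data."""
    kind: str    # "object" for files/objects, "table" for database rows
    action: str  # e.g. "created", "deleted", "inserted"
    target: str  # object path or table name


@dataclass
class EventRouter:
    """Toy router: each subscription pairs a predicate with a consumer."""
    subscriptions: list[tuple[Callable[[Event], bool], Callable[[Event], None]]] = field(
        default_factory=list
    )

    def subscribe(self, predicate: Callable[[Event], bool],
                  consumer: Callable[[Event], None]) -> None:
        self.subscriptions.append((predicate, consumer))

    def publish(self, event: Event) -> int:
        """Deliver the event to every matching consumer; return the delivery count."""
        delivered = 0
        for predicate, consumer in self.subscriptions:
            if predicate(event):
                consumer(event)
                delivered += 1
        return delivered


router = EventRouter()
new_images: list[Event] = []
new_rows: list[Event] = []
# One consumer reacts to new JPEG objects, another to row inserts in any table.
router.subscribe(lambda e: e.kind == "object" and e.target.endswith(".jpg"), new_images.append)
router.subscribe(lambda e: e.kind == "table" and e.action == "inserted", new_rows.append)

router.publish(Event("object", "created", "/cams/frame-001.jpg"))
router.publish(Event("table", "inserted", "sensor_readings"))
unmatched = router.publish(Event("object", "created", "/logs/app.txt"))  # no consumer matches
```

Consumers only describe what they care about; whether the event came from an object write or a table insert is just another field they can filter on.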

The VAST Data Platform is designed to create structure and insight from unstructured data
By storing triggers and functions as state in the VAST Data Platform, your code becomes dynamically managed by a global data store that supports global code versioning, global code distribution and global code security policies.
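Keeping function code as data is what makes versioning and policy checks straightforward. The following sketch assumes a simple registry in which each function name maps to a list of source versions, a role check stands in for a security policy, and a SHA-256 digest lets every site confirm it is running identical code. `FunctionRegistry` and its methods are hypothetical and do not describe how VAST stores triggers and functions.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class FunctionRegistry:
    """Hypothetical registry that keeps function source code as versioned data.

    `versions` maps a function name to its source versions, oldest first.
    `allowed_roles` is a toy stand-in for a security policy on publishing.
    """
    versions: dict[str, list[str]] = field(default_factory=dict)
    allowed_roles: frozenset[str] = frozenset({"admin", "ml-engineer"})

    def publish(self, name: str, source: str, role: str) -> int:
        """Append a new version of `name` and return its 1-based version number."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not publish functions")
        history = self.versions.setdefault(name, [])
        history.append(source)
        return len(history)

    def get(self, name: str, version: int | None = None) -> str:
        """Return a specific version, or the latest one when version is None."""
        history = self.versions[name]
        return history[-1] if version is None else history[version - 1]

    def digest(self, name: str, version: int | None = None) -> str:
        """SHA-256 of the source, so any site can check it runs identical code."""
        return hashlib.sha256(self.get(name, version).encode()).hexdigest()


reg = FunctionRegistry()
v1 = reg.publish("resize", "def handler(img): return img[:100]", role="ml-engineer")
v2 = reg.publish("resize", "def handler(img): return img[:200]", role="ml-engineer")

try:
    reg.publish("resize", "def handler(img): return img", role="guest")
except PermissionError as err:
    print(err)  # role 'guest' may not publish functions
```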

PDF - The VAST Data Platform White Paper
This comprehensive White Paper describes The VAST Data Platform in detail.
Pages 82 through 88 describe the VAST DataEngine in depth.
The VAST Data Platform is a breakthrough approach to data-intensive computing that serves as the comprehensive software infrastructure required to capture, catalog, refine, enrich, and preserve data through real-time deep data analysis and deep learning.
It is designed to provide seamless and universal data access and computing from edge-to-cloud, all from a platform that is designed for enterprises and cloud service providers to deploy on the infrastructure of their choosing.
A New AI Dataset
Introducing the VAST DataSet
Deep learning data engineering is tough. Data engineers write large dataset files down to archive storage for training, which creates a number of problems associated with rigid data management:
- If model training requires data variation, new datasets are written down to storage, often creating redundant data because the datasets share overlapping training examples.
- Because conventional datasets are not embedded with training code, it can be difficult to reproduce trained models as data and code continue to evolve independently.
With the DataEngine, VAST is introducing a new concept called the VAST DataSet. This new approach to data management leverages the VAST DataBase to create materialized views of example data without copying and re-copying data into blunt data containers.
DataSets can scale to exabytes. Each DataSet includes an indexed set of examples and the code used for training so that it’s easy to reproduce models on the fly.
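The DataSet idea can be approximated in SQLite: examples are stored once, and each DataSet is kept as an index over them plus the training code that uses it. SQLite has no materialized views, so this hypothetical sketch uses an index table resolved by a join at read time instead. The schema, `define_dataset`, and `load_dataset` are assumptions, not the VAST DataBase API.

```python
import hashlib
import sqlite3

# Stand-in database (assumption): examples are stored once; a dataset is only
# an index of example IDs plus its training code, so overlapping datasets share rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE examples (id INTEGER PRIMARY KEY, text TEXT, label TEXT);
CREATE TABLE datasets (name TEXT PRIMARY KEY, training_code TEXT, code_sha256 TEXT);
CREATE TABLE dataset_members (dataset TEXT, example_id INTEGER REFERENCES examples(id));
""")
conn.executemany(
    "INSERT INTO examples (id, text, label) VALUES (?, ?, ?)",
    [(1, "a cat", "cat"), (2, "a dog", "dog"), (3, "a bird", "bird")],
)


def define_dataset(name: str, example_ids: list[int], training_code: str) -> None:
    """Record which examples belong to the dataset and the code that trains on it."""
    digest = hashlib.sha256(training_code.encode()).hexdigest()
    conn.execute("INSERT INTO datasets VALUES (?, ?, ?)", (name, training_code, digest))
    conn.executemany(
        "INSERT INTO dataset_members VALUES (?, ?)",
        [(name, i) for i in example_ids],
    )


def load_dataset(name: str) -> list[tuple[str, str]]:
    """Resolve the index to example rows at read time (no copied data files)."""
    return conn.execute(
        "SELECT e.text, e.label FROM dataset_members m "
        "JOIN examples e ON e.id = m.example_id "
        "WHERE m.dataset = ? ORDER BY e.id",
        (name,),
    ).fetchall()


define_dataset("pets", [1, 2], "model.fit(epochs=3)")
define_dataset("animals", [1, 2, 3], "model.fit(epochs=5)")

print(load_dataset("pets"))  # [('a cat', 'cat'), ('a dog', 'dog')]
print(conn.execute("SELECT COUNT(*) FROM examples").fetchone()[0])  # 3
```

The two DataSets overlap on two examples, yet the `examples` table still holds three rows, and each DataSet carries the exact training code (and its hash) needed to rerun it.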
