What is Object Storage?

What is Object Storage?

Object Storage is a highly advanced, massively scalable data storage architecture fundamentally designed to house the petabytes of unstructured and structured data that power the modern Open Data Lakehouse. Represented by platforms like Amazon S3, Google Cloud Storage, and Azure Blob Storage, Object Storage entirely abandons the complex, rigid file hierarchies (directories and folders) used in traditional computers, replacing them with a completely flat, infinitely scalable namespace.

In the era of Big Data, traditional File Storage systems failed catastrophically. When a system attempts to manage billions of files nested deep inside millions of complex folders, the mathematical overhead required simply to locate a file’s physical address on the hard drive becomes computationally paralyzing. Object Storage solves this limitation. It scales effortlessly from gigabytes to exabytes because it never wastes computational power managing directories.

The Architecture of an Object

In Object Storage, data is not saved as a “file” inside a “folder.” It is saved as a discrete, independent “Object.”

Every single Object consists of three inseparable components:

  1. The Data: The actual physical file (e.g., a massive Apache Parquet file, a 4K video, or a JSON log).
  2. The Metadata: A highly customizable, rich payload of descriptive text tightly bound to the data. Unlike traditional file systems (which only track creation date and file size), Object metadata can include custom tags like Department: Marketing, Security_Level: Confidential, or Sensor_ID: 9948.
  3. The Globally Unique Identifier (URI): A mathematically unique string (e.g., s3://data-lake-bucket/2026/sales_log_001.parquet).

To retrieve data, an analytical engine (like Dremio or Trino) does not “navigate” through folders. It simply presents the Unique Identifier to the storage API, and the Object Storage system returns the data instantaneously, whether the system contains ten objects or ten trillion objects.

The Foundation of the Data Lakehouse

Object Storage is the absolute physical foundation of the modern Data Lakehouse because it provides two critical capabilities:

  • Infinite Elasticity: An enterprise never has to “provision” a hard drive. They simply write data to the bucket. The cloud provider dynamically allocates space in the background. The organization pays exactly for what they use, allowing for limitless data hoarding.
  • Separation of Compute and Storage: Because Object Storage is accessed universally via HTTP REST APIs, it allows massive, distributed query engines to read the data without physically residing on the same servers.

Eventual Consistency vs. Strong Consistency

Historically, a massive limitation of Object Storage was Eventual Consistency. If a data engineer wrote a new Parquet file to S3 and immediately tried to read it, S3 might accidentally return a “File Not Found” error because the data had not fully propagated across the massive global server network. This made managing ACID transactions highly dangerous.

However, modern Object Storage systems have been radically upgraded to provide Strong Read-After-Write Consistency. When a file is written today, it is mathematically guaranteed to be immediately visible to all downstream analytical engines. This architectural upgrade is precisely what allowed Open Table Formats (like Apache Iceberg) to build perfectly safe, transactional Data Lakehouses directly on top of object storage.

Summary of Technical Value

Object Storage fundamentally solved the physical boundaries of data scale. By replacing fragile, nested file hierarchies with a flat, metadata-driven architecture accessed via standard web APIs, it provides the infinite, highly durable, and intensely cost-effective foundation required to store the massive datasets powering enterprise artificial intelligence and Data Lakehouse analytics.

Learn More

To learn more about the Data Lakehouse, read the book “Lakehouse for Everyone” by Alex Merced. You can find this and other books by Alex Merced at books.alexmerced.com.