Introducing UpOS

Upthere’s vision is to build a cloud platform to care for humankind’s information. This platform is based on an entirely new core technology, purpose-built to store, manage, share, and enrich lifetimes of data. We call this technology UpOS™.

When we set out to achieve this vision, we quickly realized that we’d have to rethink the conventional storage paradigms and build a new technology stack from the ground up. At every level — from the servers in our data center to the pixels in our applications — we designed, wrote, tested, and iterated repeatedly to create a powerful and compelling cloud platform.

The core technology

We believe that the time has come for the world to live off the cloud on a day-to-day basis, not merely treat it as a secondary backup or sync location. This means, however, that writing to and reading from the cloud need to be robust and fast enough to replace local storage. Rather than juggling multiple copies of a file between devices, our direct write technology keeps everything in the cloud, freeing the device to do what it does best: creating and consuming content. To overcome the technical challenges of this new model of computing, we knew we needed to own, optimize, and deeply integrate each component in our system. This is the primary reason we built our full technology stack.

CUSTOMIZED SERVER INFRASTRUCTURE

We started at the lowest level. We fabricated our own chassis, sourced hardware components tailored to our specific workload, and deployed the assembled servers in our own datacenter. We also designed and deployed a custom layout for our server racks in order to optimize cooling, ease of maintenance, networking speed, and power redundancy.

Each of our servers also runs our own flavor of Linux, customized to move data quickly throughout our system and to reduce variability in performance.

RELIABILITY BY DESIGN

By building and integrating our entire technology stack, we gained a deep understanding of the behavior and possible failures that can occur. This has allowed us to add redundancy and appropriate failover policies at each integration point, minimizing the impact of failures on our customers.

At the infrastructure level, we’ve designed our system for maximum availability so that our users can always reliably access their content stored in Upthere. We are also constantly improving our storage redundancy mechanisms, such as replication and erasure coding, to prevent data loss and corruption. We have even built our own monitoring and analytics platform to provide real-time insights into our system’s performance and reliability.
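To make the idea of erasure coding concrete, here is a minimal sketch of the simplest possible scheme: a single XOR parity shard that can rebuild any one lost data shard. This is an illustration only, not a description of the redundancy scheme Upthere actually uses; the function names `make_parity` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards: List[bytes]) -> bytes:
    """Compute one parity shard over equal-length data shards."""
    if len({len(s) for s in shards}) != 1:
        raise ValueError("shards must be non-empty and of equal length")
    return reduce(xor_bytes, shards)


def recover(shards: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the parity."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    repaired = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of the parity with every surviving shard yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired
```

Production systems typically use stronger codes that tolerate several simultaneous losses; the single-parity case is shown only because it fits in a few lines.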

All the code running on our servers is written with the mandate that losing customer data is unacceptable, even though we know things can and will go wrong. Our data model is based on idempotent user operations, so if servers fail or otherwise hit an unrecoverable error, we can replay operations in the cloud later to recover state changes that would otherwise be lost.

Finally, our client applications follow the same paradigm of idempotent user operations. This enables our clients to safely retry and replay operations in order to update local state more efficiently and ensure user changes always make it to the cloud.
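The replay property described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Upthere's implementation: we assume each operation carries a unique client-assigned `op_id`, and the `Operation` and `Store` types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Operation:
    op_id: str   # unique ID per user operation (an assumption of this sketch)
    key: str
    value: str


@dataclass
class Store:
    state: Dict[str, str] = field(default_factory=dict)
    applied: Set[str] = field(default_factory=set)

    def apply(self, op: Operation) -> bool:
        """Apply an operation once; repeated deliveries are no-ops."""
        if op.op_id in self.applied:
            return False
        self.state[op.key] = op.value
        self.applied.add(op.op_id)
        return True


def replay(store: Store, log: Iterable[Operation]) -> int:
    """Replay a log and return how many operations changed state."""
    return sum(store.apply(op) for op in log)
```

Because applying an operation twice has the same effect as applying it once, a client or server that is unsure whether an operation landed can simply send it again.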

EFFICIENT AND FLEXIBLE STORAGE MODEL

We also focused on performance when designing the way we store files. When data enters the system, Upthere extracts and indexes metadata, storing it separately from the full payload. This allows us to optimize data retrieval for specific usage patterns. It also enables us to auto-organize content and provide a powerful search engine.

This design also benefits client devices, which can minimize data usage by only fetching and caching the specific metadata and previews needed for the task at hand.
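A minimal sketch of this separation, assuming a simple in-memory layout: payloads live in one map, extracted metadata in another, and an inverted index over metadata fields answers searches without touching any payload. The `ContentStore` class and its field names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


class ContentStore:
    def __init__(self) -> None:
        self.payloads: Dict[str, bytes] = {}    # full file contents
        self.metadata: Dict[str, dict] = {}     # small, separately stored
        self.index: Dict[Tuple[str, Hashable], Set[str]] = {}

    def ingest(self, content_id: str, payload: bytes, meta: dict) -> None:
        """Store the payload, then record and index its metadata."""
        self.payloads[content_id] = payload
        record = dict(meta, size=len(payload))
        self.metadata[content_id] = record
        for item in record.items():
            self.index.setdefault(item, set()).add(content_id)

    def search(self, field: str, value: Hashable) -> List[str]:
        """Answer a query from the index alone; no payload is read."""
        return sorted(self.index.get((field, value), set()))
```

A client that only needs to list or search content can work entirely from `metadata` and `index`, fetching from `payloads` only when the user opens a file.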

SCALABLE PROCESSING PIPELINE

We decoupled our data-processing pipeline — which handles things like metadata extraction and indexing — from the actual data-ingestion pipeline. This allows us to manage the scale of our compute resources independently of our ingestion machines.

This decoupling provides a major operational benefit when scaling our backend infrastructure, and it also ensures that intensive data processing requirements will never inhibit our ability to reliably save and store new data coming into the system.
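The decoupling can be illustrated with a queue between the two stages. In this sketch, which makes no claim about Upthere's actual architecture, ingestion saves the item and enqueues a job without waiting for processing, and a separately sized pool of workers drains the queue. The names `ingest`, `worker`, and `run_demo` are hypothetical, and the "processing" is a stand-in.

```python
import queue
import threading
from typing import Dict, List, Tuple


def ingest(item: str, durable: List[str], jobs: "queue.Queue") -> None:
    durable.append(item)   # the save completes first
    jobs.put(item)         # processing is queued, never awaited


def worker(jobs: "queue.Queue", results: Dict[str, int]) -> None:
    while True:
        item = jobs.get()
        if item is None:              # sentinel: no more work
            break
        results[item] = len(item)     # stand-in for metadata extraction


def run_demo(items: List[str], n_workers: int = 2) -> Tuple[int, Dict[str, int]]:
    durable: List[str] = []
    results: Dict[str, int] = {}
    jobs: "queue.Queue" = queue.Queue()
    for item in items:
        ingest(item, durable, jobs)
    saved_before_processing = len(durable)   # no worker has started yet
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for _ in threads:
        jobs.put(None)                       # one sentinel per worker
    for t in threads:
        t.join()
    return saved_before_processing, results
```

The number of workers can change without touching `ingest`, which is the operational benefit the decoupling is meant to provide.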

LAYERED SECURITY APPROACH

As a custodian of humankind’s information, we consider data security a top priority. Our vertical integration gives us deep insight into potential vulnerabilities and the control to address them. Each component of our stack is compartmentalized and designed with the assumption that any other component could be compromised, a strategy known as “Defense in Depth.”

Upthere follows best practices like bcrypt-hashed passwords and TLS, but at the end of the day, the weakest link in data security tends to be human behavior. To address this, we enforce stringent password requirements and offer two-factor authentication; going forward, we’ll continue to reinforce our efforts in addressing human vulnerabilities.

A modern, cloud-based file system

Storing data reliably and providing quick access are generally solved problems for local devices and single users. These problems, however, become a significant challenge at scale. When there are millions of users — each of them with multiple devices — trying to access, share, and modify their data, things get complicated.

A NEW APPROACH TO CONSISTENCY

To avoid data corruption, traditional file systems lock a file when someone is modifying it, but locks do not scale well in a large distributed system. Instead, we designed an eventually-consistent data model that avoids having to lock files. Part of this data model is the use of operation logs. Instead of handling writes by sending updated objects back and forth, we send discrete user operations with update instructions. This allows us to perform safer and faster incremental updates that minimize the need for complex data merges or locks. As mentioned above, this architecture also allows us to replay operations and recover data in the event of a worst-case server failure. These operation logs, combined with other invariants built into our stack, form a lock-free data model that will elegantly scale to handle humankind’s information.
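The difference between sending whole objects and sending discrete operations can be shown with two devices editing the same item. This is a simplified sketch with hypothetical operation kinds (`set`, `add_tag`), not the actual UpOS operation format. With whole-object writes, the second writer silently undoes the first; with operations, both edits survive in either order.

```python
import copy
from typing import Iterable, Tuple

Op = Tuple[str, str, str]   # (kind, field, value)


def write_whole_object(server: dict, snapshot: dict) -> None:
    """Overwrite the server copy with a device's full snapshot."""
    server.clear()
    server.update(snapshot)


def apply_operation(server: dict, op: Op) -> None:
    """Apply one small update instruction to the server copy."""
    kind, field, value = op
    if kind == "set":
        server[field] = value
    elif kind == "add_tag":
        server.setdefault(field, set()).add(value)
    else:
        raise ValueError(f"unknown operation kind: {kind}")


def demo_whole_object() -> dict:
    server = {"title": "Trip", "tags": set()}
    device_a = copy.deepcopy(server)    # both devices read the same state
    device_b = copy.deepcopy(server)
    device_a["title"] = "Italy trip"
    device_b["tags"].add("beach")
    write_whole_object(server, device_a)
    write_whole_object(server, device_b)   # A's title change is lost
    return server


def demo_operations(ops: Iterable[Op]) -> dict:
    server = {"title": "Trip", "tags": set()}
    for op in ops:
        apply_operation(server, op)
    return server
```

These two operations touch different fields, so they commute. Concurrent operations on the same field would still need an ordering rule, which this sketch does not attempt to model.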

A NEW LOCAL STORAGE MODEL

UpOS™ is built to support an ever-expanding amount of content being created in different formats from different devices and services. The Upthere native client framework provides a new abstraction over local file systems, transparently caching content so the client never has to know, or care, whether files are present locally or are being fetched from the cloud.

Traditional storage services treat consumer devices as independent storage locations, yielding a fragmented data experience. Instead of directly accessing the device- or OS-specific file system to store and cache data, client applications using the Upthere framework queue up mutation "tasks" to be performed at the optimal time. The framework acts as a write-through cache, asynchronously dispatching tasks to the server when the device has optimal connectivity and providing results to the client when available.
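As a rough sketch of the task model described above: the client records a mutation locally right away, queues it, and dispatches queued tasks only when connectivity allows. The `TaskQueue` class and the `send` callable are hypothetical stand-ins, not part of the Upthere framework's real API, and the sketch is synchronous for brevity.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class TaskQueue:
    def __init__(self, send: Callable[[str, str], None]) -> None:
        self.send = send                       # delivers one task to the server
        self.local: Dict[str, str] = {}        # local view, updated immediately
        self.pending: Deque[Tuple[str, str]] = deque()

    def mutate(self, key: str, value: str) -> None:
        """Record the change locally and queue it for the server."""
        self.local[key] = value
        self.pending.append((key, value))

    def flush(self, online: bool) -> int:
        """Dispatch queued tasks in order while the device is online."""
        sent = 0
        while online and self.pending:
            self.send(*self.pending[0])
            self.pending.popleft()   # drop only after a successful send
            sent += 1
        return sent
```

A task is removed from the queue only after `send` returns, so a failed send leaves it in place to be retried on the next flush.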

The Upthere framework makes it easy to fetch metadata, previews, and payloads independently, automatically taking care of network utilization and transparently caching data. Clients automatically benefit from speedy retrieval, efficient data usage, and optimized battery consumption with no additional work. This design combines the power of the cloud with the deeply integrated native APIs of today’s modern operating systems to enable seamless, cloud-based storage experiences.
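The idea of fetching metadata, previews, and payloads independently can be sketched with a small cache keyed by content ID and part. The `CachingClient` class and the `fetch` callable are assumptions made for illustration and do not reflect the framework's real interface.

```python
from typing import Callable, Dict, Tuple

PARTS = ("metadata", "preview", "payload")


class CachingClient:
    def __init__(self, fetch: Callable[[str, str], bytes]) -> None:
        self.fetch = fetch     # hypothetical network call: (content_id, part)
        self.cache: Dict[Tuple[str, str], bytes] = {}
        self.network_calls = 0

    def get(self, content_id: str, part: str) -> bytes:
        """Return one part of an item, using the network only on a miss."""
        if part not in PARTS:
            raise ValueError(f"unknown part: {part}")
        key = (content_id, part)
        if key not in self.cache:
            self.network_calls += 1
            self.cache[key] = self.fetch(content_id, part)
        return self.cache[key]
```

A gallery view, for example, would request only `preview` parts, so no full payload crosses the network until the user opens an item.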

A NEW KIND OF FILE

A new file system requires a new perspective on what a file is. Traditional file systems treat a file as a filename, an array of bytes containing data, and often a few bytes of metadata. A file may be copied into different folders, synced to different devices, or opened if the right application is installed, but each of these always requires the full payload to be passed around.

Upthere, on the other hand, treats a file as a piece of rich data that can be consumed in a variety of ways appropriate to the device and use case at hand. Our data-processing pipeline handles a wide array of file types to extract appropriately-sized previews, transcoded video streams, and detailed, searchable metadata. The rich metadata extracted from each type of file allows for an extremely efficient consumption and sharing experience for our customers.

When a user wants to consume content, Upthere considers network speed and device form factor in order to choose which resolutions are transferred to the local device. Client applications pull only what they need in order to provide the fastest, yet still comprehensive, user experience. This also enables customers to view almost any type of file from within one unified experience, Upthere Home, rather than having to find, download, and install specific applications on multiple devices.
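One way such a choice could be made is sketched below. The rendition table and its bandwidth thresholds are invented numbers for illustration, and `choose_rendition` is a hypothetical function, not Upthere's selection logic. It picks the smallest rendition that fills the screen, capped by what the network can sustain.

```python
# (height in pixels, Mbps needed); illustrative values only, sorted ascending
RENDITIONS = [(240, 0.5), (720, 3.0), (1080, 6.0), (2160, 20.0)]


def choose_rendition(screen_height_px: int, bandwidth_mbps: float) -> int:
    """Return the height of the rendition to transfer."""
    best = RENDITIONS[0][0]           # fall back to the smallest rendition
    for height, needed_mbps in RENDITIONS:
        if needed_mbps <= bandwidth_mbps:
            best = height
            if height >= screen_height_px:
                break                 # larger renditions would waste data
    return best
```

For instance, a phone with a 480-pixel-tall viewport on a fast connection gets the 720 rendition rather than the 2160 one, because anything larger would not be visible on that screen.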

Sharing between devices and people is faster and more efficient with UpOS™ than with traditional sync models. Instead of always downloading or transferring entire payloads between devices, UpOS™ simply distributes available metadata and lightweight previews to recipients. The full payload is only downloaded when the recipient needs it.

Unlike traditional folder hierarchies that require manual maintenance of nested organizational structures, UpOS™ is entirely query-based, allowing for complex query combinations to surface content in whatever way our customers’ minds work. This is also true of our Loops infrastructure, used for organization and sharing; rather than copying or moving payloads between folders, Loops can dynamically represent the same piece of content in different contexts defined by different queries.
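The contrast with folders can be sketched by modeling a Loop as a saved query rather than a container. This is a conceptual illustration only; the `Loop` class, the item fields, and the predicate style are assumptions, not the actual Loops data model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Loop:
    name: str
    query: Callable[[dict], bool]   # predicate over an item's metadata

    def contents(self, library: List[dict]) -> List[str]:
        """Evaluate the query; nothing is copied or moved."""
        return [item["id"] for item in library if self.query(item)]


library = [
    {"id": "p1", "type": "photo", "year": 2015, "place": "Rome"},
    {"id": "p2", "type": "photo", "year": 2016, "place": "Oslo"},
    {"id": "d1", "type": "document", "year": 2015, "place": None},
]

photos = Loop("Photos", lambda item: item["type"] == "photo")
from_2015 = Loop("2015", lambda item: item["year"] == 2015)
```

Here the item `p1` appears in both `photos` and `from_2015` while existing only once in `library`, which mirrors how the same piece of content can surface in different contexts.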

Looking forward

A modern file system must look beyond traditional solutions to data storage and organization. With the amount of data increasing exponentially and devices becoming smaller and shorter-lived, sync and backup models that treat the local file system as the source of truth are no longer tenable. This is why we built UpOS™: a deeply-integrated technology stack that bridges the gap between the cloud and local operating systems.