Introducing distributed data structures

The Fluid Framework provides developers with two types of shared objects: distributed data structures (DDSes) and Data Objects. DDSes are low-level data structures, while Data Objects are composed of DDSes and other shared objects. Data Objects are used to organize DDSes into semantically meaningful groupings for your scenario, and to provide an API surface to your app’s data. However, many Fluid applications will use only DDSes.

There are a number of shared objects built into the Fluid Framework. See Distributed data structures for more information.

DDSes automatically ensure that each client has access to the same state. They’re called distributed data structures because they are similar to data structures used commonly when programming, like strings, maps/dictionaries, and sequences/lists. The APIs provided by DDSes are designed to be familiar to programmers who’ve used these types of data structures before. For example, the SharedMap DDS is used to store key/value pairs, like a typical map or dictionary data structure, and provides get and set methods to store and retrieve data in the map.

When using a DDS, you can largely treat it as a local object. Your code can add data to it, remove data, update it, etc. However, a DDS is not just a local object: other users who are editing can change it too.

Tip

Most distributed data structures are prefixed with “Shared” by convention: SharedMap, SharedMatrix, SharedString, etc. This prefix indicates that the object is shared between multiple clients.

When a DDS is changed by any client, it raises an event locally. Your code can listen to these events so that the app knows when data is changed and can react appropriately. For example, your app may need to recalculate a derived value when some data in a DDS changes.

Merge behavior

In a distributed system like Fluid, it is critical to understand how changes from multiple clients are merged. Understanding the merge logic enables you to “preserve user intent” when users are collaborating on data. This means that the merge behavior should match what users intend or expect as they are editing data.

In Fluid, the merge behavior is defined by the DDS. The simplest merge strategy, employed by key-value distributed data structures like SharedMap, is last writer wins (LWW). With this merge strategy, when multiple clients write different values to the same key, the value that was written last will overwrite the others. Refer to the documentation for each DDS for more details about the merge strategy it uses.
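The last-writer-wins rule can be illustrated with a small simulation. This is a minimal sketch, not Fluid's implementation: the `SetOp` type and the `applyLastWriterWins` function are hypothetical names introduced here, and the sequence numbers stand in for the ordering that the Fluid service would assign.

```typescript
// A sequenced "set" operation, as stamped by a simulated ordering service.
// These types are illustrative stand-ins, not Fluid Framework APIs.
interface SetOp {
  sequenceNumber: number;
  clientId: string;
  key: string;
  value: string;
}

// Apply ops in sequence order; a later write to a key replaces an earlier one.
function applyLastWriterWins(ops: SetOp[]): Map<string, string> {
  const state = new Map<string, string>();
  const ordered = ops.slice().sort((a, b) => a.sequenceNumber - b.sequenceNumber);
  for (const op of ordered) {
    state.set(op.key, op.value);
  }
  return state;
}

// Two clients write different values to the same key at about the same time.
const sequencedOps: SetOp[] = [
  { sequenceNumber: 2, clientId: "B", key: "color", value: "blue" },
  { sequenceNumber: 1, clientId: "A", key: "color", value: "red" },
];

const mergedState = applyLastWriterWins(sequencedOps);
// mergedState.get("color") is "blue": client B's write was sequenced last.
```

Every client that replays the same sequenced ops arrives at the same value, which is what makes this strategy simple to reason about.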

Performance characteristics

Fluid DDSes exhibit different performance characteristics based on how they interact with the Fluid service. The DDSes generally fall into two broad categories: optimistic and consensus-based.

Optimistic data structures

Optimistic DDSes apply Fluid operations locally before they are sequenced by the Fluid service. The local changes are said to be applied optimistically, hence the name optimistic DDSes. These DDSes also apply remote operations, as they arrive, in a consistent way.

Many of the most commonly used DDSes are optimistic, including SharedMap, SharedSequence, and SharedString.

Consensus-based data structures

Consensus-based DDSes are different from optimistic DDSes because they wait for confirmation from the Fluid service before applying operations – even local operations. These data structures offer additional behavior guarantees and can be used when you need atomicity or synchronous behavior.

These behavioral guarantees cannot be implemented in an optimistic way, and their cost is performance. Optimistic DDSes are part of what makes Fluid so fast, so they are almost always preferred; choose a consensus-based DDS when you are willing to trade performance for those guarantees.

An example of a consensus-based DDS in Fluid Framework is the TaskManager.

Why consensus-based DDSes are useful

To understand why consensus-based DDSes are useful, consider implementing a stack DDS. It’s not possible (as far as we know!) to implement a stack DDS as an optimistic one. In the ops-based Fluid architecture, one would define an operation like pop, and when a client sees that operation in the op stream, it pops a value from its local stack object.

Imagine that client A pops, and client B also pops shortly after that, but before it sees client A’s remote pop operation. With an optimistic DDS, the client will apply the local operation before the server even sees it. It doesn’t wait. Thus, client A pops a value off the local stack, and client B pops the same value – even though it was supposed to pop the second value. This represents divergent behavior; we expect a distributed stack to ensure that pop operations – and any other operation for that matter – are applied such that the clients reach a consistent state eventually. The optimistic implementation we just described violates that expectation.
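The scenario above can be sketched as a small simulation. The `OptimisticStackClient` class is a hypothetical stand-in written for this illustration; it is not a Fluid DDS, and it models only the naive "apply local pops immediately" behavior described in the text.

```typescript
// Hypothetical optimistic stack client: applies its own pop immediately,
// and applies a remote pop whenever one arrives in the op stream.
class OptimisticStackClient {
  private readonly items: number[];

  constructor(initial: number[]) {
    this.items = initial.slice();
  }

  // Local pop: applied right away, before the service has sequenced it.
  popLocal(): number | undefined {
    return this.items.pop();
  }

  // Remote pop: another client's op arrives, so mirror it locally.
  applyRemotePop(): void {
    this.items.pop();
  }

  snapshot(): number[] {
    return this.items.slice();
  }
}

const initialStack = [1, 2, 3]; // 3 is on top
const clientA = new OptimisticStackClient(initialStack);
const clientB = new OptimisticStackClient(initialStack);

// Both clients pop before either has seen the other's operation.
const poppedByA = clientA.popLocal(); // 3
const poppedByB = clientB.popLocal(); // 3 as well, although 2 was intended

// Later, each client receives the other's pop from the op stream.
clientA.applyRemotePop();
clientB.applyRemotePop();

// Item 2 has now been removed on both clients, but no caller ever received it.
```

In this sketch both callers are handed the same item and the second item is silently discarded, which is not how a stack shared between two users is expected to behave.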

A consensus-based DDS does not optimistically apply local ops. Instead, these DDSes wait for the server to apply a sequence number to the operation before applying it locally. With this approach, when two clients pop, neither makes any local changes until they get back a sequenced op from the server. Once they do, they apply the ops in order, which results in consistent behavior across all remote clients.
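The consensus-based approach can be sketched in the same style. Again, this is an illustration under stated assumptions rather than Fluid's implementation: `ConsensusStackClient`, `SequencedPop`, and `sequenceAndBroadcast` are hypothetical names, and the simulated service simply numbers requests in arrival order and delivers each op to every client, including the sender.

```typescript
// A pop request that the simulated service has stamped with a sequence number.
interface SequencedPop {
  sequenceNumber: number;
  clientId: string;
}

// Hypothetical consensus-style stack client: it changes nothing locally until
// a sequenced op arrives, then applies ops in the order they are delivered.
class ConsensusStackClient {
  readonly clientId: string;
  readonly received: number[] = [];
  private readonly items: number[];

  constructor(clientId: string, initial: number[]) {
    this.clientId = clientId;
    this.items = initial.slice();
  }

  applySequenced(op: SequencedPop): void {
    const value = this.items.pop();
    // Only the client that issued the pop receives the value.
    if (op.clientId === this.clientId && value !== undefined) {
      this.received.push(value);
    }
  }

  snapshot(): number[] {
    return this.items.slice();
  }
}

// Simulated service: numbers requests in arrival order and broadcasts
// every op to every client, so all clients see the same order.
function sequenceAndBroadcast(requests: string[], clients: ConsensusStackClient[]): void {
  requests.forEach((clientId, index) => {
    const op: SequencedPop = { sequenceNumber: index + 1, clientId };
    for (const client of clients) {
      client.applySequenced(op);
    }
  });
}

const consensusA = new ConsensusStackClient("A", [1, 2, 3]);
const consensusB = new ConsensusStackClient("B", [1, 2, 3]);

// A and B both request a pop; nothing changes until the ops come back sequenced.
sequenceAndBroadcast(["A", "B"], [consensusA, consensusB]);
// consensusA.received is [3], consensusB.received is [2]; both stacks are [1].
```

The price of this consistency is a round trip to the service before a pop completes, which is the performance trade-off described earlier.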

Storing a DDS within another DDS

Distributed data structures can store primitive values like numbers and strings, and JSON serializable objects. For objects that are not JSON-serializable, like DDSes, Fluid provides a mechanism called handles, which are serializable.

When storing a DDS within another DDS, your code must store its handle, not the DDS itself. For examples of how to do this, see [Using handles to store and retrieve shared objects][handles-example].
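The idea behind handles can be sketched without the framework. Everything in the block below is a hypothetical stand-in (`HandleSketch`, `SharedThingSketch`, `resolveHandle`, and the registry are invented for this illustration); in Fluid itself, a shared object exposes a `handle` property and a handle is resolved asynchronously with its `get()` method.

```typescript
// Stand-in types for illustration only; these are not Fluid Framework APIs.
interface HandleSketch {
  path: string; // plain data, so it survives JSON serialization
}

type Registry = { [path: string]: SharedThingSketch };

class SharedThingSketch {
  readonly handle: HandleSketch;
  readonly data: { [key: string]: unknown } = {};

  constructor(id: string, registry: Registry) {
    this.handle = { path: "/" + id };
    registry[this.handle.path] = this;
  }
}

// Hypothetical resolver: looks up the object that a handle refers to.
function resolveHandle(handle: HandleSketch, registry: Registry): SharedThingSketch | undefined {
  return registry[handle.path];
}

const objectRegistry: Registry = {};
const parentObject = new SharedThingSketch("parent", objectRegistry);
const childObject = new SharedThingSketch("child", objectRegistry);
childObject.data["width"] = 40;

// Store the child's handle in the parent, not the child itself.
parentObject.data["child"] = childObject.handle;

// The parent's contents serialize cleanly because the handle is plain data.
const serializedParent = JSON.stringify(parentObject.data); // {"child":{"path":"/child"}}

// Reading it back: take the stored handle and resolve it to the child.
const storedHandle = JSON.parse(serializedParent).child as HandleSketch;
const resolvedChild = resolveHandle(storedHandle, objectRegistry);
```

The parent holds only a small serializable reference, and the child is looked up on demand, which is the role handles play when one DDS is stored inside another.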

That’s all you need to know about handles in order to use DDSes effectively. If you want to learn more about handles, see Fluid handles.

[handles-example]: /docs/build/data-modeling/#using-handles-to-store-and-retrieve-fluid-objects

Events

When a distributed data structure is changed by the Fluid runtime, it raises events. Your app can listen to these events so that the app knows when data is changed by remote clients and can react appropriately. For example, the app may need to recalculate a derived value when some data in a DDS changes.

myMap.on("valueChanged", () => {
  recalculate();
});

Refer to later sections for more details about the events raised by each DDS.
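To make the "recalculate a derived value" pattern concrete, here is a self-contained sketch. `ObservableMapSketch` is a hypothetical stand-in, not the SharedMap class; its listener signature (a `changed` object with `key` and `previousValue`, plus a `local` flag) is only loosely modeled on SharedMap's `valueChanged` event, so check the SharedMap reference for the exact shape.

```typescript
// Tiny stand-in for a map that raises "valueChanged"; not the SharedMap class.
type ValueChangedListener = (
  changed: { key: string; previousValue: unknown },
  local: boolean,
) => void;

class ObservableMapSketch {
  private readonly values = new Map<string, number>();
  private readonly listeners: ValueChangedListener[] = [];

  on(eventName: "valueChanged", listener: ValueChangedListener): void {
    this.listeners.push(listener);
  }

  get(key: string): number | undefined {
    return this.values.get(key);
  }

  // `local` marks whether the change came from this client or a remote one.
  set(key: string, value: number, local: boolean = true): void {
    const previousValue = this.values.get(key);
    this.values.set(key, value);
    for (const listener of this.listeners) {
      listener({ key, previousValue }, local);
    }
  }
}

const shapeMap = new ObservableMapSketch();
let area = 0;
let recalculations = 0;

// Recalculate the derived value whenever width or height changes.
shapeMap.on("valueChanged", (changed) => {
  if (changed.key === "width" || changed.key === "height") {
    area = (shapeMap.get("width") ?? 0) * (shapeMap.get("height") ?? 0);
    recalculations += 1;
  }
});

shapeMap.set("width", 40);
shapeMap.set("height", 60, false); // simulate a change arriving from a remote client
// area is now 2400 after two recalculations.
```

The listener does not care whether a change was local or remote, so the derived value stays correct either way.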

Picking the right data structure

Because distributed data structures can be stored within each other, you can combine DDSes to create collaborative data models. The following two questions can help determine the best data structures to use for a collaborative data model.

  • What is the granularity of collaboration that my scenario needs?
  • How does the merge behavior of a distributed data structure affect this?

In your scenario, what do users need to individually edit? For example, imagine that your app is a collaborative editing tool and it is storing data about geometric shapes. The app might store the coordinates of the shape, its height, width, etc.

When users edit this data, what pieces of the data can be edited simultaneously? This is an important question to answer because it influences how you structure the data in your DDSes.

Let’s assume for a moment that all of the data about a shape is stored as a single object that looks like this:

{
  "x": 0,
  "y": 0,
  "height": 60,
  "width": 40
}

If we want to make this data collaborative using Fluid, the most direct – but ultimately flawed – approach is to store our shape object in a SharedMap. Our SharedMap would look something like this:

{
  "aShape": {
    "x": 0,
    "y": 0,
    "height": 60,
    "width": 40
  }
}

Recall that the SharedMap uses a last writer wins merge strategy. This means that if two users are editing the data at the same time, then the one who made the most recent change will overwrite the changes made by the other user.

Imagine that user “A” is collaborating with a colleague, user “B”. User A changes the shape’s width while user B changes the shape’s height. This generates two operations: a set operation for user A’s change, and another set operation for user B’s change. Because the shape is stored as a single object, each set operation writes the whole object. Both operations will be sequenced by the Fluid service, but only one will “win,” because the SharedMap’s merge behavior is LWW.

This results in someone’s changes being “lost” from a user’s perspective. This may be perfectly fine for your needs. However, if your scenario requires users to edit individual properties of the shape, then the SharedMap LWW merge strategy probably won’t give you the behavior you want.

You could address this problem by storing individual shape properties in SharedMap keys. Instead of storing a JSON object with all the data, your code can break it apart and store the height in one SharedMap key, the width in another, etc. With this data model, users can change individual properties of the shape without overwriting other users' changes.
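The difference between the two data models can be seen by replaying the same pair of edits against each. This is a simulation under stated assumptions, not SharedMap itself: `ShapeSetOp` and `replay` are hypothetical names, and the sequence numbers stand in for the order assigned by the Fluid service.

```typescript
// Illustrative stand-in for a sequenced set operation; not a Fluid type.
interface ShapeSetOp {
  sequenceNumber: number;
  key: string;
  value: unknown;
}

type KeyValueState = { [key: string]: unknown };

// Last writer wins per key: replay sequenced set ops over the starting state.
function replay(initial: KeyValueState, ops: ShapeSetOp[]): KeyValueState {
  const state: KeyValueState = { ...initial };
  const ordered = ops.slice().sort((a, b) => a.sequenceNumber - b.sequenceNumber);
  for (const op of ordered) {
    state[op.key] = op.value;
  }
  return state;
}

const shape = { x: 0, y: 0, height: 60, width: 40 };

// Model 1: the whole shape lives under one key, so each edit rewrites all of it.
const wholeObjectResult = replay({ aShape: shape }, [
  { sequenceNumber: 1, key: "aShape", value: { ...shape, width: 50 } }, // user A
  { sequenceNumber: 2, key: "aShape", value: { ...shape, height: 80 } }, // user B
]);
// wholeObjectResult["aShape"] is { x: 0, y: 0, height: 80, width: 40 }: A's width is lost.

// Model 2: one key per property, so the two edits touch different keys.
const perPropertyResult = replay(shape, [
  { sequenceNumber: 1, key: "width", value: 50 }, // user A
  { sequenceNumber: 2, key: "height", value: 80 }, // user B
]);
// perPropertyResult is { x: 0, y: 0, height: 80, width: 50 }: both edits survive.
```

The merge rule is identical in both runs; only the granularity of the keys changes, and that decides whether concurrent edits collide.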

You likely have more than one shape in your data model, so you could create a SharedMap object to store all the shapes, then store the SharedMaps representing each shape within that parent SharedMap object.

Key-value data

These DDSes are used for storing key-value data. They are all optimistic and use a last-writer-wins merge policy.

  • SharedMap – a basic key-value distributed data structure.

Sequences

These DDSes are used for storing sequential data. They are all optimistic.

Specialized data structures