Introducing distributed data structures

The Fluid Framework provides developers with two types of shared objects: distributed data structures (DDSes) and Data Objects. Data Objects are beta and should not be used in production applications. DDSes are low-level data structures, while Data Objects are composed of DDSes and other shared objects. Data Objects are used to organize DDSes into semantically meaningful groupings for your scenario and to provide an API surface for your app's data. However, many Fluid applications will use only DDSes.

There are a number of shared objects built into the Fluid Framework. See Distributed data structures for more information.

DDSes automatically ensure that each client has access to the same state. They're called distributed data structures because they are similar to data structures commonly used in programming, like strings, maps/dictionaries, objects, and arrays. The APIs provided by DDSes are designed to be familiar to programmers who've used these types of data structures before. For example, the SharedMap DDS is used to store key/value pairs, like a typical map or dictionary data structure, and provides get and set methods to store and retrieve data in the map.

When using a DDS, you can largely treat it as a local object. Your code can add data to it, remove data, update it, etc. However, a DDS is not just a local object. A DDS can also be changed by other users who are editing the same data.

tip

Most distributed data structures are prefixed with "Shared" by convention, for example SharedMap and SharedTree. This prefix indicates that the object is shared between multiple clients.

When a DDS is changed by any client, it raises an event locally. Your code can listen to these events so that the app knows when data is changed and can react appropriately. For example, your app may need to recalculate a derived value when some data in a DDS changes.

Merge behavior

In a distributed system like Fluid, it is critical to understand how changes from multiple clients are merged. Understanding the merge logic enables you to "preserve user intent" when users are collaborating on data. This means that the merge behavior should match what users intend or expect as they are editing data.

In Fluid, the merge behavior is defined by the DDS. The simplest merge strategy, employed by key-value distributed data structures like SharedMap, is last writer wins (LWW). With this merge strategy, when multiple clients write different values to the same key, the value that was written last will overwrite the others. Refer to the documentation for each DDS for more details about the merge strategy it uses.
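The last-writer-wins rule described above can be sketched with a small simulation. This is an illustrative stand-in, not the SharedMap implementation: the `SetOp` type and the `applySequencedOps` helper are hypothetical names, and the array order stands in for the order assigned by the Fluid service.

```typescript
// Illustrative stand-in for a last-writer-wins map; not the Fluid SharedMap implementation.
interface SetOp {
	clientId: string;
	key: string;
	value: string;
}

// Applies ops in the order the (simulated) service sequenced them.
// For each key, the op sequenced last determines the final value.
function applySequencedOps(ops: readonly SetOp[]): Map<string, string> {
	const state = new Map<string, string>();
	for (const op of ops) {
		state.set(op.key, op.value);
	}
	return state;
}

// Two clients write different values to the same key concurrently.
// Here the service happens to sequence client B's op after client A's.
const sequenced: SetOp[] = [
	{ clientId: "A", key: "color", value: "red" },
	{ clientId: "B", key: "color", value: "blue" },
];

// Every client applies the same sequence, so every client ends up with the same value.
const finalState = applySequencedOps(sequenced);
console.log(finalState.get("color")); // blue
```

Because every client applies the same sequence, all clients converge on the same value, but client A's write is silently replaced.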

Performance characteristics

Fluid DDSes exhibit different performance characteristics based on how they interact with the Fluid service. The DDSes generally fall into two broad categories: optimistic and consensus-based.

Optimistic data structures

Optimistic DDSes apply Fluid operations locally before they are sequenced by the Fluid service. The local changes are said to be applied optimistically in that they are applied before receiving confirmation from the Fluid service, hence the name optimistic DDSes.

The benefit of this approach is user-perceived performance; operations made by the user are reflected immediately. The potential downside is consistency; if another collaborator makes a concurrent edit that conflicts with the user's edit, the DDS's merge resolution might end up changing the user's action after the fact.

The DDSes will apply remote operations as they are made, and will always arrive at a consistent state.

Many of the most commonly used DDSes are optimistic, including SharedMap, SharedString, and SharedTree.

Consensus-based data structures

Consensus-based DDSes are different from optimistic DDSes because they wait for confirmation from the Fluid service before applying operations -- even local operations. These data structures offer additional behavior guarantees and can be used when you need atomicity or synchronous behavior.

These behavioral guarantees cannot be implemented in an optimistic way. The cost is performance; optimistic DDSes are part of what makes Fluid so fast, so using optimistic DDSes is almost always preferred, but you can trade performance for behavioral guarantees.

An example of a consensus-based DDS in Fluid Framework is the TaskManager.

Why consensus-based DDSes are useful

To understand why consensus-based DDSes are useful, consider implementing a stack DDS. It's not possible (as far as we know!) to implement a stack DDS as an optimistic one. In the ops-based Fluid architecture, one would define an operation like pop, and when a client sees that operation in the op stream, it pops a value from its local stack object.

Imagine that client A pops, and client B also pops shortly after that, but before it sees client A's remote pop operation. With an optimistic DDS, the client will apply the local operation before the server even sees it. It doesn't wait. Thus, client A pops a value off the local stack, and client B pops the same value -- even though it was supposed to pop the second value. This represents divergent behavior; we expect a distributed stack to ensure that pop operations -- and any other operation for that matter -- are applied such that the clients reach a consistent state eventually. The optimistic implementation we just described violates that expectation.

A consensus-based DDS does not optimistically apply local ops. Instead, these DDSes wait for the server to apply a sequence number to the operation before applying it locally. With this approach, when two clients pop, neither makes any local changes until they get back a sequenced op from the server. Once they do, they apply the ops in order, which results in consistent behavior across all remote clients.
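The two-client pop scenario above can be sketched as a small simulation. It is a hedged illustration only: `optimisticPops`, `consensusPops`, and `PopOp` are hypothetical helpers rather than Fluid APIs, and the order of the ops array stands in for the sequence numbers assigned by the server.

```typescript
// Hypothetical simulation; these helpers are not Fluid APIs.
interface PopOp {
	clientId: string;
}

// Optimistic: each client pops from its own replica immediately,
// before it has seen the other client's pop.
function optimisticPops(
	initial: readonly number[],
	clientIds: readonly string[],
): Map<string, number | undefined> {
	const results = new Map<string, number | undefined>();
	for (const clientId of clientIds) {
		const replica = [...initial]; // each client still sees the original stack
		results.set(clientId, replica.pop());
	}
	return results;
}

// Consensus-based: no client pops until the service has sequenced the ops.
// Each op is then applied in sequence order against the same logical stack.
function consensusPops(
	initial: readonly number[],
	sequencedOps: readonly PopOp[],
): Map<string, number | undefined> {
	const stack = [...initial];
	const results = new Map<string, number | undefined>();
	for (const op of sequencedOps) {
		results.set(op.clientId, stack.pop());
	}
	return results;
}

const initialStack = [1, 2, 3]; // 3 is on top
const optimistic = optimisticPops(initialStack, ["A", "B"]);
const consensus = consensusPops(initialStack, [{ clientId: "A" }, { clientId: "B" }]);

console.log(optimistic.get("A"), optimistic.get("B")); // 3 3
console.log(consensus.get("A"), consensus.get("B")); // 3 2
```

In the optimistic run both clients receive the same value, which is the divergence described above. In the consensus run each client receives a distinct value, at the cost of waiting for the service before the pop takes effect.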

Storing a DDS within another DDS

Distributed data structures can store primitive values like numbers and strings, as well as JSON-serializable objects. For objects that are not JSON-serializable, like DDSes, Fluid provides a mechanism called handles, which are serializable.

When storing a DDS within another DDS, your code must store its handle, not the DDS itself. For examples of how to do this, see Using handles to store and retrieve shared objects.

That's all you need to know about handles in order to use DDSes effectively. If you want to learn more about handles, see Fluid handles.

note

If you are considering storing a DDS within another DDS in order to give your app's data a hierarchical structure, consider using a SharedTree DDS instead.

Events

When a distributed data structure is changed by the Fluid runtime, it raises events. Your app can listen to these events so that the app knows when data is changed by remote clients and can react appropriately. For example, the app may need to recalculate a derived value when some data in a DDS changes.

myMap.on("valueChanged", () => {
	recalculate();
});
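To make the "recalculate a derived value" idea concrete, here is a minimal, self-contained sketch. `TinyObservableMap` is a hypothetical stand-in written for this example, not a Fluid class; it only mimics the pattern of raising a `valueChanged` event whenever a key is set, and it passes the changed key to the listener as an assumption of this sketch.

```typescript
// Hypothetical stand-in for a DDS that raises "valueChanged"; not a Fluid API.
type ValueChangedListener = (key: string) => void;

class TinyObservableMap {
	private readonly data = new Map<string, number>();
	private readonly listeners: ValueChangedListener[] = [];

	// Only one event name is supported in this sketch, so the argument is not inspected.
	public on(_event: "valueChanged", listener: ValueChangedListener): void {
		this.listeners.push(listener);
	}

	public get(key: string): number | undefined {
		return this.data.get(key);
	}

	// In a real DDS the event would also fire when remote changes are applied.
	public set(key: string, value: number): void {
		this.data.set(key, value);
		for (const listener of this.listeners) {
			listener(key);
		}
	}
}

const shapeMap = new TinyObservableMap();
let derivedArea = 0;

// Recalculate the derived value whenever either input changes.
shapeMap.on("valueChanged", (key) => {
	if (key === "width" || key === "height") {
		derivedArea = (shapeMap.get("width") ?? 0) * (shapeMap.get("height") ?? 0);
	}
});

shapeMap.set("width", 40);
shapeMap.set("height", 60);
console.log(derivedArea); // 2400
```

The listener does not care which client made the change, which is why the same handler can serve both local and remote edits.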

Picking the right data structure

Because distributed data structures can be stored within each other, you can combine DDSes to create collaborative data models. The following two questions can help determine the best data structures to use for a collaborative data model.

What is the granularity of collaboration that my scenario needs?
How does the merge behavior of a distributed data structure affect this?

In your scenario, what do users need to individually edit? For example, imagine that your app is a collaborative editing tool and it is storing data about geometric shapes. The app might store the coordinates of the shape, its length, width, etc.

When users edit this data, what pieces of the data can be edited simultaneously? This is an important question to answer because it influences how you structure the data in your DDSes.

Let's assume for a moment that all of the data about a shape is stored as a single object that looks like this:

{
	"x": 0,
	"y": 0,
	"height": 60,
	"width": 40
}

If we want to make this data collaborative using Fluid, the most direct -- but ultimately flawed -- approach is to store our shape object in a SharedMap. Our SharedMap would look something like this:

{
	"aShape": {
		"x": 0,
		"y": 0,
		"height": 60,
		"width": 40
	}
}

Recall that the SharedMap uses a last writer wins merge strategy. This means that if two users are editing the data at the same time, then the one who made the most recent change will overwrite the changes made by the other user.

Imagine that a user "A" is collaborating with a colleague, and the user changes the shape's width while the colleague "B" changes the shape's height. This will generate two operations: a set operation for user A's change, and another set operation for user B's change. Both operations will be sequenced by the Fluid service, but only one will 'win,' because the SharedMap's merge behavior is LWW. Because the shape is stored as an object, both set operations set the whole object.

This results in someone's changes being "lost" from a user's perspective. This may be perfectly fine for your needs. However, if your scenario requires users to edit individual properties of the shape, then the SharedMap LWW merge strategy probably won't give you the behavior you want.

You can address this problem in different ways, depending on which version of Fluid Framework you are using.

In version 1.0, store individual shape properties in SharedMap keys. Instead of storing a JSON object with all the data, your code can break it apart and store the length in one SharedMap key, the width in another, etc. With this data model, users can change individual properties of the shape without overwriting other users' changes.

You likely have more than one shape in your data model, so you could create a SharedMap object to store all the shapes, then store the SharedMaps representing each shape within that parent SharedMap object.
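The difference between storing the whole shape under one key and storing one key per property can be sketched with a simulated last-writer-wins map. This is an assumption-laden illustration: `applyLww` and `LwwSetOp` are hypothetical helpers, not Fluid APIs, and the array order stands in for the order in which the service sequenced the operations.

```typescript
// Hypothetical simulation of last-writer-wins set operations; not Fluid code.
interface Shape {
	x: number;
	y: number;
	height: number;
	width: number;
}

interface LwwSetOp<T> {
	key: string;
	value: T;
}

// Applies set operations in sequence order; the last write to a key wins.
function applyLww<T>(initial: ReadonlyMap<string, T>, ops: readonly LwwSetOp<T>[]): Map<string, T> {
	const state = new Map<string, T>(initial);
	for (const op of ops) {
		state.set(op.key, op.value);
	}
	return state;
}

const original: Shape = { x: 0, y: 0, height: 60, width: 40 };

// Model 1: the whole shape under one key. Each user sets the entire object
// based on the copy they saw, so the op sequenced last discards the other edit.
const wholeObject = applyLww<Shape>(new Map<string, Shape>([["aShape", original]]), [
	{ key: "aShape", value: { ...original, width: 80 } }, // user A changes the width
	{ key: "aShape", value: { ...original, height: 100 } }, // user B changes the height
]);

// Model 2: one key per property. The edits touch different keys, so both survive.
const perProperty = applyLww<number>(
	new Map<string, number>([
		["x", 0],
		["y", 0],
		["height", 60],
		["width", 40],
	]),
	[
		{ key: "width", value: 80 }, // user A
		{ key: "height", value: 100 }, // user B
	],
);

console.log(wholeObject.get("aShape")?.width); // 40 (user A's change was lost)
console.log(perProperty.get("width"), perProperty.get("height")); // 80 100
```

The finer-grained model keeps both edits because the two users never write to the same key.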

In version 2.0, there's a better way. Store a shape as an object node of a SharedTree. Your code can store the length in one property of the object node, the width in another, etc. Again, users can change individual properties of the shape without overwriting other users' changes.

When you have more than one shape in your data model, you could create an array node in the SharedTree, with child object nodes to store all the shapes.

Key-value data

These DDSes are used for storing key-value data. They are all optimistic and use a last-writer-wins merge policy.

SharedMap -- a basic key-value distributed data structure.
Map nodes in a SharedTree -- a hierarchical data structure with three kinds of complex nodes: maps (similar to SharedMap), arrays, and JavaScript objects. There are also several kinds of leaf nodes, including boolean, string, number, null, and Fluid handles.

Array-like data

Array nodes in a SharedTree -- a hierarchical data structure with three kinds of complex nodes: maps (similar to SharedMap), arrays, and JavaScript objects. There are also several kinds of leaf nodes, including boolean, string, number, null, and Fluid handles.

Object data

Object nodes in a SharedTree -- a hierarchical data structure with three kinds of complex nodes: maps (similar to SharedMap), arrays, and JavaScript objects. There are also several kinds of leaf nodes, including boolean, string, number, null, and Fluid handles.

Specialized data structures

SharedString -- a specialized data structure for handling collaborative text.
