1.2 Linking Process

When two applications that may contain related resources are integrated together, some process must be run to match resources from both applications and correlate them. This process is called the linking process.

The linking process will use a set of rules to determine matching pairs of resources. When a strong enough match is found (for example a match on emails for contacts), the process may decide to link the records automatically. But when the match is weaker or when it is ambiguous, a user will have to decide how the records should be linked. So, the linking process is usually a mix of automatic and manual operations.

If matching resources get synchronized after the link has been established, the linking process should also reconciliate differences between the data of the two resources so that synchronization starts from a consistent state.

If only one of the applications has data in its database (the other one is a fresh install), there is no need to run a linking process. In this case, the first synchronization pass will automatically establish the link for all the records that it creates in the freshly installed application. In all cases, the synchronization process should automatically establish links between the new resources that it detects on one side and the resources that it creates on the other side.

SData leaves many of the details of the linking process unspecified. The implementers are free to choose whatever matching algorithm suits them best (see Wikipedia record linkage article for an overview of popular methods). Also, they may choose to package the process as a separate application or embed it into one of the applications. Implementers may use metadata attributes such as sme:isUniqueKey to identify properties that define strong matches. Additional attributes may be introduced later in SData schemas to support matching if there is a consensus around a common matching algorithm.

On the other hand, SData imposes the following policy on the allocation of UUIDs: the UUID of a resource should remain unspecified until the resource is linked to at least one other resource.

When two resources A and B are linked with each other, we have three cases to consider:

If A and B have never been linked before, they do not have a UUID yet. In this case the linking process should allocate a new UUID and set it on both A and B.
If one of them (A for example) has already been linked before but the other one (B) has never been linked, then only the first one (A) has a UUID. In this case, the linking process will simply assign A’s UUID to B.
If both of them have already been linked before (for example A with some other resource X, and B with some other resource Y), then A and B already have UUIDs, and different ones. In this case, the linking process must pick a winner (A for example) and a loser (B) and must request that the loser replace its UUID by the winner’s UUID. This requires that the loser propagate its UUID change to all the resources to which it was linked before and that these resources adopt the new UUID. In the example, B would propagate its UUID change to Y and both B and Y would relinquish their UUID in favor of A’s. The SData synchronization protocol will propagate such UUID changes but to do so, it requires that resources be able to hold more than one UUID, at least on a temporary basis.

A provider participating in cross-application linking MUST adhere to the above linking policy. As such, the provider MUST be capable of re-assigning the UUID for a linked resource without impeding the underlying application’s referential integrity.