14. LDAP Sync Replication

The LDAP Sync replication engine, syncrepl for short, is a consumer-side replication engine that enables the consumer LDAP server to maintain a shadow copy of a DIT fragment. A syncrepl engine resides at the consumer-side as one of the slapd (8) threads. It creates and maintains a consumer replica by connecting to the replication provider to perform the initial DIT content load followed either by periodic content polling or by timely updates upon content changes.

Syncrepl uses the LDAP Content Synchronization (or LDAP Sync for short) protocol as the replica synchronization protocol.

Syncrepl provides a stateful replication which supports both the pull-based and the push-based synchronizations and does not mandate the use of the history store.

Syncrepl keeps track of the status of the replication content by maintaining and exchanging synchronization cookies. Because the syncrepl consumer and provider maintain their content status, the consumer can poll the provider content to perform incremental synchronization by asking the entries required to make the consumer replica up-to-date with the provider content. Syncrepl also enables convenient management of replicas by maintaining replica status. The consumer replica can be constructed from a consumer-side or a provider-side backup at any synchronization status. Syncrepl can automatically resynchronize the consumer replica up-to-date with the current provider content.

Syncrepl supports both the pull-based and the push-based synchronization. In its basic refreshOnly mode synchronization, the provider uses a pull-based synchronization where the consumer servers need not be tracked and no history information is maintained. The information required for the provider to process periodic polling requests is contained in the synchronization cookie of the request itself. To optimize the pull-based synchronization, syncrepl utilizes the present phase of the LDAP Sync protocol as well as its delete phase, instead of falling back on frequent full reloads. To further optimize the pull-based synchronization, the provider can maintain a per-scope session log as the history store. In its refreshAndPersist mode of synchronization, the provider uses a push-based synchronization. The provider keeps track of the consumer servers that have requested the persistent search and sends them necessary updates as the provider replication content gets modified.

With syncrepl, a consumer server can create a replica without changing provider's configurations and without restarting the provider server, if the consumer server has appropriate access privileges for the DIT fragment to be replicated. The consumer server can stop the replication also without the need for provider-side changes and restart.

Syncrepl supports both partial and sparse replications. The shadow DIT fragment is defined by a general search criteria consisting of base, scope, filter, and attribute list. The replica content is also subject to the access privileges of the bind identity of the syncrepl replication connection.

14.1. The LDAP Content Synchronization Protocol

The LDAP Sync protocol allows a client to maintain a synchronized copy of a DIT fragment. The LDAP Sync operation is defined as a set of controls and other protocol elements which extend the LDAP search operation. This section introduces the LDAP Content Sync protocol only briefly. For more information, refer to the Internet Draft The LDAP Content Synchronization Operation <draft-zeilenga-ldup-sync-05.txt>.

The LDAP Sync protocol supports both polling and listening for changes by defining two respective synchronization operations: refreshOnly and refreshAndPersist. The polling is implemented by the refreshOnly operation. The client copy is synchronized to the server copy at the time of polling. The server finishes the search operation by returning SearchResultDone at the end of the search operation as in the normal search. The listening is implemented by the refreshAndPersist operation. Instead of finishing the search after returning all entries currently matching the search criteria, the synchronization search remains persistent in the server. Subsequent updates to the synchronization content in the server have additional entry updates be sent to the client.

The refreshOnly operation and the refresh stage of the refreshAndPersist operation can be performed by a present phase or a delete phase.

In the present phase, the server sends the client the entries updated within the search scope since the last synchronization. The server sends all requested attributes, be it changed or not, of the updated entries. For each unchanged entry which remains in the scope, the server sends a present message consisting only of the name of the entry and the synchronization control representing state present. The present message does not contain any attributes of the entry. After the client receives all update and present entries, it can reliably determine the new client copy by adding the entries added to the server, by replacing the entries modified at the server, and by deleting entries in the client copy which have not been updated nor specified as being present at the server.

The transmission of the updated entries in the delete phase is the same as in the present phase. The server sends all the requested attributes of the entries updated within the search scope since the last synchronization to the client. In the delete phase, however, the server sends a delete message for each entry deleted from the search scope, instead of sending present messages. The delete message consists only of the name of the entry and the synchronization control representing state delete. The new client copy can be determined by adding, modifying, and removing entries according to the synchronization control attached to the SearchResultEntry message.

In the case that the LDAP Sync server maintains a history store and can determine which entries are scoped out of the client copy since the last synchronization time, the server can use the delete phase. If the server does not maintain any history store, cannot determine the scoped-out entries from the history store, or the history store does not cover the outdated synchronization state of the client, the server should use the present phase. The use of the present phase is much more efficient than a full content reload in terms of the synchronization traffic. To reduce the synchronization traffic further, the LDAP Sync protocol also provides several optimizations such as the transmission of the normalized entryUUIDs and the transmission of the multiple entryUUIDs in a single syncIdSet message.

At the end of the refreshOnly synchronization, the server sends a synchronization cookie to the client as a state indicator of the client copy after the synchronization is completed. The client will present the received cookie when it requests the next incremental synchronization to the server.

When refreshAndPersist synchronization is used, the server sends a synchronization cookie at the end of the refresh stage by sending a Sync Info message with TRUE refreshDone. It also sends a synchronization cookie by attaching it to SearchResultEntry generated in the persist stage of the synchronization search. During the persist stage, the server can also send a Sync Info message containing the synchronization cookie at any time the server wants to update the client-side state indicator. The server also updates a synchronization indicator of the client at the end of the persist stage.

In the LDAP Sync protocol, entries are uniquely identified by the entryUUID attribute value. It can function as a reliable identifier of the entry. The DN of the entry, on the other hand, can be changed over time and hence cannot be considered as the reliable identifier. The entryUUID is attached to each SearchResultEntry or SearchResultReference as a part of the synchronization control.

14.2. Syncrepl Details

The syncrepl engine utilizes both the refreshOnly and the refreshAndPersist operations of the LDAP Sync protocol. If a syncrepl specification is included in a database definition, slapd (8) launches a syncrepl engine as a slapd (8) thread and schedules its execution. If the refreshOnly operation is specified, the syncrepl engine will be rescheduled at the interval time after a synchronization operation is completed. If the refreshAndPersist operation is specified, the engine will remain active and process the persistent synchronization messages from the provider.

The syncrepl engine utilizes both the present phase and the delete phase of the refresh synchronization. It is possible to configure a per-scope session log in the provider server which stores the entryUUIDs and the names of a finite number of entries deleted from a replication content. Multiple replicas of single provider content share the same per-scope session log. The syncrepl engine uses the delete phase if the session log is present and the state of the consumer server is recent enough that no session log entries are truncated after the last synchronization of the client. The syncrepl engine uses the present phase if no session log is configured for the replication content or if the consumer replica is too outdated to be covered by the session log. The current design of the session log store is memory based, so the information contained in the session log is not persistent over multiple provider invocations. It is not currently supported to access the session log store by using LDAP operations. It is also not currently supported to impose access control to the session log.

As a further optimization, even in the case the synchronization search is not associated with any session log, no entries will be transmitted to the consumer server when there has been no update in the replication context.

While slapd (8) can function as the LDAP Sync provider only when it is configured with either back-bdb or back-hdb backend, the syncrepl engine, which is a consumer-side replication engine, can work with any backends.

The LDAP Sync provider maintains contextCSN for each database as the current synchronization state indicator of the provider content. It is the largest entryCSN in the provider context such that no transactions for an entry having smaller entryCSN value remains outstanding. contextCSN could not just be set to the largest issued entryCSN because entryCSN is obtained before a transaction starts and transactions are not committed in the issue order.

The provider stores the contextCSN of a context in the syncreplCookie attribute of the immediate child entry of the context suffix whose DN is cn=ldapsync,<suffix> and object class is syncProviderSubentry.

The consumer stores its replica state, which is the provider's contextCSN received as a synchronization cookie, in the syncreplCookie attribute of the immediate child of the context suffix whose DN is cn=syncrepl<rid>,<suffix> and object class is syncConsumerSubentry. The replica state maintained by a consumer server is used as the synchronization state indicator when it performs subsequent incremental synchronization with the provider server. It is also used as a provider-side synchronization state indicator when it functions as a secondary provider server in a cascading replication configuration. <rid> is the replica ID uniquely identifying the replica locally in the syncrepl consumer server. <rid> is an integer which has no more than three decimal digits.

Because a general search filter can be used in the syncrepl specification, not all entries in the context will be returned as the synchronization content. The syncrepl engine creates a glue entry to fill in the holes in the replica context if any part of the replica content is subordinate to the holes. The glue entries will not be returned as the search result unless ManageDsaIT control is provided.

It is possible to retrieve syncProviderSubentry and syncConsumerSubentry by performing an LDAP search with the respective entries as the base object and with the base scope.

14.3. Configuring Syncrepl

Because syncrepl is a consumer-side replication engine, the syncrepl specification is defined in slapd.conf (5) of the consumer server, not in the provider server's configuration file. The initial loading of the replica content can be performed either by starting the syncrepl engine with no synchronization cookie or by populating the consumer replica by adding and demoting an LDIF file dumped as a backup at the provider. slapadd (8) supports the replica promotion and demotion.

When loading from a backup, it is not required to perform the initial loading from the up-to-date backup of the provider content. The syncrepl engine will automatically synchronize the initial consumer replica to the current provider content. As a result, it is not required to stop the provider server in order to avoid the replica inconsistency caused by the updates to the provider content during the content backup and loading process.

When replicating a large scale directory, especially in a bandwidth constrained environment, it is advised to load the consumer replica from a backup instead of performing a full initial load using syncrepl.

14.3.1. Set up the provider slapd

There is no special slapd.conf (5) directive for the provider syncrepl server except for the session log directive. Because the LDAP Sync search is subject to access control, proper access control privileges should be set up for the replicated content.

When creating a provider database from the LDIF file using slapadd (8), contextCSN and the syncProviderSubentry entry must be created. slapadd -p -w will create a new contextCSN from the entryCSNs of the added entries. It is also possible to create the syncProviderSubentry with an appropriate contextCSN value by directly including it in the ldif file. slapadd -p will preserve the provider's contextCSN or will change it to the consumer's contextCSN if it is to promote a replica to the provider's content. The syncProviderSubentry can be included in the ldif output when slapcat (8) is given the -m flag; the syncConsumerSubentry can be retrieved by the -k flag of slapcat (8).

The session log is configured by

        sessionlog <sid> <limit>

directive, where <sid> is the ID of the per-scope session log in the provider server and <limit> is the maximum number of session log entries the session log store can record. <sid> is an integer no longer than 3 decimal digits. If the consumer server sends a synchronization cookie containing sid=<sid> where <sid> matches the session log ID specified in the directive, the LDAP Sync search is to utilize the session log store.

14.3.2. Set up the consumer slapd

The syncrepl replication is specified in the database section of slapd.conf (5) for the replica context. The syncrepl engine is backend independent and the directive can be defined with any database type.

        syncrepl rid=123
                provider=ldap://provider.example.com:389
                type=refreshOnly
                interval=01:00:00:00
                searchbase="dc=example,dc=com"
                filter="(objectClass=organizationalPerson)"
                scope=sub
                attrs="cn,sn,ou,telephoneNumber,title,l"
                schemachecking=off
                updatedn="cn=replica,dc=example,dc=com"
                bindmethod=simple
                binddn="cn=syncuser,dc=example,dc=com"
                credentials=secret

In this example, the consumer will connect to the provider slapd at port 389 of ldap://provider.example.com to perform a polling (refreshOnly) mode of synchronization once a day. It will bind as cn=syncuser,dc=example,dc=com using simple authentication with password "secret". Note that the access control privilege of cn=syncuser,dc=example,dc=com should be set appropriately in the provider to retrieve the desired replication content. The consumer will write to its database with the privilege of the cn=replica,dc=example,dc=com entry as specified in the updatedn= directive. The updatedn entry should have write permission to the replica content.

The synchronization search in the above example will search for the entries whose objectClass is organizationalPerson in the entire subtree rooted at dc=example,dc=com. The requested attributes are cn, sn, ou, telephoneNumber, title, and l. The schema checking is turned off, so that the consumer slapd (8) will not enforce entry schema checking when it process updates from the provider slapd (8).

For more detailed information on the syncrepl directive, see the syncrepl section of The slapd Configuration File chapter of this admin guide.

14.3.3. Start the provider and the consumer slapd

The provider slapd (8) is not required to be restarted. contextCSN is automatically generated as needed: it might originally contained in the LDIF file, generated by slapadd (8), generated upon changes in the context, or generated when the first LDAP Sync search arrived at the provider.

When starting a consumer slapd (8), it is possible to provide a synchronization cookie as the -c cookie command line option in order to start the synchronization from a specific state. The cookie is a comma separated list of name=value pairs. Currently supported syncrepl cookie fields are csn=<csn>, sid=<sid>, and rid=<rid>. <csn> represents the current synchronization state of the consumer replica. <sid> is the identity of the per-scope session log to which this consumer will be associated. <rid> identifies a consumer replica locally within the consumer server. It is used to relate the cookie to the syncrepl definition in slapd.conf (5) which has the matching replica identifier. Both <sid> and <rid> have no more than 3 decimal digits. The command line cookie overrides the synchronization cookie stored in the consumer replica database.