
(ITS#8309) Replication master/master with delta sync freeze after reboot



Full_Name: Florian Suhard
Version: 2.4.40
OS: Redhat 6.5
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (37.157.230.211)


Hello,

My name is Florian Suhard; I'm a French engineer. I am currently working on a
project using OpenLDAP replication.

The architecture of the project is as follows.

We have several Red Hat 6.5 servers (4 in our test environment).
Each server has its own OpenLDAP directory, version 2.4.40.
One of these servers is the hub (connected to all the others).
The other servers are nodes (connected only to the hub).
Master/master replication with delta-sync has been configured on every server.
Nodes replicate only with the hub; the hub replicates with all nodes.

What we observe is that sometimes (I'm sorry, it is really random) when we
reboot the hub, the slapd service 'freezes' on the hub and on the other nodes.
The slapd service appears to have started (service slapd status => OK) but does
not respond when we try to connect with JXplorer or with the PHP function
ldap_connect.
When we restart slapd on the hub (service slapd restart), everything returns to
normal.
I have the feeling that the bug is easier to reproduce when changes are made to
the LDAP directory before rebooting the hub.
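
The "status OK but not responding" symptom can also be checked without JXplorer
or PHP. Below is a minimal sketch in Python (standard library only) that opens a
TCP connection, sends an anonymous LDAPv3 simple bind and waits for any reply.
The function name probe_ldap, the three result labels and the default timeout
are illustrative choices, not part of our setup.

```python
import socket

# BER encoding of an anonymous LDAPv3 simple bind request (messageID 1).
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def probe_ldap(host, port=389, timeout=5.0):
    """Classify an LDAP endpoint as 'unreachable', 'no-reply' or 'responding'.

    Hypothetical helper: it only checks that some bytes come back after a
    bind request, it does not parse the bind response.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return "unreachable"
    with sock:
        try:
            sock.sendall(ANON_BIND)
            reply = sock.recv(64)
        except OSError:
            # Includes socket.timeout: connected, but no answer in time.
            return "no-reply"
    # An empty read means the peer closed the connection without answering.
    return "responding" if reply else "no-reply"
```

For example, probe_ldap("central-ctm.test.etiam.com") could be run against the
hub right after a reboot. If the freeze is as described, one would presumably
see "no-reply" or "unreachable" rather than "responding"; which of the two has
not been verified here.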

We tried to fix this by upgrading to OpenLDAP 2.4.42. Red Hat does not provide
version 2.4.42, so we took the Red Hat 2.4.40 source RPM and rebuilt it with the
OpenLDAP 2.4.42 sources.
The bug is still reproducible after the upgrade.

Our replication configuration can be found on your FTP server,
ftp://ftp.openldap.org/incoming/, in the archive Florian-Suhard-20151112.tar.
This archive contains 4 folders, one for each server's configuration:
	slapd_opex-ctm.test.etiam.com.tar => one node
	slapd_hia1-ctm.test.etiam.com.tar => one node
	slapd_hia2-ctm.test.etiam.com.tar => one node
	slapd_central-ctm.test.etiam.com.tar => hub

I tried using gdb to find the error. I see this trace:
Loaded symbols for /usr/lib64/libnsspem.so
0x00007faa47d2022d in pthread_join (threadid=140368938260224, thread_return=0x0)
at pthread_join.c:89
89	    lll_wait_tid (pd->tid);
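
The frame above comes from a single thread waiting in pthread_join, so it may
not show where slapd is actually stuck. A backtrace of every thread is probably
more useful. Here is a minimal sketch in Python that wraps gdb for this,
assuming gdb is installed and the caller is allowed to attach to the process;
the helper names gdb_all_threads_cmd and dump_threads are hypothetical.

```python
import subprocess


def gdb_all_threads_cmd(pid):
    """Build a gdb command line that prints a backtrace of every thread."""
    # -p attaches to a running process, -batch exits after the commands,
    # -ex runs one gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_threads(pid, timeout=60):
    """Attach gdb to a live process and return its combined output as text."""
    result = subprocess.run(
        gdb_all_threads_cmd(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling dump_threads with the PID of the frozen slapd (for instance the one
reported by service slapd status) would give output that can be attached to
this report.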

I am at your disposal for any further information.

Thanks