Atlassian Crowd LDAP problems and downtime? See this post

After upgrading Crowd, our servers started showing a new problem: sometimes Crowd gets "stuck" (and JIRA and Confluence with it, because of SSO), and the Crowd log shows errors in the LDAP connection. It does not happen all the time, only about twice a week.

Examples of errors in the log:

Caused by: com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.PartialResultException: nested exception is javax.naming.PartialResultException [Root exception is javax.naming.CommunicationException: xxxx.com:xxx [Root exception is java.net.ConnectException: Connection refused]]
at com.atlassian.crowd.directory.SpringLDAPConnector.pageSearchResults(SpringLDAPConnector.java:441)

…

ERROR [atlassian.crowd.directory.DbCachingDirectoryPoller] Error occurred while refreshing the cache for directory [ xxxx].
com.atlassian.crowd.model.group.Membership$MembershipIterationException: com.atlassian.crowd.exception.OperationFailedException: org.springframework.ldap.PartialResultException: nested exception is javax.naming.PartialResultException [Root exception is javax.naming.CommunicationException: xxxx:xx [Root exception is java.net.ConnectException: Connection refused]]

…

Caused by: org.springframework.ldap.PartialResultException: nested exception is javax.naming.PartialResultException [Root exception is javax.naming.CommunicationException: xxx:xx [Root exception is java.net.ConnectException: Connection refused]]
at org.springframework.ldap.support.LdapUtils.convertLdapException(LdapUtils.java:216)
at org.springframework.ldap.core.LdapTemplate.search(LdapTemplate.java:385)

….

Caused by: javax.naming.PartialResultException [Root exception is javax.naming.CommunicationException: xxxx:xx [Root exception is java.net.ConnectException: Connection refused]]
at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:237)

…

SQL Error: 1062, SQLState: 23000

…

ERROR [engine.jdbc.spi.SqlExceptionHelper] Duplicate entry '06T1QBOOMdAa0dQxxxxXXX' for key 'uk_token_id_hash'

…

Where is the problem? Why does it happen only twice a week?

After talking with the Atlassian support team, we got the SOLUTION…

DISABLE REFERRALS IN LDAP and INCREASE THE LDAP TIMEOUTS…

Very easy… here are example database queries that solve the problem:

update crowddb.cwd_directory_attribute SET attribute_value = 'false' where attribute_name = 'ldap.referral';
update crowddb.cwd_directory_attribute SET attribute_value = '240000' where attribute_name = 'ldap.read.timeout';
update crowddb.cwd_directory_attribute SET attribute_value = '120000' where attribute_name = 'ldap.connection.timeout';
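
The stack traces above come from com.sun.jndi.ldap, the JDK's JNDI LDAP provider. To give a rough idea of what these three attributes might mean at that level, here is a minimal sketch of a JNDI environment with referral handling set to ignore and with explicit timeouts. The mapping from Crowd's ldap.referral, ldap.read.timeout and ldap.connection.timeout to these JNDI properties is an assumption on my side, not something confirmed by Atlassian. The class name LdapEnvSketch and the server URL are hypothetical, and the values are assumed to be milliseconds, as they are for the JNDI properties.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class LdapEnvSketch {

    // Builds a JNDI environment for the JDK LDAP provider.
    // Assumption: Crowd's three attributes correspond to these properties.
    static Hashtable<String, String> buildEnv(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);

        // "ignore" tells the provider not to follow referrals to other servers
        // (the other accepted values are "follow" and "throw").
        env.put(Context.REFERRAL, "ignore");

        // Both timeouts are expressed in milliseconds.
        env.put("com.sun.jndi.ldap.connect.timeout", "120000");
        env.put("com.sun.jndi.ldap.read.timeout", "240000");
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical server URL, only used to print the resulting settings.
        Hashtable<String, String> env = buildEnv("ldap://ldap.example.com:389");
        env.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The sketch only builds and prints the settings; it does not open a connection. Passing such an environment to new InitialDirContext(env) would be the next step in a real client.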

The explanation lies in our current Active Directory configuration and how it resolves DNS names… but this trick solves it! Easy and effective! IMPORTANT: the trick is not recommended for Active Directory setups with multiple domains, where referrals may be needed to reach objects in other domains.
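
If you want to check the DNS side yourself, one possible approach is to resolve the directory's host name to all of its addresses and try a TCP connection to each one. An address that refuses the connection would be consistent with the intermittent "Connection refused" errors in the log. This is only a diagnostic sketch under my own assumptions: the class name LdapHostProbe, the default host localhost, port 389 and the 3 second timeout are all hypothetical choices, and you should pass your own host name and port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class LdapHostProbe {

    // Returns true if a TCP connection to addr:port succeeds within timeoutMs.
    static boolean canConnect(InetAddress addr, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections and timeouts alike.
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Usage: java LdapHostProbe <host> <port>
        String name = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 389;

        // A single DNS name can resolve to several servers; test each one.
        for (InetAddress addr : InetAddress.getAllByName(name)) {
            String status = canConnect(addr, port, 3000) ? "reachable" : "refused or timed out";
            System.out.println(addr.getHostAddress() + ":" + port + " -> " + status);
        }
    }
}
```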