Alfresco Repository Clustering
Clustering Alfresco provides high availability and lets your service architecture scale out horizontally.
In this blog we explore an Alfresco Repository clustering configuration that is used in an estimated 95% of clustered Alfresco installations. It offers high availability while keeping architectural complexity and duplication of information low.
Simple Alfresco Repository Cluster Architecture
As shown above, the Simple Alfresco Repository Cluster consists of the following:
· A load balancer configured with sticky sessions to distribute incoming requests between the Alfresco application servers. The load balancer also provides automatic failover: whenever one node fails, requests are sent to the next node in the cluster. Here Apache is used as the load balancer, but IIS could also be used.
· Two Alfresco nodes forming the application server tier of the content platform. More nodes can be added as required, which makes the application architecture highly scalable. Alfresco Enterprise 4.1.6 was installed on Ubuntu 12.04 for the application tier.
· One database shared by the two nodes.
· One filesystem shared by the two nodes.
· A local SOLR index kept on each node.
Storage Tier
The storage tier consists of a database for metadata and a filesystem for content. For content storage, setting up a Samba shared drive proved to be an easy way to give both Alfresco nodes access to the same filesystem. For the Alfresco database, MySQL was used.
Samba Shared Drive Set Up
Because we are using Ubuntu 12.04, some of the following commands are Ubuntu-specific, but they can easily be adapted for other Linux distributions.
#sudo apt-get install samba (use yum install samba on Red Hat)
#mv /etc/samba/smb.conf /etc/samba/smb.conf.template
#vim /etc/samba/smb.conf
[global]
; General server settings
; Normally a DNS name should be used, but an IP address is used in our setup.
netbios name = 10.0.0.19
server string =
workgroup = WORKGROUP
announce version = 5.0
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_KEEPALIVE SO_RCVBUF=8192 SO_SNDBUF=8192
passdb backend = tdbsam
security = user
null passwords = true
username map = /etc/samba/smbusers
name resolve order = hosts wins bcast
wins support = yes
printing = CUPS
printcap name = CUPS
syslog = 1
syslog only = yes
[alfresco]
comment = Alfresco files
read only = no
guest ok = no
; Path can be any location on the server
path = /opt/alfrescoClusterData
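Before restarting Samba, the share definition can be sanity-checked. Below is a minimal Python sketch, not part of the original setup, that parses smb.conf-style text with the standard configparser module and reports the path and writability of the [alfresco] share. The function name `describe_share` is hypothetical, and it assumes the file is simple enough for configparser (one `key = value` per line, comment lines starting with `;` or `#`).

```python
import configparser


def describe_share(smb_conf_text: str, share: str = "alfresco") -> dict:
    """Return the path, writability and guest access of one smb.conf share.

    Assumption: the file is simple enough for configparser (one "key = value"
    per line, comment lines starting with ';' or '#', no indented continuations).
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(smb_conf_text)
    section = parser[share]
    return {
        "path": section.get("path"),
        # Samba's "read only = no" means the share is writable.
        "writable": not section.getboolean("read only", fallback=True),
        "guest_ok": section.getboolean("guest ok", fallback=False),
    }
```

On the storage server, `describe_share(open("/etc/samba/smb.conf").read())` should report the path /opt/alfrescoClusterData with writable set to True and guest access disabled.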
Adding users who can access the share
#smbpasswd -a username
Enter the password when prompted.
#vim /etc/samba/smbusers
<username> = "root"
<username> = "seed"
#service smbd restart
Note
These must be valid user accounts that can log in to the servers.
Create Alfresco Cluster DB
- CREATE DATABASE alfresco_cluster DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
- GRANT ALL PRIVILEGES ON alfresco_cluster.* TO alfresco_cluster@'%' IDENTIFIED BY 'alfresco_cluster';
- GRANT SELECT,LOCK TABLES ON alfresco_cluster.* TO alfresco_cluster@'%' IDENTIFIED BY 'alfresco_cluster';
- FLUSH PRIVILEGES;
Note
· Make sure the following line is commented out in /etc/mysql/my.cnf so that MySQL accepts connections from other hosts, and restart MySQL after changing it:
#bind-address = 127.0.0.1
· Instead of '%' in the commands above, the IP addresses or DNS names of the Alfresco servers can be used to restrict access.
Testing DB and Shared Drive Access from the Alfresco Servers
Test that the database is accessible from the Alfresco servers:
#mysql -u alfresco_cluster -p -h 10.0.0.19
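If the mysql client is not installed on the Alfresco servers, basic TCP reachability of the storage server can be checked from Python instead. This minimal sketch (hypothetical names `port_open` and `check_storage`) only proves that the ports accept connections; it does not test credentials. It assumes MySQL on its default port 3306 and Samba on TCP port 445.

```python
import socket

# Ports on the storage server assumed here: MySQL default and SMB over TCP.
STORAGE_SERVICES = {"MySQL": 3306, "Samba": 445}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_storage(host: str = "10.0.0.19") -> dict:
    """Map each storage service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in STORAGE_SERVICES.items()}
```

Run on each Alfresco node, `check_storage()` should report True for both services; a False usually points at a firewall or at MySQL's bind-address setting.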
Mount the shared drive on both Alfresco servers (Alfresco Node 1 and Node 2):
#mkdir /opt/alfrescoClusterData
#smbmount //10.0.0.19/alfresco /opt/alfrescoClusterData -o username=root,password=seed
Note
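For the shared drive to be mounted at startup, add the following entry to /etc/fstab. The mount options must be comma-separated with no spaces, because fstab separates its fields with whitespace:
//10.0.0.19/alfresco /opt/alfrescoClusterData smbfs users,rw,username=<username>,password=<password>,dmask=777,fmask=777 0 0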
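A frequent mistake in fstab entries is putting spaces between mount options: fstab splits each line on whitespace, so the entry silently gains extra fields. The hypothetical Python helper below assembles an entry from its parts and refuses any field that contains whitespace. It is a minimal sketch; the `<username>`/`<password>` values are placeholders, not real credentials.

```python
def fstab_entry(device: str, mount_point: str, fstype: str, options: dict) -> str:
    """Build one /etc/fstab line.

    fstab splits fields on whitespace, so the options must form a single
    comma-separated field. A value of None means a bare flag (e.g. "rw").
    """
    opts = ",".join(k if v is None else f"{k}={v}" for k, v in options.items())
    fields = [device, mount_point, fstype, opts]
    if any(ch.isspace() for field in fields for ch in field):
        raise ValueError("fstab fields must not contain whitespace")
    # The trailing "0 0" disables dump backups and boot-time fsck for the share.
    return " ".join(fields + ["0", "0"])


if __name__ == "__main__":
    print(fstab_entry(
        "//10.0.0.19/alfresco", "/opt/alfrescoClusterData", "smbfs",
        {"users": None, "rw": None, "username": "<username>",
         "password": "<password>", "dmask": "777", "fmask": "777"},
    ))
```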
Installing Alfresco
Installing Alfresco Node 1
Install Alfresco in the usual manner, using a test database and the local alf_data folder for the content store.
After confirming that the log is clean and that you can log in to Alfresco Share,
change the following in alfresco-global.properties:
# Points to the mounted Samba drive on the local Alfresco server
dir.root=/opt/alfrescoClusterData/alf_data
db.username=alfresco_cluster
db.password=alfresco_cluster
db.name=alfresco_cluster
db.url=jdbc:mysql://10.0.0.19:3306/alfresco_cluster?useUnicode=yes&characterEncoding=UTF-8
# Update according to where the keystore folder is located
dir.keystore=/opt/alfresco-4.1.6/alf_data/keystore
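Java .properties files have no inline comments: everything after the `=` on a line becomes the value, so a note appended after a path or URL silently changes it. The minimal Python sketch below (hypothetical helpers, handling only simple `key=value` lines and `#`/`!` comment lines, not continuations or escapes) reads these settings and flags values that contain whitespace and a `db.url` that does not reference `db.name`.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; '#' and '!' start whole-line comments.

    Simplification: surrounding whitespace is stripped from keys and values.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def storage_problems(props: dict) -> list:
    """Return human-readable problems with the shared-storage settings."""
    problems = []
    for key, value in props.items():
        if any(ch.isspace() for ch in value):
            problems.append(f"{key}: value contains whitespace (inline comment?)")
    if not props.get("dir.root", "").startswith("/"):
        problems.append("dir.root: expected an absolute path on the mounted share")
    name, url = props.get("db.name", ""), props.get("db.url", "")
    if name and f"/{name}?" not in url and not url.endswith(f"/{name}"):
        problems.append("db.url: does not reference db.name")
    return problems
```

An empty list from `storage_problems(parse_properties(text))` means none of these particular mistakes were found; it does not validate the rest of the file.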
Force SOLR to rebuild its indexes
- Delete the contents of the archive SpacesStore index at alf_data/solr/archive/SpacesStore/*
- Delete the contents of the workspace SpacesStore index at alf_data/solr/workspace/SpacesStore/*
- Delete the cached content model data at alf_data/solr/archive-SpacesStore/alfrescoModels/*
- Delete the cached content model data at alf_data/solr/workspace-SpacesStore/alfrescoModels/*
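These deletions can also be scripted. The minimal Python sketch below (hypothetical function `clear_solr_state`) removes the contents of each listed directory while keeping the directories themselves. It assumes Alfresco is stopped and that you pass the alf_data folder that holds the node's SOLR data.

```python
import shutil
from pathlib import Path

# The four locations listed above, relative to alf_data.
SOLR_DIRS_TO_CLEAR = (
    "solr/archive/SpacesStore",
    "solr/workspace/SpacesStore",
    "solr/archive-SpacesStore/alfrescoModels",
    "solr/workspace-SpacesStore/alfrescoModels",
)


def clear_solr_state(alf_data: Path) -> list:
    """Delete the contents (not the directories) of each SOLR location.

    Run only while Alfresco is stopped. Missing directories are skipped.
    Returns the list of paths that were removed.
    """
    removed = []
    for rel in SOLR_DIRS_TO_CLEAR:
        target = Path(alf_data) / rel
        if not target.is_dir():
            continue
        # Materialise the listing first so deletion does not disturb iteration.
        for child in list(target.iterdir()):
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
            removed.append(child)
    return removed
```

For the installation used here, the call would be `clear_solr_state(Path("/opt/alfresco-4.1.6/alf_data"))`, followed by the restart below.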
Restart Alfresco
Installing Alfresco Node 2
After making sure Alfresco Node 1 started properly, repeat the same steps as above for Alfresco Node 2.
Alfresco Cluster Settings
Now that our two Alfresco nodes are connected to a single database and content store, both Alfresco servers need to be configured to participate in a cluster.
Alfresco uses JGroups for communication between the servers in a cluster. JGroups sends the initial broadcast messages announcing a server's availability, and it manages the underlying communication channels as well as nodes entering and leaving the cluster. JGroups can use UDP multicast, but this setup uses TCP (alfresco.jgroups.defaultProtocol=TCP below). To enable clustering, first set the JGroups properties so that each node knows how to reach the other Alfresco instance, and then configure the level 2 (L2) cache. The L2 cache provides out-of-transaction caching of Java objects inside Alfresco. Alfresco supports EHCache for this; because EHCache does not tie Alfresco to a particular application server, the setup remains portable.
To enable Alfresco clustering, make the following changes:
Alfresco Node 1 and Node 2
· cp -r /opt/alfresco-4.1.6/tomcat/shared/classes/alfresco/extension/ehcache-custom.xml.sample.cluster /opt/alfresco-4.1.6/tomcat/shared/classes/alfresco/extension/ehcache-custom.xml
· Comment out the following in /opt/alfresco-4.1.6/tomcat/shared/classes/alfresco/extension/ehcache-custom.xml
<!-- <cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="socketTimeoutMillis=10000"
/> -->
· Uncomment the following in /opt/alfresco-4.1.6/tomcat/shared/classes/alfresco/extension/ehcache-custom.xml
<cacheManagerPeerListenerFactory
class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
properties="hostName=${alfresco.ehcache.rmi.hostname},
port=${alfresco.ehcache.rmi.port},
remoteObjectPort=${alfresco.ehcache.rmi.remoteObjectPort},
socketTimeoutMillis=${alfresco.ehcache.rmi.socketTimeoutMillis}" />
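Because swapping comment markers by hand is error-prone, a quick check can confirm that exactly one peer listener is active and that it is the one using the hostName property. This is a minimal Python sketch with hypothetical function names, using the standard xml.etree.ElementTree module, which drops XML comments when parsing, so commented-out elements are ignored. It assumes ehcache-custom.xml declares no default XML namespace on its root element.

```python
import xml.etree.ElementTree as ET


def active_peer_listeners(ehcache_xml) -> list:
    """Return the 'properties' attribute of every uncommented
    cacheManagerPeerListenerFactory element (str or bytes input)."""
    root = ET.fromstring(ehcache_xml)
    return [
        elem.get("properties", "")
        for elem in root.iter("cacheManagerPeerListenerFactory")
    ]


def uses_explicit_rmi_host(ehcache_xml) -> bool:
    """True if exactly one listener is active and it sets hostName."""
    listeners = active_peer_listeners(ehcache_xml)
    return len(listeners) == 1 and "hostName=" in listeners[0]
```

For example, `uses_explicit_rmi_host(Path(".../extension/ehcache-custom.xml").read_bytes())` (with `Path` from pathlib and the full path used above) should return True on both nodes.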
# Add the following to alfresco-global.properties on Alfresco Node 1
### Cluster configs: Alfresco Node 1
alfresco.cluster.name=AlfrescoCluster
alfresco.jgroups.defaultProtocol=TCP
alfresco.tcp.start_port=7800
# List every Alfresco node here (DNS name or IP)
alfresco.tcp.initial_hosts=10.0.0.10[7800],10.0.0.119[7800]
# IP or DNS name of the local Alfresco server
alfresco.ehcache.rmi.hostname=10.0.0.10
# Should be the same as alfresco.ehcache.rmi.hostname
alfresco.rmi.services.external.host=10.0.0.10
alfresco.ehcache.rmi.port=40001
alfresco.ehcache.rmi.remoteObjectPort=45001
# Add the following to alfresco-global.properties on Alfresco Node 2
### Cluster configs: Alfresco Node 2
alfresco.cluster.name=AlfrescoCluster
alfresco.jgroups.defaultProtocol=TCP
alfresco.tcp.start_port=7800
# List every Alfresco node here (DNS name or IP)
alfresco.tcp.initial_hosts=10.0.0.10[7800],10.0.0.119[7800]
# IP or DNS name of the local Alfresco server
alfresco.ehcache.rmi.hostname=10.0.0.119
# Should be the same as alfresco.ehcache.rmi.hostname
alfresco.rmi.services.external.host=10.0.0.119
alfresco.ehcache.rmi.port=40001
alfresco.ehcache.rmi.remoteObjectPort=45001
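Only the two host entries differ between the nodes. The minimal Python sketch below (hypothetical function `cluster_properties`) renders the block for any node from a single list of node addresses, which keeps `alfresco.tcp.initial_hosts` identical everywhere as more nodes are added. It assumes, as above, that every node uses the same JGroups start port and RMI ports.

```python
CLUSTER_NAME = "AlfrescoCluster"
JGROUPS_PORT = 7800


def cluster_properties(local_ip: str, all_ips: list,
                       rmi_port: int = 40001, remote_object_port: int = 45001) -> str:
    """Render the clustering block of alfresco-global.properties for one node.

    Everything except the two host lines is identical on every node.
    """
    if local_ip not in all_ips:
        raise ValueError(f"{local_ip} is not in the node list")
    initial_hosts = ",".join(f"{ip}[{JGROUPS_PORT}]" for ip in all_ips)
    lines = [
        f"alfresco.cluster.name={CLUSTER_NAME}",
        "alfresco.jgroups.defaultProtocol=TCP",
        f"alfresco.tcp.start_port={JGROUPS_PORT}",
        f"alfresco.tcp.initial_hosts={initial_hosts}",
        f"alfresco.ehcache.rmi.hostname={local_ip}",  # node-specific
        f"alfresco.rmi.services.external.host={local_ip}",  # node-specific
        f"alfresco.ehcache.rmi.port={rmi_port}",
        f"alfresco.ehcache.rmi.remoteObjectPort={remote_object_port}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodes = ["10.0.0.10", "10.0.0.119"]
    for ip in nodes:
        print(f"### Cluster configs for {ip}")
        print(cluster_properties(ip, nodes))
```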
#Restart Alfresco on both nodes and check the logs to confirm that the cluster named AlfrescoCluster (or whatever alfresco.cluster.name is set to) has started.
Note
· Make sure a valid Alfresco Enterprise license is installed on both nodes.
Testing the Alfresco clustering
1. Log in to Alfresco Node 1 as admin and create a folder named ClusterFolder1.
2. Log in to Alfresco Node 2 as admin, check that the folder ClusterFolder1 is visible, and create a folder named ClusterFolder2.
3. Log in to Alfresco Node 1 again as admin and check that the folder ClusterFolder2 is visible.
Load Balancer Set Up
Apache can be used to load-balance incoming requests between the Alfresco application servers. Normally the load balancer would run on its own server, but for convenience we set it up on our storage server. The commands below are Ubuntu-specific and need to be adapted for other Linux distributions.
#Install Apache using apt-get install apache2 (use yum install httpd on Red Hat)
Test the installation by browsing to http://localhost.
Apache connects to Alfresco's Tomcat over AJP using the mod_proxy_ajp module, and spreads requests across the nodes with the built-in software load balancer module, mod_proxy_balancer.
mod_proxy_ajp and mod_proxy_balancer are installed with Apache by default but need to be enabled:
#a2enmod proxy proxy_ajp proxy_balancer
On Red Hat, add the following to httpd.conf instead:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
#mv /etc/apache2/sites-available/default /etc/apache2/sites-available/default-original
#Create a new default file and add the following
(on Red Hat, add it to httpd.conf instead):
<VirtualHost *:80>
ProxyRequests off
# Normally a proper DNS name should be used
ServerName 10.0.0.19
DocumentRoot /var/www
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory /var/www/>
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
allow from all
</Directory>
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory "/usr/lib/cgi-bin">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
</Directory>
ErrorLog ${APACHE_LOG_DIR}/error.log
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel warn
CustomLog ${APACHE_LOG_DIR}/access.log combined
<Proxy balancer://alfresco-cluster>
# alfresco node1
BalancerMember ajp://10.0.0.10:8009 min=10 max=100 route=node1 loadfactor=1
# alfresco node2
BalancerMember ajp://10.0.0.119:8009 min=20 max=200 route=node2 loadfactor=2
# Security: technically we aren't blocking
# anyone, but this is the place to make those
# changes
Order Deny,Allow
Deny from none
Allow from all
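# Load Balancer Settings
# lbmethod=byrequests is the default, so the
# line below can stay commented out. It is a
# weighted round robin: each member gets a
# share of new (non-sticky) requests in
# proportion to its loadfactor, so node2
# (loadfactor=2) receives about twice as many
# as node1 (loadfactor=1).
#ProxySet lbmethod=byrequests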
ProxySet stickysession=JSESSIONID
</Proxy>
# balancer-manager
# This tool is built into the mod_proxy_balancer
# module and will allow you to do some simple
# modifications to the balanced group via a gui
# web interface.
<Location /balancer-manager>
SetHandler balancer-manager
# I recommend locking this one down to your
# office network
Order deny,allow
Allow from all
</Location>
# Point of Balance
# These settings name the locations in the site
# that are balanced: /alfresco and /share go to
# the cluster, while /balancer-manager is
# excluded so that Apache handles it locally.
ProxyPass /balancer-manager !
ProxyPass /alfresco balancer://alfresco-cluster/alfresco
ProxyPass /share balancer://alfresco-cluster/share
</VirtualHost>
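With lbmethod=byrequests, new requests that carry no sticky route are shared in proportion to each member's loadfactor. The Python sketch below is a simplified model of that weighted request-counting scheme as described in the Apache documentation, written here for illustration only (hypothetical names; it ignores sticky sessions, failed members and the min/max connection settings). With the loadfactors configured above, node2 wins two of every three new requests.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """A balancer member: its route and loadfactor, plus a running lbstatus."""
    route: str
    lbfactor: int
    lbstatus: int = 0


def pick_byrequests(workers: list) -> str:
    """Choose a member for one new (non-sticky) request.

    Every member's lbstatus grows by its lbfactor; the member with the highest
    lbstatus wins (the first one on a tie) and is then reduced by the sum of
    all lbfactors, so higher-weighted members win proportionally more often.
    """
    total = 0
    for w in workers:
        w.lbstatus += w.lbfactor
        total += w.lbfactor
    chosen = max(workers, key=lambda w: w.lbstatus)
    chosen.lbstatus -= total
    return chosen.route


if __name__ == "__main__":
    # loadfactor values from the BalancerMember lines above.
    members = [Worker("node1", 1), Worker("node2", 2)]
    print([pick_byrequests(members) for _ in range(6)])
    # prints ['node2', 'node1', 'node2', 'node2', 'node1', 'node2']
```

Setting both loadfactors to the same value would give the even split that a plain round robin suggests.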
Tomcat appends the name of the Tomcat instance to the end of its session ID cookie (JSESSIONID), separated from the session ID by a dot (.). So if Apache finds a dot in the value of the sticky cookie, it uses only the part after the dot to look up the route.
To let Tomcat know its instance name, set the jvmRoute attribute in the Tomcat configuration file conf/server.xml to the route of the BalancerMember that connects to that Tomcat:
#vim /opt/alfresco-4.1.6/tomcat/conf/server.xml
Alfresco Node 1
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">
Alfresco Node 2
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node2">
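How the sticky route is read can be shown with a small Python model: the part of the JSESSIONID value after the dot is matched against each BalancerMember's route, and a request with no usable route is left to the load-balancing method. This is a simplified illustration with hypothetical names, not mod_proxy_balancer's code; failover to another member when a node is down is not modelled.

```python
def route_from_session_id(session_id: str):
    """Return the route after the dot in a Tomcat session id, or None.

    Assumes the id part itself contains no dot, as with Tomcat's generated
    ids, so "<id>.node1" yields "node1".
    """
    _, dot, route = session_id.partition(".")
    return route if dot and route else None


def pick_member(session_id, members_by_route: dict, fallback: str) -> str:
    """Send sticky requests to their member; anything else goes to fallback,
    which stands in for the load-balancing method (lbmethod)."""
    route = route_from_session_id(session_id) if session_id else None
    return members_by_route.get(route, fallback)


if __name__ == "__main__":
    # Routes must match the jvmRoute values set in each node's server.xml.
    members = {"node1": "ajp://10.0.0.10:8009", "node2": "ajp://10.0.0.119:8009"}
    print(pick_member("5A1B2C3D4E5F.node2", members, "lbmethod"))
```

This is why each jvmRoute must match the route= value of its BalancerMember exactly: an unknown suffix simply falls back to load balancing and the session is lost.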
After restarting Apache and both Alfresco nodes, Alfresco Share can be accessed at http://10.0.0.19/share, and each request is forwarded to one of the Alfresco Tomcat servers.
Note
Uncomment the following in tomcat/conf/server.xml on each node, then check the localhost_access_log.*.txt files in tomcat/logs to see which Alfresco node is serving the requests.
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log." suffix=".txt" pattern="common" resolveHosts="false"/>
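With access logging enabled on both nodes, a short script can tally which node served the Share requests. The Python sketch below (hypothetical function names) counts Common Log Format lines whose request path starts with /share in each node's log; copying the localhost_access_log files from the two servers to one place is left to you.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def requests_per_node(logs_by_node: dict, prefix: str = "/share") -> dict:
    """Count access-log lines per node whose request path starts with prefix."""
    counts = {}
    for node, lines in logs_by_node.items():
        counts[node] = 0
        for line in lines:
            match = CLF_LINE.match(line)
            if match and match.group(3).startswith(prefix):
                counts[node] += 1
    return counts


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python count_requests.py node1_access_log.txt node2_access_log.txt
    logs = {p: Path(p).read_text().splitlines() for p in sys.argv[1:]}
    print(requests_per_node(logs))
```

Because sessions are sticky, one browser session should show up almost entirely in a single node's log; several sessions are needed to see the 1:2 split from the loadfactors.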
This concludes our blog on clustering Alfresco; we hope it gives you a good starting point for building your own Alfresco cluster.