Scaling XenForo on Digital Ocean’s IaaS

Introduction

I have been running sports message board sites since the late 1990s.  My current site, www.sportstwo.com, became the official message board of the Portland Trail Blazers NBA team about a year ago.  We had been running vBulletin 4 since 2009, and it was getting dated.  The site needed a facelift and software upgrade to bring it up to modern standards and to work with modern browsers and mobile devices.

The site had been running on a server I designed and built, hosted in a half cabinet at a tier 1 co-location facility.  There were other servers in the cabinet used for staging, backup, etc.  The main server was designed to run pretty much forever: it had dual power supplies, a RAID 5 hardware SCSI disk controller with battery backup, Raptor 15,000 RPM disk drives, 16 GB of RAM, and dual quad-core Intel Xeon processors.  

The server ran non-stop for 10 years with only a handful of reboots during that time.  As I write this, here is the uptime on the server:

$ uptime
 15:07:58 up 1802 days, 22:09,  2 users,  load average: 0.00, 0.00, 0.00

That’s almost 5 years since it was last rebooted.  The only reason it was rebooted then was that the co-location facility where it was hosted was acquired by a different company, which required us to renumber the servers’ IP addresses.  We rebooted them to make sure they would come back online on their own after some catastrophic power failure.  Even with the dual power supplies and the facility’s battery and generator backup power, a power failure could still happen.  

I kept vBulletin up to date with all the security releases up to a point.  The site has a lot of customized styles/skins/themes: a green one for the Boston Celtics, a red one for the Chicago Bulls, and so on.  The last security upgrade I applied automatically merged the upgrade’s styles with our custom ones, and fixing all the errors that caused was a lot of grief.  

The operating system on the server was Ubuntu 8.04 (Hardy).  It hadn’t been updated or patched in years; otherwise the server would have had to be rebooted for each kernel upgrade.  Hardy was an LTS release of Ubuntu, but it is so old that its long term support has expired.  Even if I wanted to upgrade the software, the apt repositories no longer exist and no new security patches will be released.

The monolithic machine approach worked (and worked and worked) quite well when you consider the site was available 24/7/365 for 5+ years.  From an administration and maintenance standpoint, though, both vBulletin and my reluctance to reboot the server had become problems.  

My initial thinking was to move the site to a cloud hosting company and then figure out an upgrade to the latest vBulletin 4 release, or maybe to vBulletin 5.  In either scenario I might have to redo the styles from scratch, but I decided that would certainly be worth it.  Going to the latest vBulletin 4 only made sense if it added enough new features to make the site feel new.

I read about both vBulletin 4 and vBulletin 5 and decided I really didn’t want to use either.  vBulletin was originally sold by a company named Jelsoft, and the product was the Cadillac of community WWW site software for years.  Then Jelsoft was acquired by Internet Brands, and the quality of the software and the commitment to excellence seemed to decline.  vBulletin 4 created a lot of anger among the people who bought and used it; the broad theme of posts on vBulletin’s own message board was that the software was released long before it was ready.  To this day, a lot of vBulletin based sites I visit are still running vBulletin 3.  The reviews of vBulletin 5 were negative enough that I didn’t want to spend the money to try it and see for myself.

It turns out that the guys who wrote the original vBulletin at Jelsoft started a new company and have been selling XenForo.  The reviews I read of it were so positive that I bought it, tried it out, and really liked it.  

I set up a “preview” site on a virtual server at Digital Ocean and allowed my community members to try it out and provide any feedback.  It was almost unanimous that they liked XenForo.  They asked a lot of questions about features and I was confident I could modify the software to add any features they wanted.  They didn’t like the default skin, but I assured them that I would fix all that, too.  The Blazers organization was highly supportive of the idea of moving to XenForo, so all systems were “go” for the move.

Digital Ocean – Infrastructure as a Service

I chose Digital Ocean (DO) for hosting for a few reasons.  Their pricing is excellent for the various VPS (Virtual Private Server) configurations they offer, and the price and performance are much better than Amazon’s AWS, which was my other option.  Digital Ocean lacks many of the services that AWS provides, like MySQL as a service, Elastic IP addresses that can be pointed at any instance, Elastic Load Balancing, etc.  

All you get with Digital Ocean are VPS instances.  If you want a load balancer, you have to set it up using software such as Apache or Nginx on a VPS.  Given the price and performance of their various VPS configurations, setting up my own services seemed straightforward.  Digital Ocean even has “howto” documentation on their WWW site to get you going.

One thing was certain: a single monolithic VPS running the whole site did not make sense if I wanted 24/7/365 reliability.  A VPS runs on a physical server alongside other VPS instances.  If the DO staff decide to reboot the physical server, your instance gets rebooted as well.  And they do reboot servers to upgrade their Ubuntu (or whatever operating system they run) and to apply security patches (remember Heartbleed?).

The Infrastructure

I set out to deploy a reasonably priced assortment of VPS instances with the best chance of providing 24/7/365 availability.  What I came up with is 2x VPS for load balancers, 3x VPS for WWW servers, 2x VPS for GlusterFS (a distributed, NFS-like file server), and 2x VPS for MySQL.  The idea is to have at least 2 of everything, so I can log into any one of them and reboot it for an OS upgrade without the site missing a beat.  The only risk is if both instances of a pair are on the same physical server and that server goes down for whatever reason.
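
The “at least 2 of everything” rule is easy to check mechanically.  Here is a minimal Python sketch, assuming a hypothetical inventory whose host names are placeholders rather than the real SportsTwo instances.  It only counts instances per tier; it has no way to model the risk mentioned above of a pair sharing one physical server.

```python
# Hypothetical inventory mirroring the tiers described in the text.
# Host names are placeholders, not real instance names.
TOPOLOGY = {
    "load_balancer": ["lb1", "lb2"],
    "www": ["www1", "www2", "www3"],
    "glusterfs": ["gfs1", "gfs2"],
    "mysql": ["db1", "db2"],
}


def single_points_of_failure(topology: dict) -> list:
    """Tiers with fewer than two instances: rebooting their only host takes the tier down."""
    return [tier for tier, hosts in topology.items() if len(hosts) < 2]


def can_reboot(topology: dict, host: str) -> bool:
    """True if taking `host` down still leaves at least one instance in its tier."""
    for hosts in topology.values():
        if host in hosts:
            return len(hosts) >= 2
    raise KeyError(f"unknown host: {host}")


if __name__ == "__main__":
    print(single_points_of_failure(TOPOLOGY))  # an empty list means every tier is paired
    print(can_reboot(TOPOLOGY, "db1"))
```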


The load balancers are themselves load balanced using round-robin DNS.  That is accomplished with two “A” DNS records for the www.sportstwo.com domain name, one pointing at each load balancer instance’s IP address.  I chose Nginx as the load balancer because it is trivial to configure, very fast, and scales to a lot of simultaneous users.  It also let me use inexpensive VPS instances: a single CPU core suffices, since the reverse proxy function is not memory or CPU intensive.
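
To illustrate how two A records spread visitors across the load balancers, here is a small Python simulation, assuming a DNS server that rotates the record order by one position per query and a client that simply uses the first address.  The addresses are placeholders from the 192.0.2.0/24 documentation range, not the real ones.  Real resolvers cache and may reorder answers, so the split is only approximate in practice.

```python
from itertools import islice

# Placeholder addresses (documentation range) standing in for the two load balancers.
A_RECORDS = ["192.0.2.10", "192.0.2.20"]


def rotated_answers(records):
    """Yield the record list rotated by one position per query,
    the way a round-robin DNS server spreads clients across addresses."""
    offset = 0
    while True:
        yield records[offset:] + records[:offset]
        offset = (offset + 1) % len(records)


def first_picks(records, queries):
    """Address a simple client would use (the first answer) for each lookup."""
    return [answer[0] for answer in islice(rotated_answers(records), queries)]


print(first_picks(A_RECORDS, 4))
# ['192.0.2.10', '192.0.2.20', '192.0.2.10', '192.0.2.20']
```

Note that DNS itself does no health checking; whether a visitor fails over to the other address when one load balancer is down depends on their browser and resolver.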

I chose 3x WWW server instances instead of 2x because of math.  With 2x, when one is being rebooted, the remaining VPS has to handle 2x the work, and the site speed will be cut roughly in half.  The WWW servers run the PHP code for XenForo, which is CPU intensive by its nature.  With 3x WWW servers, if one is being rebooted, the other two take up the load, so each handles 1.5x its normal load.
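
The arithmetic behind that choice fits in one function.  This Python sketch assumes traffic is spread evenly across the WWW servers, which is an idealization: with n servers and one down, each survivor carries n/(n-1) of its normal load.

```python
def load_per_survivor(servers: int, down: int = 1) -> float:
    """Load multiplier on each remaining server when `down` servers are out,
    assuming requests are spread evenly (an idealization)."""
    remaining = servers - down
    if remaining < 1:
        raise ValueError("no servers left to take the load")
    return servers / remaining


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, load_per_survivor(n))
```

With 2 servers the survivor takes 2.0x its load, with 3 each survivor takes 1.5x, and a 4th server would only bring that down to about 1.33x, so 3 is a reasonable balance of cost and headroom.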

In addition to running Apache+PHP, the WWW servers also run memcached.  XenForo is trivial to configure to use memcached for a speed boost, and the VPS instances had enough RAM and CPU to support it.  I initially planned 2x dedicated memcached servers, but running memcached on the WWW servers instead saved a little on the monthly cost and gave me a 3rd memcached instance for even better redundancy and performance.

I spent more time looking at how to make MySQL highly available than on anything else in this whole setup.  I ended up using two much bigger VPS instances for MySQL than for any of the other roles, with master-master replication between them.  The problem with master-master is race conditions: for example, writes to both servers at the same time into a table with an auto-increment field.  So load balancing between the two did not make sense.  Instead, I set up HAProxy on the 3x WWW servers to use the primary database server and, if that server is unavailable, to switch over to the backup.  I really like this setup because I can run mysqldump for backups against the backup server without affecting the site; mysqldump normally locks the tables, so queries from the WWW servers would block until the backup finished.
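
The primary/backup decision HAProxy makes can be sketched in a few lines of Python.  In HAProxy itself this is the `backup` keyword on the second `server` line; the sketch below only mirrors the logic, assuming hypothetical private addresses and a bare TCP connect as the health check.  HAProxy runs its checks continuously in the background, while this sketch checks on every call, which is simpler but slower.

```python
import socket

# Hypothetical private addresses for the two MySQL servers; 3306 is MySQL's default port.
PRIMARY = ("10.128.0.21", 3306)
BACKUP = ("10.128.0.22", 3306)


def is_up(addr, timeout: float = 1.0) -> bool:
    """Crude TCP health check: can we open a connection to (host, port)?"""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def pick_database(primary, backup, check=is_up):
    """Send all traffic to the primary and use the backup only when the primary
    fails its check.  There is no load balancing between the two, so concurrent
    writes never land on both masters at once."""
    if check(primary):
        return primary
    if check(backup):
        return backup
    raise RuntimeError("no database server is reachable")
```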

XenForo stores avatars and attachments in the filesystem, in the data/ and internal_data/ directories.  To run it on more than one WWW server, these two directories either need to be synchronized between the WWW servers or shared on something like an NFS mount.  NFS by itself does not provide redundant servers as a cluster or otherwise offer redundancy and high availability.  I chose GlusterFS because it acts like NFS but can be deployed as a cluster.  For my purposes, two VPS instances are enough.

Security

In the USA, DO only provides private networking in their NYC data center.  Private networking is a second virtual Ethernet interface with an IP address in the 10.0.0.0/8 block reserved for private networks.  If you set up Apache to listen on a VPS’ private IP address, only other VPS instances in the data center can reach Apache on port 80.  This is good for security, as hackers and bots on the public internet can’t port scan the private ports.  What DO does not do is give you your own private network: ALL servers in the data center on the 10.0.0.0/8 network can reach your VPS’ ports.  
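
Picking the right address to bind a service to comes down to a membership test against the 10.0.0.0/8 block.  This Python sketch uses the standard `ipaddress` module; the example addresses are hypothetical.  As the text points out, binding to the private address is not enough on its own, because every other tenant on that network can still reach it.

```python
import ipaddress

# RFC 1918 block used for the private network, per the text.
PRIVATE_BLOCK = ipaddress.ip_network("10.0.0.0/8")


def private_bind_address(addresses) -> str:
    """Return the first address on the 10.0.0.0/8 private network, i.e. the one
    a service should listen on so it is unreachable from the public internet."""
    for addr in addresses:
        if ipaddress.ip_address(addr) in PRIVATE_BLOCK:
            return addr
    raise LookupError("no private-network address found")


if __name__ == "__main__":
    # Hypothetical public and private addresses of one VPS.
    print(private_bind_address(["198.51.100.7", "10.128.0.11"]))  # 10.128.0.11
```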

I chose the NYC data center for security reasons.  I set up every service on all these VPS servers so they listen only on the private network interface.  The only exceptions are port 80 on the two load balancers, and SSH.  Beware that SSH is a port that hackers can probe and try to exploit.  More on how I solved this issue in a bit.

I set this all up and configured all the software in a couple of days.  I installed a second preview version of XenForo on it and tested the redundancy features.  I stopped Apache on one, and then two, of the WWW servers in every combination, and the site stayed available in my browser.  I shut off one load balancer, then the other, and it still worked.  I shut down one MySQL server, then the other, and it still worked.  I stopped one GlusterFS server, then the other, uploaded avatars and created attachments, and the software worked as expected.

After it was all working and tested, I spun up a $5/month micro VPS and installed monit on it.  This program periodically does a health check against the services on the other VPS instances.  It can tell if HTTP is not responding on port 80 on any of the WWW servers, if MySQL isn’t answering on the database servers, if the load balancers aren’t responding to HTTP, etc.  Monit sends me emails when any of the services does not respond so I can get in and diagnose what’s going wrong.
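
For readers without monit, the core of such a check is just “does this port accept a connection?”.  Here is a minimal Python sketch, assuming a hypothetical check list of private IPs with the default ports for HTTP (80), MySQL (3306) and memcached (11211).  monit can go further and speak the HTTP or MySQL protocol; this sketch only tests that a TCP connection opens, and leaves alerting (e.g. email) out.

```python
import socket

# Hypothetical checks: (label, private IP, port).
CHECKS = [
    ("www1 http", "10.128.0.11", 80),
    ("www1 memcached", "10.128.0.11", 11211),
    ("db1 mysql", "10.128.0.21", 3306),
]


def port_responds(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(checks) -> list:
    """Labels of services that did not accept a TCP connection."""
    return [label for label, host, port in checks if not port_responds(host, port)]


if __name__ == "__main__":
    for label in failed_checks(CHECKS):
        print("DOWN:", label)
```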

The final step in setting up the infrastructure was configuring iptables firewalls on all the VPS instances.  Because each service’s port on the private network accepts connections only from the VPS instances that need it, a port scanner running anywhere on the internet, including someone else’s VPS in the data center, gets a “connection refused” error when it tries to connect.  I used UFW to manage the firewall rules and wrote a shell script on each server to install them; to change the rules, I edit the script and run it again.  I verified the firewall rules with my own port scanner, run from a VPS in the data center, from my workstation, and from the old site’s server.
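
One way to keep those per-server scripts consistent is to generate them from an allowlist.  This Python sketch renders a UFW script for a hypothetical database server: the WWW servers’ private IPs may reach MySQL on 3306, and a hypothetical gateway address may reach SSH on 22.  The addresses are placeholders.  Keep an SSH rule in the list; enabling a default-deny firewall without one would cut off your own session.  The generated script must be run as root.

```python
# Hypothetical allowlist for one database server: port -> allowed source IPs.
ALLOW = {
    22: ["203.0.113.5"],                                  # gateway VPS (SSH)
    3306: ["10.128.0.11", "10.128.0.12", "10.128.0.13"],  # the three WWW servers
}


def ufw_script(allow: dict) -> str:
    """Render a shell script that resets UFW and allows only the listed sources."""
    lines = [
        "#!/bin/sh",
        "set -e",
        "ufw --force reset",           # start from a clean rule set
        "ufw default deny incoming",   # anything not listed below is blocked
        "ufw default allow outgoing",
    ]
    for port, sources in sorted(allow.items()):
        for src in sources:
            lines.append(f"ufw allow from {src} to any port {port} proto tcp")
    lines.append("ufw --force enable")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ufw_script(ALLOW), end="")
```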

I spun up a gateway VPS at a different provider and set up UFW rules that allow port 22 (SSH) access only from that VPS.  So truly the only ports open to the public internet on any of the servers are port 80 on the two load balancers.  I keep the gateway VPS spun down so its port 22 isn’t exposed to port scanners 24/7, and I only spin it up when I want remote access to the other VPS instances.
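
The claim that only port 80 is open can be verified with a simple TCP connect scan.  Here is a minimal Python sketch; the host and port range are up to you, and you should only scan machines you own.  Scanning 1024 ports against a firewall that silently drops packets can take several minutes at this timeout.

```python
import socket


def open_ports(host: str, ports, timeout: float = 0.5) -> set:
    """TCP connect scan: return the ports on `host` that accept a connection."""
    found = set()
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.add(port)
        except OSError:
            pass  # refused, filtered (timed out) or unreachable
    return found


def unexpected_ports(host: str, expected: set, ports=range(1, 1025)) -> set:
    """Ports that are open but should not be, e.g. anything but 80 on a load balancer."""
    return open_ports(host, ports) - expected
```

Run from outside, `unexpected_ports(lb_address, {80})` should come back empty for each load balancer, where `lb_address` is whatever public IP you are checking.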

I ran Apache Benchmark (ab) against the whole thing, running XenForo with the full SportsTwo database, and saw 0 error responses and more than enough performance to have a fast site.  I loaded various pages and saw that none took more than 1.5 seconds.  Under 3 seconds is a good goal, and under 1 second is google.com kind of speed.  Most of the pages did load in under 1 second!
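
ab handles the load side with its `-n` (total requests) and `-c` (concurrency) options.  The per-page check against the 1 and 3 second targets can be sketched in Python as below; the URL is whatever page you want to time.  Note that this times only the HTML download, not the images, CSS and JavaScript a browser also fetches, so a real page view will be somewhat slower.

```python
import time
from urllib.request import urlopen


def fetch_seconds(url: str, timeout: float = 10.0) -> float:
    """Wall-clock time to download the page body at `url`."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def rate(seconds: float) -> str:
    """Classify a load time against the targets from the text."""
    if seconds < 1.0:
        return "google.com speed"
    if seconds < 3.0:
        return "good"
    return "too slow"
```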

The Migration

There is one more part to the story.  I had to migrate a vBulletin 4 database schema to a XenForo 1.4 database schema, and I wanted to minimize the site’s downtime.  XenForo comes with a web-based vBulletin 4 import tool.  I decided right away to run this tool on my home network, because it was going to run for many hours and any internet outage would interrupt it.  Plus, the latency is lower and my computers here perform better than the VPS instances.

I have an older laptop with an SSD and a Core i7 920 CPU that has been very fast.  I ran the import on it and it took about 18 hours.  The vBulletin 4 database has 3.5 million records in the posts table; that’s a lot of insert queries, and InnoDB is notoriously slow at inserts.  I made every tweak I could to MySQL on my local machines to speed up the inserts.  While the import was in progress, I noticed the laptop’s CPU cores weren’t very busy, so the likely bottleneck was the SATA 1 SSD.

I created an Ubuntu VM with VirtualBox on my iMac, which has a much faster SATA 2 SSD, ran the import there, and it took 12 hours.  Then I created a Parallels VM, ran the import there, and it took 4 hours.  All this practice let me figure out not only the fastest way to import the database, but also how short a downtime I could announce to the SportsTwo community.  
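
Putting the three trial runs side by side shows why the practice paid off.  This Python snippet uses only the durations and post count from the text; the posts-per-second figure is averaged over the whole run, which also imported users, threads and everything else, so it understates the raw insert rate.

```python
# Import durations from the text, in hours.
IMPORT_HOURS = {
    "laptop (SATA 1 SSD)": 18,
    "VirtualBox VM on iMac": 12,
    "Parallels VM on iMac": 4,
}
POSTS = 3_500_000  # rows in the vBulletin 4 posts table

baseline = IMPORT_HOURS["laptop (SATA 1 SSD)"]
speedups = {name: baseline / hours for name, hours in IMPORT_HOURS.items()}
posts_per_second = POSTS / (IMPORT_HOURS["Parallels VM on iMac"] * 3600)

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
print(f"about {posts_per_second:.0f} posts/second in the fastest run")
```

The Parallels run was 4.5x faster than the laptop, which turned an overnight-plus outage into an evening one.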

SportsTwo had to go down because I needed to start with a snapshot of the database to import.  If the site were allowed to run after that snapshot, any new posts, uploads, profile or preference changes by the users would not make it over to the new site.  It also made the process go a lot faster.  During the time the site was down, I changed the DNS to point to the new setup.  I put up an “under construction” page at the new site so while the database was being imported, no PHP errors would show up for anyone hitting the site.

Six hours after I shut down the old site, the new site was up and running.  I spent only a day tweaking the skins for the most popular forums and smoothing out a small number of rough edges that showed up with hundreds of people on the site, plus 40+ search engine spiders.  That work was a bit of JavaScript, CSS, PHP, and image manipulation, the kind of work we do at Modus Create for our clients.

Final Thoughts

I keep the PHP code for XenForo (the whole site, actually) in a Git repository (not GitHub!).  I can easily run a staging version of the site in a VM on my workstation to work on the code, install add-ons, and so on.  Then I push the changes and pull them on the three WWW servers.  It’s worked great.
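
The “pull on the three WWW servers” step is easy to script.  Here is a minimal Python sketch, assuming SSH key access to each server; the host names `www1`, `www2`, `www3` and the document root `/var/www/xenforo` are hypothetical.  It pulls one server at a time, so for a moment the servers run different versions, and `--ff-only` makes Git refuse rather than create a merge if a server’s checkout has diverged.

```python
import shlex
import subprocess

# Hypothetical host names and document root.
WWW_SERVERS = ("www1", "www2", "www3")
DOCROOT = "/var/www/xenforo"


def pull_command(host: str, docroot: str = DOCROOT) -> list:
    """Build the ssh command that fast-forwards one server's checkout."""
    remote = f"cd {shlex.quote(docroot)} && git pull --ff-only"
    return ["ssh", host, remote]


def deploy(hosts=WWW_SERVERS, dry_run: bool = True) -> list:
    """Pull on each server in turn; with dry_run, only print the commands."""
    commands = [pull_command(h) for h in hosts]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing server
    return commands
```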

I did all of this over two four-day weekends: Christmas and New Year’s.  The infrastructure was set up by the end of the first weekend.  The site was moved to DO on New Year’s Day, starting at 7:30 PM and finishing at about 11:30 PM.
