Blogging from the ground up: part 1

For now, I'll keep the first few posts meta and focus on the technology behind this blog. That said, I'm bootstrapping as I go: I'll be writing these posts while I set the blog up, so don't be surprised if things change along the way.

Overview

Technology stack

I've listed the main technologies below, each with a brief description, ordered from the top of the stack to the bottom:

  • Nginx - Proxies requests coming from port 80 to Ghost
  • Ghost - Open source blogging platform written in Node.js
    • SQLite - Ghost uses SQLite as its default database
  • pm2 - Daemon for running applications
  • Node.js - Server-side JavaScript platform
  • V8 - The actual JavaScript engine
  • Amazon Linux AMI (x64) - Linux distro based on Red Hat Enterprise Linux (RHEL)
  • Amazon EC2 t1.micro VM - A compute instance in... the cloud

Scaling up

If I ever make it to the big leagues of super-blogdom, then I'll have to worry about scaling my itty-bitty blog up. Not a bad problem to have.

There are a couple of ways to scale this architecture up, depending on which resource is the bottleneck. My blog isn't doing much behind the scenes, so I expect it would become network I/O bound. So rather than scaling the instance vertically, I would scale horizontally by replicating the entire stack shown above across many instances.

Horizontally scaling at the instance level means that two new system components have to be introduced. One is a load balancing component to distribute and route incoming requests. The other component needed is a centralized database, as opposed to SQLite databases living locally on each instance.

Load balancing

The easy approach would be to use Amazon's Elastic Load Balancing service. However, for the sake of exploration I may roll my own load balancer running on a regular EC2 instance, using something like HAProxy.

Once an external load balancer is in place, the Nginx layer on each instance will still be useful if I have multiple applications/services running on an instance, or if I want to use it for SSL termination, network ACLs, etc.
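
As a rough illustration of the roll-your-own option, here is a minimal sketch of an HAProxy configuration that round-robins requests across two blog instances. The backend names and private IP addresses are placeholders I've made up, and the timeouts are arbitrary starting points rather than tuned values. The script writes the config to a scratch file for review instead of touching /etc/haproxy/haproxy.cfg.

```shell
#!/bin/sh
# Write a minimal HAProxy config to a scratch file for review.
# On a real load balancer it would live at /etc/haproxy/haproxy.cfg.
cfg="${TMPDIR:-/tmp}/haproxy.cfg.example"

cat > "$cfg" <<'EOF'
global
    maxconn 256

defaults
    mode    http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_in
    bind *:80
    default_backend blog_nodes

backend blog_nodes
    balance roundrobin
    # Placeholder private IPs; each instance runs the full Nginx -> Ghost stack.
    server blog1 10.0.0.11:80 check
    server blog2 10.0.0.12:80 check
EOF

echo "wrote $cfg"
```

If HAProxy is installed, running haproxy -c -f on the generated file checks the syntax without starting the proxy.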

Centralized data store

With many instances trying to read/write data concurrently, it is necessary to provide a logically centralized data store with some form of concurrency control. An easy win would be to switch from SQLite to either MySQL or PostgreSQL, since Bookshelf (the ORM for Ghost) supports both.

Given how small the dataset is, a distributed, memory-based cache would most likely not be needed at this point.

AWS Configuration

You can get started for free by signing up for the AWS free usage tier. Essentially, this lets you run a micro instance for free for 12 months; however, it does require you to enter a credit card.

I won't be going through a step-by-step tutorial (there are plenty out there already). What I will be doing is describing my specific configuration and overall process of getting set up.

EC2 (the server)

Note: Once you've got EC2 set up, the rest of the AWS configuration is optional. It's only relevant if you're planning on setting up a public-facing application.

Micro instance

A micro instance has the following resources available:

  • 1 virtual core @ 2.00 GHz*
  • 0.615 GiB (~630 MiB) of memory

For my purposes, this micro instance should last me a while. My application (blog) will be mostly network-bound, assuming I'll eventually get some traffic.

I have my instance set up using the default 64-bit Amazon Linux AMI, a custom distro based on RHEL.

* The micro instance can burst its CPU, giving it better performance for short periods; during a burst it can even exceed an m1.small in compute capacity. However, under prolonged high CPU utilization, Amazon will throttle you to snail levels. See the EC2 user guide for more information.
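
To compare the figures above with what the instance itself reports, something like the following sketch can help. It assumes a Linux system with /proc/meminfo and, for the core count, the nproc utility from GNU coreutils; kib_to_mib is a helper name I've made up. Expect the reported memory to be a little below the nominal figure, since the kernel reserves some for itself.

```shell
#!/bin/sh
# Convert a KiB count (the unit /proc/meminfo uses) to whole MiB.
kib_to_mib() {
    awk -v kib="$1" 'BEGIN { printf "%.0f\n", kib / 1024 }'
}

# Total memory visible to the kernel.
if [ -r /proc/meminfo ]; then
    mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "memory: $(kib_to_mib "$mem_kib") MiB"
fi

# Number of virtual cores available to this process.
if command -v nproc >/dev/null 2>&1; then
    echo "cores: $(nproc)"
fi
```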

Elastic Block Store (mountable vdisks)

For my instance, I have two Elastic Block Store (EBS) volumes mounted. One is an 8 GB root volume, which contains everything except for my application data. The other volume contains files relating to my blog and any other applications I may run on the instance. The reason for this separation is portability. It may be extraneous, but it provides orthogonality between the instance and the application. In a situation where I wanted to quickly move from one instance to another, this may come in handy. However, it's not something I would design deployments around.

The relevant lines of my fstab look like this:

# /etc/fstab
LABEL=/     /           ext4    defaults,noatime 1   1
/dev/xvdb   /www        ext4    defaults         0   2

Note: If you want to mount your EBS volume without restarting your instance, just use sudo mount -a, which mounts every fstab entry that isn't already mounted.
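
A malformed fstab can cause trouble at the next boot, so it can be worth a quick sanity check before remounting. Below is a minimal sketch of one; check_fstab is a hypothetical helper of my own, and it only verifies the field count of each entry, not whether the devices or mount points actually exist.

```shell
#!/bin/sh
# Hypothetical helper: report fstab entries whose field count is out of range.
# An fstab entry has 4 to 6 whitespace-separated fields; the last two
# (dump and fsck pass) are optional.
check_fstab() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }    # skip comments and blank lines
        NF < 4 || NF > 6 {
            printf "line %d: found %d fields, expected 4 to 6\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Only suggest remounting when the file passes the check.
if check_fstab /etc/fstab; then
    echo "fstab looks well-formed; run: sudo mount -a"
fi
```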

Security group settings

I've opened the following inbound ports:

  • 22 - SSH
  • 80 - HTTP
  • 443 - HTTPS
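
The same rules can be created from the command line instead of the console. The sketch below prints (rather than runs) the AWS CLI calls for the three ports above, assuming the AWS CLI is installed and configured. The security group ID is a placeholder, and the wide-open 0.0.0.0/0 range is an assumption you may want to narrow.

```shell
#!/bin/sh
# Print (not run) the AWS CLI calls that would open the three inbound ports.
# SG_ID and CIDR are placeholders; substitute your own values.
SG_ID="${SG_ID:-sg-REPLACE_ME}"
CIDR="${CIDR:-0.0.0.0/0}"

ingress_commands() {
    for port in 22 80 443; do
        echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $port --cidr $CIDR"
    done
}

ingress_commands
```

Printing the commands first makes it easy to review them before pasting them into a shell. Restricting port 22 to your own address range is a common tightening.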

Elastic IP

If you need a static, public IP for your server (e.g. for DNS), then you should associate an "Elastic IP" with your instance. This can be done from the EC2 Dashboard under the Network & Security section.

Allocating an Elastic IP essentially reserves your right to use that particular IP address. Associating the address with an instance points it at that instance. Elastic IPs can be disassociated and reassociated at will; however, once you release an address, you lose it for good.

Route 53 (DNS)

Route 53 is Amazon's clever naming for their DNS service. For now, I've just set up the following types of records:

  • NS - Name Server records point to the authoritative name servers for your domain. These name servers are provided by Amazon automatically.
  • A - Address records associate an IPv4 address with your domain. I have set one up for each of superuser.do and www.superuser.do, both pointing to the same Elastic IP.
  • SOA - Start of [a zone of] authority records point to the primary name server. In addition, these records contain information about the domain-name owner, the zone version, and zone refresh/expiration info. This is automatically set up by Amazon.

Once I get a mail service set up, I'll add MX records.

SSHing for the first time

For the initial instance setup, I'll be using ec2-user, which is the default user for this particular AMI. Once the initial setup is done, I'll create a group for developers and add myself as a user. Because I'm the sole developer, it's not a big priority; however, applications with multiple developers need to have well-defined roles and ACLs.

I've placed my .pem private key in my ~/.ssh/ directory, and set up my SSH config with something akin to the following:

# ~/.ssh/config
Host blog
    # or your Elastic IP
    HostName        superuser.do
    User            ec2-user
    IdentityFile    ~/.ssh/yourprivatekey.pem

SSHing is now as easy as:

$ ssh blog
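
If ssh refuses the key with an "UNPROTECTED PRIVATE KEY FILE" warning, the key file's permissions are too open. Here is a minimal sketch of the fix; lock_down_key is a helper name I've made up, and the key path is the same placeholder used in the config above.

```shell
#!/bin/sh
# Restrict a private key so only its owner can read or write it,
# then show the resulting permissions.
lock_down_key() {
    chmod 600 "$1" && ls -l "$1"
}

# Placeholder path; adjust it to wherever your .pem file lives.
key="$HOME/.ssh/yourprivatekey.pem"
if [ -f "$key" ]; then
    lock_down_key "$key"
else
    echo "no key at $key; adjust the path"
fi
```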

Installing dependencies

Once you've SSH'd into your instance, it's time to set up dependencies.

Adding the EPEL6 yum repo

I'm choosing to install Node.js using the built-in package manager yum. To do so, I will need to add the Extra Packages for Enterprise Linux (EPEL) yum repository:

$ cd /tmp
$ wget http://mirror-fpt-telecom.fpt.net/fedora/epel/6/i386/epel-release-6-8.noarch.rpm
$ sudo yum install epel-release-6-8.noarch.rpm
$ sudo yum update

Install git

$ sudo yum install git

Installing Node.js

$ sudo yum install npm

The command above will also install node because it's a dependency of npm.

Installing Nginx

Now we're going to install nginx and make sure that it starts on boot.

$ sudo yum install nginx
$ sudo chkconfig nginx on  # this distro uses chkconfig instead of update-rc.d
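
Before moving on, it can be reassuring to confirm that everything installed above is actually on the PATH. The following is a small sketch of such a check; report_version is a hypothetical helper, and the version flags are the ones each tool normally accepts.

```shell
#!/bin/sh
# Report the version of each tool installed above, or note that it is missing.
report_version() {
    tool="$1"
    flag="$2"
    if command -v "$tool" >/dev/null 2>&1; then
        # nginx prints its version to stderr, so merge the two streams.
        printf '%s: %s\n' "$tool" "$("$tool" "$flag" 2>&1 | head -n 1)"
    else
        printf '%s: not installed\n' "$tool"
    fi
}

report_version git   --version
report_version node  --version
report_version npm   --version
report_version nginx -v
```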

Wrap-up

At this point, we've got our EC2 instance set up with a basic configuration. In addition, our domain name now points to the EC2 instance, and it will be accessible to the internet over HTTP(S) once the application is set up.

Parts 2, 3, and beyond

Part 2 of this series will cover setting up Node.js, Ghost, and pm2.

In Part 3, we'll start productionizing the blog by setting up Nginx as a proxy to Ghost.

At the time of this writing, this is as far as I've gotten in setting up my blog. Future articles in this series will most likely cover customizing Ghost, moving away from SQLite3, and whatever else may come up.