Free your port 80 with HAProxy

One of the caveats of Comet, and more generally of any Ajax-like interaction method for web applications, is the same-origin restriction on JavaScript-initiated connections. The browser only lets your JavaScript client code open connections to the origin (the same scheme, host and port) of the page it has been embedded in or linked from.
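Browsers decide what counts as the same origin by comparing scheme, host and port, so a server listening on another port of the same host is already a foreign origin. A minimal Python sketch of that comparison, simplified (it ignores details such as document.domain relaxation and internationalized host names); origin and same_origin are hypothetical helper names:

```python
from urllib.parse import urlsplit

# Ports implied when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that the browser compares."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


if __name__ == "__main__":
    page = "http://notes.olivepeak.com/"
    print(same_origin(page, "http://notes.olivepeak.com/subscriptions/channel/"))       # True
    print(same_origin(page, "http://notes.olivepeak.com:8080/subscriptions/channel/"))  # False
```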

There are workarounds, depending on the method you use to connect to the remote server. One of them is to dynamically generate <script> tags that invoke a local callback with the remote data (I wrote an article about it). Another is to execute JavaScript inside an invisible <iframe> loaded from a subdomain of the main domain; this also allows data to be exchanged between the <iframe> code and the main page code.
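For reference, the server side of the dynamic <script> technique (commonly called JSONP) only has to wrap its JSON output in a call to a function name supplied by the page. A minimal Python sketch, assuming the callback name arrives as a query parameter; jsonp_body and the example URL in the comment are hypothetical:

```python
import json
import re

# Accept only plain (optionally dotted) JavaScript identifiers as callback
# names, so a caller cannot inject arbitrary script through the query string.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def jsonp_body(callback: str, payload: object) -> str:
    """Wrap a JSON payload in a call to the page's local callback function."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"


if __name__ == "__main__":
    # The page would insert something like:
    #   <script src="http://other.example/data?callback=onData"></script>
    print(jsonp_body("onData", {"channel": "news", "messages": ["hi"]}))
```

The response body is plain JavaScript, so the browser executes it as soon as the <script> tag loads, handing the data to onData.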

The user experience

The workarounds are not free of flaws. Dynamically loading <script> tags makes some browsers display a “Transferring…” message in the status bar, and others show a full-blown spinning logo animation. The <iframe> method can cause the same problem, and it also alters the browsing history (Ajax applications already have enough problems with that).

The best user experience comes from using XMLHttpRequest. The browser doesn’t bother the user with loading messages or animations; instead it exposes to the JavaScript application a series of callbacks and status codes that let the programmer make data loading as subtle or as visible as desired. The problem is that the browser’s same-origin policy forbids XMLHttpRequest from loading cross-domain resources, and in PubSub applications it is very common to serve the subscription connections from a separate server, different from the one serving the main application.

Filtering proxy

The solution is to put a proxy with filtering capabilities in front of your servers: a dedicated proxy server that can parse requests and choose a backend based on headers or paths. For this article I chose HAProxy. The other contender was nginx, which I had to discard because, at the time of writing, it lacked support for HTTP 1.1 features that are vital when dealing with mobile clients (namely chunked POST bodies). Both servers are excellent choices thanks to their single-process, event-driven architecture, which lets them scale to many thousands of concurrent connections.
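Chunked transfer coding lets a client send a body without knowing its length in advance: each chunk is prefixed by its size in hexadecimal on its own line, and a zero-size chunk ends the body. A simplified Python decoder, for illustration only (it ignores trailers and does minimal validation); decode_chunked is a hypothetical name:

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body (simplified: trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line may carry ";name=value" chunk extensions; drop them.
        size = int(raw[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)  # last chunk reached; trailers are not parsed
        body += raw[pos:pos + size]
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
```

A proxy that cannot do this has to reject the request or buffer it some other way, which is why the feature matters for clients that stream their POST bodies.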

Use case

In this article I am going to let my homebrew Twisted PubSub server, which currently listens on notes.olivepeak.com:8080, accept requests sent to notes.olivepeak.com:80. That port is already occupied by Apache, which serves the Peak Notes application. Externally, the PubSub server only handles subscription requests sent to the URL path /subscriptions/channel/, and nothing else. Since both Twisted and Apache are going to sit behind HAProxy and will no longer face the internet directly, their listen addresses will be changed to 127.0.0.1:30200 and 127.0.0.1:30100 respectively.

Configuring HAProxy

With this information we are ready to let HAProxy take over port 80 of our frontend server and write a config file for our setup. First we define the Apache backend:

backend apaches
    mode http
    timeout connect 10s
    timeout server 30s
    balance roundrobin
    server apache1 127.0.0.1:30100 weight 1 maxconn 512

You can have as many server lines as you need, one for every server you want to balance inside the backend. I only have a simple VPS-like server, so there is a single server entry, but you could define entire clusters with complex load-balancing rules.
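HAProxy’s roundrobin algorithm is weighted: each server receives a share of the requests proportional to its weight. HAProxy’s actual scheduler is more involved, so the Python sketch below swaps in a different, well-known technique (smooth weighted round-robin) purely to illustrate weight-proportional rotation; the second server apache2 and both weights are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int
    current: int = 0  # running score used by the scheduler


def pick(servers: list[Server]) -> Server:
    """Smooth weighted round-robin: raise every score, serve the highest."""
    total = sum(s.weight for s in servers)
    for s in servers:
        s.current += s.weight
    best = max(servers, key=lambda s: s.current)
    best.current -= total  # push the chosen server back down the queue
    return best


if __name__ == "__main__":
    pool = [Server("apache1", 1), Server("apache2", 2)]
    print([pick(pool).name for _ in range(6)])
    # ['apache2', 'apache1', 'apache2', 'apache2', 'apache1', 'apache2']
```

Over any six picks apache2 serves twice as many requests as apache1, while the two are still interleaved rather than sent in bursts.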

The Twisted PubSub backend section:

backend pubsubs
    mode http
    timeout connect 5s
    timeout server 5m
    balance roundrobin
    server twisted1 127.0.0.1:30200 weight 1 maxconn 10000

A much longer server timeout is required to support XHR long polling (or any other HTTP long-polling method), because the PubSub server deliberately holds each request open until a message arrives. The maxconn parameter has also been increased, since the PubSub server is meant to support a very large number of simultaneous connections.
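To see why the timeout has to be that long, here is a minimal asyncio sketch of what a long-poll handler does with each request: it waits for the next message on a channel and answers empty only when its own hold time runs out. The hold time HOLD_SECONDS and the function long_poll are hypothetical; the only constraint taken from the config is that the hold must stay below the 5-minute timeouts, or HAProxy would cut the idle connection first:

```python
import asyncio
from typing import Optional

# Hypothetical hold time; keep it below the 5m client/server timeouts in HAProxy.
HOLD_SECONDS = 240.0


async def long_poll(queue: asyncio.Queue, hold: float = HOLD_SECONDS) -> Optional[str]:
    """Wait for the next message on a channel, or return None when the hold expires."""
    try:
        return await asyncio.wait_for(queue.get(), timeout=hold)
    except asyncio.TimeoutError:
        return None  # empty response; the client simply issues a new request


async def demo() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    # Nothing published: the poll ends empty after the (shortened) hold time.
    print(await long_poll(queue, hold=0.1))
    # A message published while the client waits is returned right away.
    asyncio.get_running_loop().call_later(0.05, queue.put_nowait, "hello")
    print(await long_poll(queue, hold=1.0))


if __name__ == "__main__":
    asyncio.run(demo())
```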

The frontend section:

frontend http_proxy
    bind 8.12.42.103:80
    mode http
    timeout client 5m
    option forwardfor
    default_backend apaches
    acl req_pubsub_path path_beg /subscriptions/channel/
    acl req_notes hdr_dom(host) -i notes.olivepeak.com
    use_backend pubsubs if req_pubsub_path req_notes
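With HAProxy in front, Apache and Twisted see every connection arriving from 127.0.0.1, so an application that logs or throttles by client address has to read the X-Forwarded-For header added by option forwardfor, and should trust it only when the connection really comes from the proxy. A minimal Python sketch, assuming HAProxy on 127.0.0.1 is the only proxy in the chain; TRUSTED_PROXIES and client_ip are hypothetical names:

```python
# Addresses of proxies whose X-Forwarded-For header we are willing to believe.
TRUSTED_PROXIES = {"127.0.0.1"}


def client_ip(remote_addr: str, headers: dict[str, str]) -> str:
    """Return the original client address for a request.

    remote_addr is the peer address of the TCP connection; headers maps
    lower-cased header names to values (repeated headers joined with ", ").
    """
    if remote_addr not in TRUSTED_PROXIES:
        return remote_addr  # direct connection: ignore any spoofed header
    forwarded = headers.get("x-forwarded-for", "")
    # With a single trusted proxy, the last entry is the one that proxy added;
    # anything before it came from the client and cannot be trusted.
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    return hops[-1] if hops else remote_addr
```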

First we bind the frontend to the public IP on port 80. timeout client matches the 5 minutes we allowed for the PubSub backend. option forwardfor adds an X-Forwarded-For header carrying the client’s original IP address to the proxied requests. The next four lines are the most interesting ones, so I am going to explain them one by one. First we have the default backend:

default_backend apaches

This line makes HAProxy use the apaches backend as the default target for incoming requests: any request that no use_backend rule claims is proxied to apaches. Next we define an ACL:

acl req_pubsub_path path_beg /subscriptions/channel/

This defines an ACL named req_pubsub_path that is true when the condition path_beg /subscriptions/channel/ holds. path_beg is one of the many matching criteria supported by HAProxy; it compares the beginning of the request path (the query string is not part of it) with the given string. You can also match on headers:

acl req_notes hdr_dom(host) -i notes.olivepeak.com

Here we compare the Host header against the string notes.olivepeak.com using hdr_dom(host); the -i flag makes the comparison case-insensitive. hdr_dom matches when the string appears in the header as a whole dot-delimited portion, the way a domain name would, so it also matches www.notes.olivepeak.com but not mynotes.olivepeak.com.
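Both criteria can be approximated in a few lines of Python. This is a sketch, not HAProxy’s implementation: path_beg and hdr_dom here are hypothetical helpers, and the delimiter set for the domain match (slash, question mark, dot and colon, the colon letting a Host header with a port such as notes.olivepeak.com:80 still match) is an assumption of the sketch:

```python
import re
from urllib.parse import urlsplit

# Characters treated as segment boundaries for the domain match (assumed).
DOMAIN_DELIMS = "/?.:"


def path_beg(uri: str, prefix: str) -> bool:
    """Approximates 'path_beg': compares only the path, never the query string."""
    return urlsplit(uri).path.startswith(prefix)


def hdr_dom(value: str, pattern: str, icase: bool = False) -> bool:
    """Approximates 'hdr_dom(...)': pattern must be a whole delimited portion."""
    d = re.escape(DOMAIN_DELIMS)
    regex = rf"(?:\A|[{d}]){re.escape(pattern)}(?=[{d}]|\Z)"
    flags = re.IGNORECASE if icase else 0  # the -i flag
    return re.search(regex, value, flags) is not None


if __name__ == "__main__":
    print(path_beg("/subscriptions/channel/news?since=42", "/subscriptions/channel/"))  # True
    print(hdr_dom("Notes.OlivePeak.com:80", "notes.olivepeak.com", icase=True))         # True
    print(hdr_dom("mynotes.olivepeak.com", "notes.olivepeak.com", icase=True))          # False
```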

Finally the intelligent bit:

use_backend pubsubs if req_pubsub_path req_notes

With a single line in the config file we tell HAProxy to use the pubsubs backend when both the req_pubsub_path and req_notes ACLs are true (ACLs listed one after another in a condition are implicitly combined with a logical AND). From now on, any request to notes.olivepeak.com whose path begins with /subscriptions/channel/ will be relayed to the pubsubs backend server(s), and everything else will be handled by the apaches backend.
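The decision the frontend makes for each request can be sketched in Python: use_backend rules are tried in the order they appear, every ACL named on a rule must be true, and default_backend catches whatever is left. The two ACLs below are simplified stand-ins (a prefix test on the path, and an exact, case-insensitive Host comparison that drops the port), not HAProxy’s own matchers, and choose_backend is a hypothetical name:

```python
from urllib.parse import urlsplit

# Simplified stand-ins for the two ACLs declared in the frontend.
ACLS = {
    "req_pubsub_path": lambda req: urlsplit(req["uri"]).path.startswith("/subscriptions/channel/"),
    "req_notes": lambda req: req["host"].lower().split(":")[0] == "notes.olivepeak.com",
}

# use_backend rules in config-file order: (backend, ACLs that must all be true).
USE_BACKEND = [("pubsubs", ["req_pubsub_path", "req_notes"])]
DEFAULT_BACKEND = "apaches"


def choose_backend(req: dict) -> str:
    """The first use_backend rule whose ACLs all match wins; otherwise the default."""
    for backend, acl_names in USE_BACKEND:
        if all(ACLS[name](req) for name in acl_names):  # implicit logical AND
            return backend
    return DEFAULT_BACKEND


if __name__ == "__main__":
    print(choose_backend({"uri": "/subscriptions/channel/news", "host": "notes.olivepeak.com"}))  # pubsubs
    print(choose_backend({"uri": "/index.html", "host": "notes.olivepeak.com"}))                  # apaches
```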

Conclusion

No matter how small your hosting infrastructure is, or how modest your application traffic is, you can benefit from an advanced proxy frontend. It will be ready to grow with your needs and it will enable you to choose the best tool for the job. Choosing an application server will be much easier since you can run many of them at the same time on the same host.