OpenBSD firewall cluster with state table synchronization over WAN within a BGP environment

(originally posted 22.07.2018 on Google+)

Introduction

A company wants to connect its IPv4 PI (provider-independent) address space, BGP multihomed, to two independent ISPs (internet service providers), with a stateful firewall in each ISP’s data center. The goal is that a first layer of protection already applies at the ISP sites, and that spoofed packets are filtered there.

The challenge here is the firewalls: because asymmetric traffic is hard to avoid in BGP environments, they need to be able to synchronize their state tables over WAN links.

Here is an overview of the setup:

The example company’s edge BGP routers r1-BGP and r2-BGP speak BGP to their counterparts ISP1-BGP and ISP2-BGP, announcing the company’s PI space 192.0.2.0/24 and receiving internet routes. The BGP sessions must be multihop, because at each site a firewall, fw-ISP1 or fw-ISP2, sits between the BGP routers. These firewalls are the interesting part of this configuration: stateful inspection was required, and because asymmetric routing is expected to happen, they need to be able to synchronize their state tables over WAN links.

Implementation

The implementation consists of two parts, the BGP setup and the firewall setup. The BGP setup is very basic, since this is not a complex BGP network, so I’ll only cover a few points you should consider. The firewall setup is the interesting part of this story, so I’ll go into more detail there.

BGP setup

IPv4 PI space?

The first question you should ask is “do I already have IPv4 PI space?” If not, you are likely out of luck, because at least here in Europe, where RIPE (“Réseaux IP Européens”; don’t ask me why this is a French name when English is the organisation’s language) is responsible for IP resources, no more IPv4 PI addresses are given out.

Become LIR?

If you don’t already own IPv4 PI addresses, then your only option is to become a RIPE LIR (“local internet registry”). As of today, there is still IPv4 space available for new LIRs. Becoming an LIR involves bureaucratic paperwork, and there is a yearly fee. Check the RIPE website for details. In other regions of the world, you have to check with the responsible RIR (“Regional Internet Registry”) instead of RIPE.

Get an AS number and Upstream Providers

The next thing you need is an AS (“Autonomous System”) number. If you had to become an LIR, then you simply request the AS number along with the IPv4 addresses. If you use PI addresses, then you need to find a sponsoring LIR who requests the AS number for you. Most LIRs are ISPs, and you need to make upstream agreements with at least 2 ISPs first anyway (a requirement for the AS number is that you will be multihomed).

Take care that both you and your upstream providers maintain correct RIPE database entries, like AUT-NUM (here the routing strategies must be documented) and AS-SET (downstream providers must be named here). Many ISPs out there on the internet check these database entries, and you’ll likely encounter unexpected behavior when they aren’t correct.

Fine-tune your routing strategy

You need to decide how to set up your upstreams: use both (load sharing), or use just one with the other as hot standby. Both approaches have advantages and disadvantages. With load sharing, you always actively test both upstreams. That is a disadvantage of hot standby: imagine ISP1 is your active uplink and ISP2 the hot standby; due to a software or human error, ISP2 effectively blackholes you, but nobody notices, because it is only standby; and now ISP1 fails… But when you do load sharing, it is very important that you carefully monitor the load of both uplinks. The sum of both loads must be lower than the capacity of your smallest uplink, because when the other uplink fails and you have more traffic than the remaining one can carry, then you are in big trouble. I recommend that you upgrade your uplinks no later than when the summed peaks exceed 75% of your smallest uplink.
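The 75% rule of thumb can be turned into a small check. The following is only a minimal sketch, assuming you already collect the peak load of each uplink somewhere; the function name uplink_check and all numbers are made up for illustration, and all values are in Mbit/s.

```shell
#!/bin/sh
# Sketch: does the summed peak load still fit within 75% of the smallest uplink?
# All values are in Mbit/s; function name and numbers are hypothetical.

# uplink_check PEAK1 PEAK2 CAP1 CAP2
# Prints "ok" (returns 0) or "upgrade" (returns 1).
uplink_check() {
    peak_sum=$(( $1 + $2 ))
    # Pick the smaller of the two uplink capacities.
    if [ "$3" -le "$4" ]; then
        smallest=$3
    else
        smallest=$4
    fi
    # 75% threshold in integer arithmetic; multiply first to keep precision.
    threshold=$(( smallest * 75 / 100 ))
    if [ "$peak_sum" -gt "$threshold" ]; then
        echo "upgrade"
        return 1
    fi
    echo "ok"
    return 0
}

# Example: peaks of 300 and 400 Mbit/s on two 1000 Mbit/s uplinks.
uplink_check 300 400 1000 1000
```

With these example numbers the summed peak is 700 Mbit/s and the threshold is 750 Mbit/s, so the check still passes; at 400 + 400 Mbit/s it would report that an upgrade is due.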

Which devices?

Then you have to decide which routers to use. I prefer Cisco, but that’s just me. There are other good choices. Quagga or FRR on Linux, or OpenBSD bgpd, may be good open source choices. You might also consider a two-manufacturer approach, to avoid being killed by the same bug on both uplinks.

Once you have answered all these questions, the BGP configuration itself will be the easiest part. There are plenty of examples on the internet.

Firewall setup

In this chapter, I’ll start with the devices. What are the options?

Check Point firewalls are of course an option. They can perform active/active state synchronisation. But they are quite pricey, and maybe a bit overkill for rather basic stateful firewalling. Much the same may apply to other commercial firewalls.

My first idea was Linux iptables and conntrackd. At first glance it looked good, but a test setup failed: states are not synchronized correctly when the packet flow is asymmetric. If only I had read the manual first:

Active-Active setup

The Active-Active setup consists of having more than one stateful firewall replicas actively filtering traffic. Thus, we reduce the resource waste that implies to have a backup firewall which does nothing.

We can classify the type of Active-Active setups in several families:

Symmetric path routing: The stateful firewall replicas share the workload in terms of flows, ie. the packets that are part of a flow are always filtered by the same firewall.

Asymmetric multi-path routing: The packets that are part of a flow can be filtered by whatever stateful firewall in the cluster. Thus, every flow-states have to be propagated to all the firewalls in the cluster as we do not know which one would be the next to filter a packet. This setup goes against the design of stateful firewalls as we define the filtering policy based on flows, not in packets anymore.

As for 0.9.8, the design of conntrackd allows you to deploy an symmetric Active-Active setup based on a static approach. For example, assume that you have two virtual IPs, vIP1 and vIP2, and two firewall replicas, FW1 and FW2. You can give the virtual vIP1 to the firewall FW1 and the vIP2 to the FW2.

Unfortunately, you will have to wait for the support for the Active-Active setup based on dynamic approach, ie. a workload sharing setup without directors that allow the stateful firewall share the filtering.

On the other hand, the asymmetric scenario may work if your setup fulfills several strong assumptions. However, in the opinion of the author of this work, the asymmetric setup goes against the design of stateful firewalls and conntrackd. Therefore, you have two choices here: you can deploy an Active-Backup setup or go back to your old stateless rule-set (in that case, the conntrack-tools will not be of any help anymore, of course).

So it doesn’t work with Linux iptables and conntrackd. And since the author seems to think that this is a feature, not a bug, I guess we can wait quite a long time until it is implemented…

The next try was OpenBSD pf and pfsync. OpenBSD has a very good reputation as a base for secure and stable internet devices, but I didn’t have much experience with it, which is why it was only my second choice. And this blog post (from 2009!) sounded very encouraging. I tried it out in a test setup, and it worked.

OK, then let’s see what the actual configuration of the firewall at ISP1 looks like.

Set up the local IP addresses in /etc/hostname.<interface name> and the default gateway in /etc/mygate.

Enable routing:

sysctl net.inet.ip.forwarding=1

echo 'net.inet.ip.forwarding=1' >> /etc/sysctl.conf
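One small caveat with the echo line: running it a second time appends a duplicate entry. A possible idempotent variant is sketched below; the helper name append_once is made up for this example and is not part of OpenBSD.

```shell
#!/bin/sh
# Sketch of a hypothetical helper: append a line to a file only if an
# identical line is not already present.

# append_once LINE FILE
append_once() {
    # -x matches whole lines only, -F treats LINE as a fixed string, -q is quiet.
    if ! grep -qxF -- "$1" "$2" 2>/dev/null; then
        printf '%s\n' "$1" >> "$2"
    fi
}

# Intended use (left commented out so the sketch is harmless to run anywhere):
# append_once 'net.inet.ip.forwarding=1' /etc/sysctl.conf
```

Calling the helper repeatedly leaves exactly one copy of the setting in the file.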

Install the packages for OpenVPN (which is used for the table synchronization interface; maybe there are other options, but I have had very good experiences with OpenVPN):

pkg_add lzo2

pkg_add lz4

pkg_add openvpn

Edit /etc/rc.local. Maybe there are more recommended locations to start and configure these things, but I like it when configuration lives in as few different places as possible:

/sbin/route add -inet 203.0.113.0/29 198.51.100.2

/sbin/route add -inet 192.0.2.0/24 198.51.100.2

/usr/local/sbin/openvpn --cd /usr/local/etc/openvpn --config pfsync.conf --daemon

sleep 1

/sbin/ifconfig pfsync0 syncdev tap0 defer up

Explanations: The first route ensures that the OpenVPN packets are routed through the internal infrastructure, not the internet (be sure to add corresponding routes on the other firewall and(!) on the BGP routers r1-BGP and r2-BGP). The second route is for the PI space.
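Because the first route only helps if it actually covers the address of the OpenVPN peer, a quick sanity check can catch typos in the prefix. This is just a sketch in plain sh; the function names ip_to_int and ip_in_prefix are invented for this example, and it assumes a shell with at least 64-bit integer arithmetic.

```shell
#!/bin/sh
# Sketch: check whether an IPv4 address lies inside a prefix.
# Function names are hypothetical; assumes 64-bit shell arithmetic.

# ip_to_int A.B.C.D -> prints the address as an integer
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_prefix ADDRESS PREFIX/LEN -> returns 0 if ADDRESS is inside the prefix
ip_in_prefix() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    # Build the netmask for the given prefix length.
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Example: is the OpenVPN peer 203.0.113.1 covered by 203.0.113.0/29?
if ip_in_prefix 203.0.113.1 203.0.113.0/29; then
    echo "peer is covered by the route"
fi
```

The same check can be repeated on the other firewall and on the BGP routers with their respective peer addresses.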

The next command starts the OpenVPN tunnel for the sync interface.

“sleep 1” is there because after a reboot of a firewall I once found that pfsync had not been started. I guessed it was a race condition; with the 1-second grace period it didn’t happen again.

The last line starts pfsync. The “defer” option is very important (for a detailed explanation, see the article already mentioned above). It withholds the transmission of the initial packet of a connection until the state table synchronization has been acknowledged by the peer. This prevents packet drops in situations where a reply packet is faster than the state table synchronisation.
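If the fixed grace period ever turns out to be too short, one possible alternative is to poll until the tap interface is actually there. The following is a sketch under assumptions: the helper name wait_until is made up, and it assumes that the race really is about tap0 not existing yet and that "ifconfig tap0" fails as long as that is the case. I have not verified this on a real cluster.

```shell
#!/bin/sh
# Sketch of a hypothetical helper: retry a command once per second until it
# succeeds or the given number of tries is used up.

# wait_until TRIES COMMAND [ARGS...]
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$(( tries - 1 ))
        # Only sleep if another attempt follows.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Possible use in /etc/rc.local instead of "sleep 1" (commented out here):
# wait_until 10 /sbin/ifconfig tap0 && /sbin/ifconfig pfsync0 syncdev tap0 defer up
```

Compared with the plain sleep, this waits no longer than necessary in the normal case and leaves pfsync unconfigured only if tap0 does not appear within the given number of tries.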

Here is the OpenVPN configuration /usr/local/etc/openvpn/pfsync.conf:

script-security 2

dev tap0

ifconfig 10.10.10.1 255.255.255.0

remote 203.0.113.1

tun-mtu 1500

tun-mtu-extra 64

fragment 1400

mssfix

comp-lzo

lport 5000

rport 5000

secret keys/pfsync.secret

Explanations: tap0 is important because pfsync expects an Ethernet interface. tun-mtu, tun-mtu-extra and fragment are important to give the tap device the standard MTU of 1500 and to let OpenVPN handle oversized packets with internal fragmentation when a packet plus the OpenVPN overhead exceeds the MTU. mssfix only matters for TCP connections (which pfsync isn’t), so it is not necessary here, but it doesn’t hurt (and is, by the way, very useful in other situations).

The other parameters should be fairly self-explanatory.

And here is /etc/pf.conf. Of course I leave out most of the actual configuration with explicit rules, which you have to set depending on your needs. For explanations, see the comments:

# Table containing all IP addresses assigned to the firewall. Useful when referring to
# the firewall itself in explicit rules.
table <firewall> const { self }

# Increase state table size. The increased TCP opening timeout was necessary
# because of a very slow scp server.
set limit states 200000
set timeout tcp.opening 90

# Implicit deny all.
block log all

# Allow outgoing connections from the firewall itself
pass out on egress all keep state

# Allow sync interface
pass on tap0
pass in on tap0

# Allow icmp. I despise icmp blocking. :-)
pass proto icmp

# Allow all outgoing connections
pass in on bge0 keep state (sloppy)

The last line needs a detailed explanation. At first, I had this rule without sloppy. In normal mode, pf not only checks whether a packet belongs to a valid established connection, but in the case of TCP also whether the sequence number falls within a reasonable range. And here I ran into a problem: web browsing often failed with stalled connections when the routing was asymmetric.

I could reproduce the problem with a simple “wget -r www.cisco.com”. It started, connection after connection (you remember, HTTP usually consists of many little connections), at first without problems, but then stalled somewhere. With tcpdump I could verify that packets arrived on the external interface but were silently dropped. There were no logs or anything, so I assumed it is a bug.

I didn’t debug this any further, because it worked with sloppy mode, and the sequence number check was not required in my setup. I also found in the manual that sloppy mode is suggested for asymmetric environments anyway:

sloppy

Uses a sloppy TCP connection tracker that does not check sequence numbers at all, which makes insertion and ICMP teardown attacks way easier. This is intended to be used in situations where one does not see all packets of a connection, e.g. in asymmetric routing situations. It cannot be used with modulate state or synproxy state.

Rolf's Blog

Saturday, 5 October 2019
