Exploring Docker Networking – Host, None, and MACVLAN

You are probably wondering, “well this all sounds amazing but have you ever heard of VLANs?”  Yes, yes I have.  And yes, I know you very likely use them in your network today.  Depending on where your Docker hosts sit and how far your network spans, it’s very unlikely that you want everything on a single flat network, and you probably lack the physical interfaces to dedicate one to every network.

Enter MACVLAN. Note that this network doesn’t exist by default; you must create it yourself before you can use it.

I’m starting from scratch again: no containers running, just a basic setup.

[root@dockernet2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host
 valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
 inet 10.0.0.206/24 brd 10.0.0.255 scope global ens192
 valid_lft forever preferred_lft forever
 inet6 fe80::250:56ff:fe82:e0b4/64 scope link
 valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
 link/ether 02:42:c1:57:d2:6d brd ff:ff:ff:ff:ff:ff
 inet 172.17.0.1/16 scope global docker0
 valid_lft forever preferred_lft forever

You can see the three usual interfaces here that we’ve become familiar with during our network exploration.

Let’s go ahead and create a MACVLAN network and check it out:

[root@dockernet2 ~]# docker network create --driver macvlan --subnet 172.10.100.0/24 --gateway 172.10.100.1 -o parent=ens192.100 macvlan100
6a3a69fb53147348c897251321a757306e84ad2fcfd16d1cea70c7bfdac7e767
[root@dockernet2 ~]# docker network ls
NETWORK ID   NAME       DRIVER  SCOPE
b31f10c1052e bridge     bridge  local
f2f1cf53c9f2 host       host    local
6a3a69fb5314 macvlan100 macvlan local
b0cf5ba8b44d none       null    local

OK, so what did I do?  I created a new network using the macvlan driver, specified the subnet and the gateway, designated the parent interface where containers will connect, and finally named it macvlan100.

VLANs on Linux use a parent/subinterface concept: the parent interface (in my case ens192) is the actual interface, and the VLANs are subinterfaces below it, so ens192.100 is a VLAN 100 subinterface.  Don’t confuse this with the “parent” option in the command above, which can be either a VLAN subinterface or a regular interface.  In my case ens192 is the parent of the VLAN subinterface ens192.100, and that VLAN subinterface will in turn be the parent of the container interfaces.  Subinterface!
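The naming convention can be sketched in a couple of lines of shell.  The parse below is just an illustration of the convention; the comment shows what a manual, root-required equivalent of the subinterface creation would look like (interface names are from my lab):

```shell
# "ens192.100" means: VLAN 100 tagged on parent interface ens192.
# Creating such a subinterface by hand (needs root) would look like:
#   ip link add link ens192 name ens192.100 type vlan id 100
subif="ens192.100"
parent="${subif%.*}"     # text before the last dot -> ens192
vlan_id="${subif##*.}"   # text after the last dot  -> 100
echo "parent=$parent vlan_id=$vlan_id"
```

As we’re about to see, Docker creates this subinterface for us automatically.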

Let’s run ip addr show again:

[root@dockernet2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host
 valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
 inet 10.0.0.206/24 brd 10.0.0.255 scope global ens192
 valid_lft forever preferred_lft forever
 inet6 fe80::250:56ff:fe82:e0b4/64 scope link
 valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
 link/ether 02:42:c1:57:d2:6d brd ff:ff:ff:ff:ff:ff
 inet 172.17.0.1/16 scope global docker0
 valid_lft forever preferred_lft forever
4: ens192.100@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
 link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
 inet6 fe80::250:56ff:fe82:e0b4/64 scope link
 valid_lft forever preferred_lft forever

Here is the MACVLAN magic.  I didn’t have to do anything; Docker automatically created this subinterface.  There is no IP associated with it, and in fact in my environment this VLAN (and the network, gateway, etc.) doesn’t really exist; I just made it up.  (Note that 172.10.100.0/24 isn’t even RFC 1918 private space; 172.16.0.0/12 is.  Fine for a lab, but don’t reuse a made-up range like this in production.)  The docker0 bridge is still there but is not involved in this configuration.

Inspecting the network we see some familiar info:

[root@dockernet2 ~]# docker network inspect macvlan100
[
 {
 "Name": "macvlan100",
 "Id": "6a3a69fb53147348c897251321a757306e84ad2fcfd16d1cea70c7bfdac7e767",
 "Created": "2017-08-01T21:18:23.129516689-04:00",
 "Scope": "local",
 "Driver": "macvlan",
 "EnableIPv6": false,
 "IPAM": {
 "Driver": "default",
 "Options": {},
 "Config": [
 {
 "Subnet": "172.10.100.0/24",
 "Gateway": "172.10.100.1"
 }
 ]
 },
 "Internal": false,
 "Attachable": false,
 "Ingress": false,
 "ConfigFrom": {
 "Network": ""
 },
 "ConfigOnly": false,
 "Containers": {},
 "Options": {
 "parent": "ens192.100"
 },
 "Labels": {}
 }
]

IPAM is working here with the network I defined, so we can expect Docker to hand out IPs on this subnet as well as establish the gateway of 172.10.100.1 as the default route.
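As a sketch of what to expect from the default IPAM driver (this ordering is an assumption based on observed behavior, not a documented guarantee): the gateway gets .1 and containers get the next free addresses in sequence.

```shell
# Default IPAM sketch for 172.10.100.0/24: .1 is the gateway,
# containers are handed .2, .3, .4, ... in allocation order.
net="172.10.100"
echo "gateway:     $net.1"
for n in 2 3 4; do
  echo "container $((n-1)): $net.$n"
done
```

The containers we start below will bear this out, landing on .2 and .3.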

Once again, we’ll spin up a container on this network and see how things go.

[root@dockernet2 ~]# docker run -dit --network=macvlan100 centos
d6dc88943989a30ac80a1bbef15ad539e62e5b463521d1502707b670919c894c
[root@dockernet2 ~]# docker container ls
CONTAINER ID IMAGE  COMMAND     CREATED       STATUS       PORTS  NAMES
d6dc88943989 centos "/bin/bash" 4 seconds ago Up 3 seconds        keen_franklin

From the docker inspect output, we see the network info:

"NetworkSettings": {
 "Bridge": "",
 "SandboxID": "239a4b5bbca62b897224366fd524b6c1997ea51ccbc40ada0eb42d340be54bbd",
 "HairpinMode": false,
 "LinkLocalIPv6Address": "",
 "LinkLocalIPv6PrefixLen": 0,
 "Ports": {},
 "SandboxKey": "/var/run/docker/netns/239a4b5bbca6",
 "SecondaryIPAddresses": null,
 "SecondaryIPv6Addresses": null,
 "EndpointID": "",
 "Gateway": "",
 "GlobalIPv6Address": "",
 "GlobalIPv6PrefixLen": 0,
 "IPAddress": "",
 "IPPrefixLen": 0,
 "IPv6Gateway": "",
 "MacAddress": "",
 "Networks": {
 "macvlan100": {
 "IPAMConfig": null,
 "Links": null,
 "Aliases": [
 "d6dc88943989"
 ],
 "NetworkID": "6a3a69fb53147348c897251321a757306e84ad2fcfd16d1cea70c7bfdac7e767",
 "EndpointID": "83c393876f2bed09133e3416312c6218a42f3142d39224f7042580902cc5d51b",
 "Gateway": "172.10.100.1",
 "IPAddress": "172.10.100.2",
 "IPPrefixLen": 24,
 "IPv6Gateway": "",
 "GlobalIPv6Address": "",
 "GlobalIPv6PrefixLen": 0,
 "MacAddress": "02:42:ac:0a:64:02",
 "DriverOpts": null

This is all familiar by now, right?  I can tell Docker has created a network namespace for this container, assigned it a .2 address, and set the gateway to .1.
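A quick aside on that MacAddress value: by default Docker derives a container’s MAC from its IP, using a 02:42 prefix followed by the four IP octets in hex (you can override this with --mac-address).  A sketch:

```shell
# Docker's default container MAC scheme: 02:42 + IP octets in hex.
# 172.10.100.2 -> 02:42:ac:0a:64:02  (ac=172, 0a=10, 64=100, 02=2)
ip="172.10.100.2"
oldIFS=$IFS; IFS=.
set -- $ip          # split the IP into its four octets
IFS=$oldIFS
printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
```

This matches the MacAddress in the inspect output above, and explains why the second container we start later lands on ...:64:03.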

No new interfaces have been created on the host.  And if we look at the bridge utility, we can see that the only bridge is still the original docker0 bridge with nothing attached:

[root@dockernet2 ~]# brctl show
bridge name bridge id         STP enabled   interfaces
docker0     8000.0242c157d26d no

I can exec into the namespace and see what the networking looks like from inside the container.

[root@dockernet2 ~]# ip netns exec 239a4b5bbca6 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
5: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
 link/ether 02:42:ac:0a:64:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
 inet 172.10.100.2/24 scope global eth0
 valid_lft forever preferred_lft forever
[root@dockernet2 ~]# ip netns exec 239a4b5bbca6 ip route show
default via 172.10.100.1 dev eth0
172.10.100.0/24 dev eth0 proto kernel scope link src 172.10.100.2

While this isn’t an actual veth interface like we saw with bridge networking, it should look pretty familiar.  Instead of a veth pair, MACVLAN makes the parent interface (in our case if4, the VLAN 100 subinterface) act as a simple bridge between its child interfaces.  We won’t see this in the bridge utility, but we can see the mode in the ip -d link output inside the container.

[root@dockernet2 ~]# brctl show
bridge name bridge id         STP enabled   interfaces
docker0     8000.0242c157d26d no
[root@dockernet2 ~]# ip netns exec d0b1c5235be6 ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
8: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
 link/ether 02:42:ac:0a:65:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
 macvlan mode bridge addrgenmode eui64

Similar to bridge mode, this lets traffic actually get out of the network namespace, by essentially splitting the parent interface into subinterfaces with different MAC addresses.

In our case the subinterface has an IP address of .2 and the gateway is set correctly.  Of course, in my environment that gateway IP doesn’t really exist, so I can’t hit it (nor can it route me anywhere).

If I spin up another container, Docker creates a new network namespace for it as well.  Exec’ing into that namespace:

[root@dockernet2 ~]# ip netns list
7d49c7d1813d
239a4b5bbca6
default
[root@dockernet2 ~]# ip netns exec 7d49c7d1813d ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
6: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
 link/ether 02:42:ac:0a:64:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
 inet 172.10.100.3/24 scope global eth0
 valid_lft forever preferred_lft forever

Same deal, right?  A subinterface with if4 on the host as the parent.  Again this tells us we are not leveraging veths because, as we know from the bridge mode discussion, veths always come in unique pairs: I can’t create veth0 and veth1 and then create another veth2 that also pairs with veth1.

At this point I can ping from one container to the other as we’d expect.

[root@dockernet2 ~]# docker attach objective_nightingale
[root@88a9a9a3f943 /]# ping 172.10.100.2
PING 172.10.100.2 (172.10.100.2) 56(84) bytes of data.
64 bytes from 172.10.100.2: icmp_seq=1 ttl=64 time=0.090 ms
64 bytes from 172.10.100.2: icmp_seq=2 ttl=64 time=0.039 ms
64 bytes from 172.10.100.2: icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from 172.10.100.2: icmp_seq=4 ttl=64 time=0.041 ms

I created a new MACVLAN network on VLAN 101 and then put a container on it.  Notice that I can’t ping the containers on the other network like I could with the bridge configuration.

[root@dockernet2 ~]# ip netns exec d0b1c5235be6 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
8: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
 link/ether 02:42:ac:0a:65:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
 inet 172.10.101.2/24 scope global eth0
 valid_lft forever preferred_lft forever
[root@dockernet2 ~]# docker attach cranky_banach

#ping self
[root@17327380a67c /]# ping 172.10.101.2
PING 172.10.101.2 (172.10.101.2) 56(84) bytes of data.
64 bytes from 172.10.101.2: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 172.10.101.2: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 172.10.101.2: icmp_seq=3 ttl=64 time=0.032 ms
^C
--- 172.10.101.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.026/0.031/0.036/0.006 ms

#ping other container
[root@17327380a67c /]# ping 172.10.100.2
PING 172.10.100.2 (172.10.100.2) 56(84) bytes of data.
From 172.10.101.2 icmp_seq=1 Destination Host Unreachable
From 172.10.101.2 icmp_seq=2 Destination Host Unreachable
From 172.10.101.2 icmp_seq=3 Destination Host Unreachable
From 172.10.101.2 icmp_seq=4 Destination Host Unreachable
^C
--- 172.10.100.2 ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4000ms

Which is what we expect, because my gateways don’t exist.  In order to ping across MACVLAN networks, my upstream configuration has to be correct (trunking in place, VLANs configured, gateways live, etc.).  In other words, with MACVLAN you have to do more outside of the Docker host to be functional, but you also get more control and segmentation than you do with plain bridge networks.
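For reference, the upstream pieces would look roughly like this on the switch side.  This is a hedged, IOS-style sketch; the port name and VLAN IDs are assumptions from my lab, and routing between the two subnets would still need a device that actually owns 172.10.100.1 and 172.10.101.1:

```
interface GigabitEthernet1/0/1
 description uplink to dockernet2 (ens192)
 switchport mode trunk
 switchport trunk allowed vlan 100,101
```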

Another interesting aspect of MACVLAN, and a distinction from bridge networking, is that the subinterfaces effectively share state with the parent interface because they use it as their bridge.  This means that if I shut down the parent ens192.100 interface, watch what happens in the container:

#set parent down
[root@dockernet2 ~]# ip link set dev ens192.100 down

#check network status
[root@dockernet2 ~]# ip link show dev ens192.100
4: ens192.100@ens192: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
 link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff

#check container interface
[root@dockernet2 ~]# ip netns exec 7d49c7d1813d ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
6: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
 link/ether 02:42:ac:0a:64:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
 inet 172.10.100.3/24 scope global eth0
 valid_lft forever preferred_lft forever

#attach to container
[root@dockernet2 ~]# docker attach objective_nightingale

#ping self = successful
[root@88a9a9a3f943 /]# ping 172.10.100.3
PING 172.10.100.3 (172.10.100.3) 56(84) bytes of data.
64 bytes from 172.10.100.3: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 172.10.100.3: icmp_seq=2 ttl=64 time=0.036 ms
^C
--- 172.10.100.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.036/0.036/0.036/0.000 ms

#ping other container = fail
[root@88a9a9a3f943 /]# ping 172.10.100.2
PING 172.10.100.2 (172.10.100.2) 56(84) bytes of data.
^C
--- 172.10.100.2 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 6999ms

So with the parent interface shut down (or disconnected), all container traffic stops, even within the same host.  Compare this to bridge networking, where an external outage wouldn’t affect internal traffic.

If you’ve been picking this stuff up like gangbusters, you may be wondering: “I see you lost connectivity when shutting down the VLAN subinterface (ens192.100), but what happens when you lose the physical parent interface (ens192) instead, like an actual network disconnect?”  When that happens, the VLAN subinterface goes down too, so the effect is the same.  Luckily you can use a bond interface with multiple links as the parent to get high availability.

Here is a diagram of this configuration with two containers on the same MACVLAN network.


When comparing MACVLAN to bridge networking, we find:

  • Better performance due to less overhead
  • No default connectivity between containers and the server itself (i.e. I can’t ping a server IP from a container, a known MACVLAN limitation)
  • Requires external (upstream) network configuration
  • Each container exposes a unique MAC address to the outside world, so your switch must be able to handle many MACs on one port (watch out for port-security limits)
  • Requires that containers have “public” IP addresses on your network (not public as in internet, but real addresses on your network versus made-up ones that Docker IPAM generates); there is no NAT/port forwarding like with bridge mode
  • No port allocation on the server; instead, services listen on ports on the container IPs
  • Tight coupling between server networking state and container networking availability
  • Very likely that an external IPAM system or DHCP will be required to stay sane at scale

Once again, I hope this was helpful in understanding a bit more about these Docker networking options!  Docker and CentOS are readily available, so I encourage you to grab them yourself and start tinkering.
