You are probably wondering, “well this all sounds amazing but have you ever heard of VLANs?” Yes, yes I have. And yes, I know you very likely use them in your network today. Depending on where your Docker hosts live and how far your network spans, it is very unlikely that you want everything on a single flat network, and you almost certainly don’t have enough physical interfaces to dedicate one to every network.
Enter MACVLAN. Note that this network doesn’t exist by default; you have to create it yourself before you can use it.
I’m starting off from scratch again, no containers running, a basic setup.
[root@dockernet2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.206/24 brd 10.0.0.255 scope global ens192
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe82:e0b4/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:c1:57:d2:6d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
You can see the 3 normal interfaces here that we’ve gotten familiar with during our network exploration.
Let’s go ahead and create a MACVLAN network and check it out:
[root@dockernet2 ~]# docker network create --driver macvlan --subnet 172.10.100.0/24 --gateway 172.10.100.1 -o parent=ens192.100 macvlan100
6a3a69fb53147348c897251321a757306e84ad2fcfd16d1cea70c7bfdac7e767
[root@dockernet2 ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
b31f10c1052e bridge bridge local
f2f1cf53c9f2 host host local
6a3a69fb5314 macvlan100 macvlan local
b0cf5ba8b44d none null local
OK so what did I do? I created a new network using the macvlan driver, designated the subnet, the gateway, the parent interface where containers will connect, and finally named it macvlan100.
VLANs on Linux use a parent/subinterface model: the parent interface (in my case ens192) is the actual physical interface, and the VLANs become subinterfaces underneath it, so ens192.100 is the VLAN 100 subinterface. Don’t confuse that with the “parent” in the command above, which can be either a VLAN subinterface or a regular interface. In my case ens192 is the parent of the VLAN subinterface ens192.100, and that VLAN subinterface will in turn be the parent of the container interfaces. Subinterface!
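If you want to see this parent/subinterface relationship outside of Docker, you can build (and tear down) a VLAN subinterface by hand with iproute2. This is purely illustrative; VLAN 200 here is made up and has nothing to do with this walkthrough, and Docker will create the subinterface we actually need automatically in a moment.
#create a VLAN 200 subinterface on ens192, bring it up, look at it, then remove it
ip link add link ens192 name ens192.200 type vlan id 200
ip link set dev ens192.200 up
ip -d link show dev ens192.200
ip link delete dev ens192.200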
Let’s run ip addr show again:
[root@dockernet2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.206/24 brd 10.0.0.255 scope global ens192
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe82:e0b4/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:c1:57:d2:6d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
4: ens192.100@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
inet6 fe80::250:56ff:fe82:e0b4/64 scope link
valid_lft forever preferred_lft forever
Here is the MACVLAN magic: I didn’t need to do anything, and Docker automatically created this subinterface for me. There is no IP address associated with it, and in fact in my environment this VLAN (along with the network, gateway, etc.) doesn’t really exist; I just made it up. The docker0 bridge is still there but is not involved in this configuration.
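If you want to confirm that the interface Docker created really is an 802.1Q VLAN subinterface, the -d (details) flag on ip link shows the driver-level information; in the detailed output you should see the VLAN protocol and ID (something along the lines of vlan protocol 802.1Q id 100).
#show driver-level details for the auto-created subinterface
ip -d link show dev ens192.100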
Inspecting the network we see some familiar info:
[root@dockernet2 ~]# docker network inspect macvlan100
[
    {
        "Name": "macvlan100",
        "Id": "6a3a69fb53147348c897251321a757306e84ad2fcfd16d1cea70c7bfdac7e767",
        "Created": "2017-08-01T21:18:23.129516689-04:00",
        "Scope": "local",
        "Driver": "macvlan",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.10.100.0/24",
                    "Gateway": "172.10.100.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "parent": "ens192.100"
        },
        "Labels": {}
    }
]
IPAM is actually working here with the network I defined, so we can expect Docker to hand out IPs on this network as well as set the gateway of 172.10.100.1 as the default route.
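If you just want the IPAM piece without wading through the full inspect output, docker network inspect accepts a Go template via -f, so something like this (the template path simply follows the JSON keys shown above) prints only the subnet/gateway config:
#print only the IPAM configuration for the macvlan100 network
docker network inspect -f '{{json .IPAM.Config}}' macvlan100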
Once again, we’ll spin up a container on this network and see how things go.
[root@dockernet2 ~]# docker run -dit --network=macvlan100 centos
d6dc88943989a30ac80a1bbef15ad539e62e5b463521d1502707b670919c894c
[root@dockernet2 ~]# docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d6dc88943989 centos "/bin/bash" 4 seconds ago Up 3 seconds keen_franklin
From the docker inspect output, we see the network info:
"NetworkSettings": { "Bridge": "", "SandboxID": "239a4b5bbca62b897224366fd524b6c1997ea51ccbc40ada0eb42d340be54bbd", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": {}, "SandboxKey": "/var/run/docker/netns/239a4b5bbca6", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "EndpointID": "", "Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "MacAddress": "", "Networks": { "macvlan100": { "IPAMConfig": null, "Links": null, "Aliases": [ "d6dc88943989" ], "NetworkID": "6a3a69fb53147348c897251321a757306e84ad2fcfd16d1cea70c7bfdac7e767", "EndpointID": "83c393876f2bed09133e3416312c6218a42f3142d39224f7042580902cc5d51b", "Gateway": "172.10.100.1", "IPAddress": "172.10.100.2", "IPPrefixLen": 24, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "02:42:ac:0a:64:02", "DriverOpts": null
This is all familiar by now, right? I can tell it has created a network namespace for the container, assigned it the .2 address, and set the gateway to .1.
No new interfaces have been created on the host. And if we look at the bridge utility, we can see that the only bridge is still the original docker0 bridge with nothing attached:
[root@dockernet2 ~]# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.0242c157d26d no
I can exec into the namespace and see what the host networking looks like.
[root@dockernet2 ~]# ip netns exec 239a4b5bbca6 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
5: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 02:42:ac:0a:64:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.10.100.2/24 scope global eth0
valid_lft forever preferred_lft forever
[root@dockernet2 ~]# ip netns exec 239a4b5bbca6 ip route show
default via 172.10.100.1 dev eth0
172.10.100.0/24 dev eth0 proto kernel scope link src 172.10.100.2
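A side note in case you try this at home: Docker keeps its network namespaces under /var/run/docker/netns, while the ip netns tool looks in /var/run/netns, so if ip netns list comes up empty on your host you can expose them with a symlink. The namespace name is just the tail end of the SandboxKey we saw in the inspect output.
#make Docker's network namespaces visible to the ip netns tool
ln -s /var/run/docker/netns /var/run/netns
#the namespace name is the last part of the SandboxKey from docker inspect
docker inspect -f '{{.NetworkSettings.SandboxKey}}' d6dc88943989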
While this isn’t an actual veth interface like we saw with the bridge driver, it should still look pretty familiar. Instead of a veth pair, MACVLAN makes the parent interface (in our case if4, which is the VLAN 100 subinterface) act as a bridge for its child interfaces. We won’t see this in the bridge utility, but we can see it in the ip -d link output inside the container.
[root@dockernet2 ~]# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.0242c157d26d no
[root@dockernet2 ~]# ip netns exec d0b1c5235be6 ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
8: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
link/ether 02:42:ac:0a:65:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
macvlan mode bridge addrgenmode eui64
Similar to the bridge driver, this lets traffic actually get out of the network namespace, by essentially splitting the parent interface into child interfaces that each have their own MAC address.
In our case the subinterface has an IP address of .2 and the gateway is set correctly. Of course in my environment that gateway IP doesn’t really exist so I can’t hit it (nor can it route me anywhere).
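To show that this is plain kernel behavior rather than something exotic Docker is doing, you could hand-build a MACVLAN child on the same parent right on the host. This is purely illustrative; the interface name and address below are made up, and it gets torn down at the end.
#create a macvlan child of ens192.100 in bridge mode, give it a made-up address, then remove it
ip link add mvlan-test link ens192.100 type macvlan mode bridge
ip addr add 172.10.100.50/24 dev mvlan-test
ip link set dev mvlan-test up
ip link delete dev mvlan-test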
If I spin up another container, it creates a new network namespace for it. And if I exec into that namespace:
[root@dockernet2 ~]# ip netns list
7d49c7d1813d
239a4b5bbca6
default
[root@dockernet2 ~]# ip netns exec 7d49c7d1813d ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
6: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 02:42:ac:0a:64:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.10.100.3/24 scope global eth0
valid_lft forever preferred_lft forever
Same deal, right? Another child interface with if4 on the host as the parent. Again this tells us we are not leveraging veths, because as we know from the bridge discussion, veths always come in unique pairs: I can’t create veth0 paired with veth1 and then create a veth2 that also pairs with veth1.
At this point I can ping from one container to the other as we’d expect.
[root@dockernet2 ~]# docker attach objective_nightingale
[root@88a9a9a3f943 /]# ping 172.10.100.2
PING 172.10.100.2 (172.10.100.2) 56(84) bytes of data.
64 bytes from 172.10.100.2: icmp_seq=1 ttl=64 time=0.090 ms
64 bytes from 172.10.100.2: icmp_seq=2 ttl=64 time=0.039 ms
64 bytes from 172.10.100.2: icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from 172.10.100.2: icmp_seq=4 ttl=64 time=0.041 ms
I then created a second MACVLAN network for VLAN 101 and put a container on it (the commands are sketched below). Notice that I can’t ping the containers on the other MACVLAN network the way I could between containers in the bridge configuration.
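For reference, the second network was created the same way as the first; the commands were along these lines (VLAN 101, another made-up 172.10.101.0/24 subnet and gateway, ens192.101 as the parent, and a name like macvlan101):
docker network create --driver macvlan --subnet 172.10.101.0/24 --gateway 172.10.101.1 -o parent=ens192.101 macvlan101
docker run -dit --network=macvlan101 centos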
[root@dockernet2 ~]# ip netns exec d0b1c5235be6 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
8: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 02:42:ac:0a:65:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.10.101.2/24 scope global eth0
valid_lft forever preferred_lft forever
[root@dockernet2 ~]# docker attach cranky_banach
#ping self
[root@17327380a67c /]# ping 172.10.101.2
PING 172.10.101.2 (172.10.101.2) 56(84) bytes of data.
64 bytes from 172.10.101.2: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 172.10.101.2: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 172.10.101.2: icmp_seq=3 ttl=64 time=0.032 ms
^C
--- 172.10.101.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.026/0.031/0.036/0.006 ms
#ping other container
[root@17327380a67c /]# ping 172.10.100.2
PING 172.10.100.2 (172.10.100.2) 56(84) bytes of data.
From 172.10.101.2 icmp_seq=1 Destination Host Unreachable
From 172.10.101.2 icmp_seq=2 Destination Host Unreachable
From 172.10.101.2 icmp_seq=3 Destination Host Unreachable
From 172.10.101.2 icmp_seq=4 Destination Host Unreachable
^C
--- 172.10.100.2 ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4000ms
Which we expect, because my gateways don’t exist. In order to ping across MACVLAN networks, the upstream configuration has to be in place (trunking on the switch port, the VLANs configured, the gateways actually existing, and so on). In other words, with MACVLAN you have to do more outside of the Docker host to be functional, but you also get more control and segmentation than you do with raw bridge networks.
Another interesting aspect of MACVLAN, and a distinction from bridge networking, is that the child interfaces effectively share the state of the parent interface because they rely on it as their bridge. So if I shut down the parent ens192.100 interface, watch what happens in the container:
#set parent down
[root@dockernet2 ~]# ip link set dev ens192.100 down
#check network status
[root@dockernet2 ~]# ip link show dev ens192.100
4: ens192.100@ens192: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
link/ether 00:50:56:82:e0:b4 brd ff:ff:ff:ff:ff:ff
#check container interface
[root@dockernet2 ~]# ip netns exec 7d49c7d1813d ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
6: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 02:42:ac:0a:64:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.10.100.3/24 scope global eth0
valid_lft forever preferred_lft forever
#attach to container
[root@dockernet2 ~]# docker attach objective_nightingale
#ping self = successful
[root@88a9a9a3f943 /]# ping 172.10.100.3
PING 172.10.100.3 (172.10.100.3) 56(84) bytes of data.
64 bytes from 172.10.100.3: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 172.10.100.3: icmp_seq=2 ttl=64 time=0.036 ms
^C
--- 172.10.100.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.036/0.036/0.036/0.000 ms
#ping other container = fail
[root@88a9a9a3f943 /]# ping 172.10.100.2
PING 172.10.100.2 (172.10.100.2) 56(84) bytes of data.
^C
--- 172.10.100.2 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 6999ms
So with the parent interface shut down (or disconnected), all container traffic stops, even between containers on the same host. Compare this to the bridge driver, where an external outage wouldn’t affect internal container-to-container traffic.
If you’ve been picking this stuff up like gangbusters, you may be wondering, “I see you lost connectivity when shutting down the VLAN subinterface (ens192.100), but what happens when you lose connectivity on the parent interface (ens192) itself, like an actual network disconnect?” When that happens the VLAN subinterface goes down with it, so the effect is the same. Luckily, you can use a bond interface with multiple links as the parent to get high availability.
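I haven’t built that out here, but the idea would be to point the Docker network at a VLAN subinterface of the bond rather than of a single NIC. Assuming bond0 already aggregates two physical uplinks via your normal OS bonding/teaming configuration, the create command would look something like this (the network name is made up):
#use a VLAN subinterface of an existing bond as the MACVLAN parent
docker network create --driver macvlan --subnet 172.10.100.0/24 --gateway 172.10.100.1 -o parent=bond0.100 macvlan100-ha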
Here is a diagram of this configuration with two containers on the same MACVLAN network.
When comparing MACVLAN to bridge, we find:
- Better performance due to less overhead
- No default connectivity between the containers and the server itself (i.e. I can’t ping a server IP from a container)
- Requires external configuration (switch trunking, VLANs, and gateways have to exist upstream)
- Each container exposes its own unique MAC address to the outside world, so your switch has to be able to handle many MACs on a single port
- Requires that containers have “public” IP addresses on your network (not public as in internet, but real addresses on your network rather than the made-up ones Docker IPAM generates for bridge networks), so there is no NAT/port forwarding like with bridge mode
- No port allocation on the server; instead, ports live directly on the container IPs
- Tight coupling between server networking state and container networking availability
- Very likely that an external IPAM system or DHCP will be required to keep things sane at scale (one way to carve out address space for Docker alongside an existing system is sketched below)
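On that last point, if Docker’s IPAM has to coexist with an existing DHCP scope or IPAM tool, one approach is to hand Docker only a slice of the subnet with --ip-range and mark anything it must never assign with --aux-address. This is a sketch only; the range, the reserved address, and the network name are examples.
#give Docker IPAM only the top half of the /24 and reserve one address it must never assign
docker network create --driver macvlan \
  --subnet 172.10.100.0/24 --gateway 172.10.100.1 \
  --ip-range 172.10.100.128/25 \
  --aux-address "reserved1=172.10.100.130" \
  -o parent=ens192.100 macvlan100-scoped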
Once again, I hope this was helpful in understanding a bit more about these Docker networking options! Docker and CentOS are readily available, so I encourage you to grab them yourself and start tinkering.