In this final post of the series we take a look at network namespaces. As we hinted at during the intro post, a network namespace isolates network-related resources - a process running in a distinct network namespace has its own networking devices, routing tables, firewall rules, etc. We can see this in action immediately by inspecting our current network environment.
The ip Command
Since we will be interacting with network devices in this post, we will reinstate the superuser requirement that we relaxed in the previous posts. From now on, we will assume that both ip and isolate are being run with superuser privileges.
The star of the show here is the ip command - the Swiss Army knife for networking in Linux - and we will use it extensively in this post.
Right now we have just run the link list subcommand to show us what networking devices are currently available in the system (here we have lo, the loopback interface, and ens33, an Ethernet LAN interface).
As with all other namespaces, the system starts with an initial network namespace to which all processes belong unless specified otherwise. Running this ip link list command as-is gives us the networking devices owned by the initial namespace (since our shell and the ip command belong to this namespace).
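The same view is available programmatically. The following is a minimal sketch, not part of isolate, that uses the standard if_nameindex call to enumerate the interfaces visible in the caller's network namespace; the function name list_interfaces is made up for illustration.

```c
#include <assert.h>
#include <net/if.h>
#include <stdio.h>

/* Prints every interface visible in the caller's network namespace and
 * returns how many there are, or -1 on error. */
int list_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    int count = 0;

    if (list == NULL)
        return -1;
    /* The array is terminated by an entry whose index is 0. */
    for (struct if_nameindex *i = list; i->if_index != 0; i++) {
        printf("%u: %s\n", i->if_index, i->if_name);
        count++;
    }
    if_freenameindex(list);
    return count;
}
```

Run from a shell in the initial namespace, this should print the same device names that ip link list reports; run inside another network namespace, it only sees that namespace's devices.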
Named Network Namespaces
Let’s create a new network namespace:
Again, we’ve used the ip command. Its netns subcommand allows us to play with network namespaces - for example, we can create new network namespaces using the add subcommand of netns and use list to, well, list them.
You might notice that list only returned our newly created namespace - shouldn’t it return at least two, the other one being the initial namespace that we mentioned earlier?
The reason for this is that ip creates what is called a named network namespace, which is simply a network namespace that is identifiable by a unique name (in our case, coke). Only named network namespaces are shown via list, and the initial network namespace isn’t named.
Named network namespaces are easier to get a hold of. For example, a file is created for each named network namespace under the /var/run/netns folder, and a process that wants to switch to that namespace can open this file and use it to do so. Another property of named network namespaces is that they can exist without having any process as a member, because the file keeps the namespace alive - unlike non-named ones, which are deleted once all member processes exit and nothing else refers to them.
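To make the "switch to its namespace" part concrete, here is a hedged sketch of how a process could join a named namespace: open the file under /var/run/netns and hand the descriptor to setns. The helper names netns_path and enter_named_netns are invented for this example, and the call only succeeds with sufficient privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NETNS_DIR "/var/run/netns"

/* Builds the path of the file that represents a named namespace.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int netns_path(const char *name, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s/%s", NETNS_DIR, name);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Moves the calling thread into the named namespace.
 * Returns 0 on success, -1 on failure (errno is set by open or setns). */
int enter_named_netns(const char *name)
{
    char path[256];
    int fd, rc;

    if (netns_path(name, path, sizeof(path)) != 0)
        return -1;
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    rc = setns(fd, CLONE_NEWNET);  /* only accept a network namespace fd */
    close(fd);
    return rc;
}
```

Calling enter_named_netns("coke") as root and then executing a shell is roughly what ip netns exec does, although the real command performs additional setup.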
Now that we have a child network namespace, we can see networking from its perspective.
We will be using the command prompt C$ to emphasize a shell running inside a child network namespace.
The exec $namespace $command subcommand executes $command in the named network namespace $namespace.
Here we ran a shell inside the coke namespace and listed the network devices available. We can see that at least our ens33 device has disappeared. The only device that shows up is loopback, and even that interface is down.
We should be used to this by now: the default setup for a namespace is usually very strict. For network namespaces, as we can see, no devices except loopback will be present.
We can bring the loopback interface up without any paperwork though:
We’re already starting to see that by running a process in a separate network namespace like coke, we can be sure that it is isolated from the rest of the system as far as networking is concerned.
Our shell process running in coke can only communicate via loopback - this means that it can only communicate with processes that are also members of the coke namespace, but currently there are no other member processes (and in the name of isolation, we would like it to remain that way), so it’s a bit lonely.
Let’s try to relax this isolation a bit: we will create a tunnel through which processes in coke can communicate with processes in our initial namespace.
Now, any network communication has to go via some network device, and a device can exist in exactly one network namespace at any given time. Communication between two processes in different namespaces must therefore go via at least two network devices - one in each network namespace.
We will use a virtual ethernet network device (or veth for short) to fulfill our need. Veth devices are always created as a pair of devices in a tunnel-like fashion, so that messages written to the device on one end come out of the device on the other end.
You might guess that we could easily have one end in the initial network namespace and the other in our child network namespace and have all inter-network-namespace communication go via the respective veth end device (and you would be correct).
The veth1 device now shows up in the coke namespace. But to make the veth pair functional, we need to give both devices IP addresses and bring their interfaces up. We will do this in their respective network namespaces.
We should see that veth1 is up and has our assigned address 10.1.1.2 - the same should happen for veth0 in the initial namespace.
Now we should be able to do an inter-namespace ping between two processes, one running in each namespace.
The source code for this post can be found here.
As usual, we will now try to replicate what we’ve seen so far in code. Specifically, we will need to do the following:
1. Execute the command within a new network namespace.
2. Create a veth pair (veth0 <=> veth1).
3. Move the veth1 device to the new namespace.
4. Assign IP addresses to both devices and bring them up.
Step 1 is straightforward: we create our command process in a new network namespace by adding the CLONE_NEWNET flag to the set of clone flags used to create it.
For the remaining steps, we will primarily be using the Netlink interface to communicate with Linux.
Netlink is primarily used for communication between regular applications (like isolate) and the Linux kernel. It exposes an API on top of sockets, based on a protocol that determines message structure and content. Using this protocol we can send messages that Linux receives and translates to requests - like create a veth pair with names veth0 and veth1.
Let’s start by creating our netlink socket. In it, we specify that we want to use the NETLINK_ROUTE protocol - this protocol covers implementations for network routing and device management.
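As a rough illustration of that step, a NETLINK_ROUTE socket can be opened and bound as in the sketch below. The function name open_route_socket is an assumption made for this example and is not necessarily what isolate uses.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Opens a NETLINK_ROUTE socket and binds it.
 * Returns the file descriptor, or -1 on failure. */
int open_route_socket(void)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;  /* nl_pid stays 0: the kernel picks our address */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Opening the socket does not itself require superuser privileges; it is the requests that modify devices which the kernel will reject for unprivileged callers.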
Netlink Message Format
A Netlink message is a 4-byte aligned block of data containing a header (struct nlmsghdr) and a payload. The header format is described here. The Network Interface Service (NIS) Module specifies the format (struct ifinfomsg) that a payload related to network interface administration must begin with.
Our request will be represented by the following
The NIS module requires the payload to be encoded as Netlink attributes. Attributes provide a way to segment the payload into subsections. An attribute has a type and a length in addition to a payload containing its actual data.
The Netlink message payload will be encoded as a list of attributes (where any such attribute can in turn have nested attributes) and we will have some helper functions to populate it with attributes. In code, an attribute is represented by the rtattr struct in the linux/rtnetlink.h header file, which consists of two unsigned short fields, rta_len and rta_type.
rta_len is the length of the whole attribute, header included; the attribute’s payload immediately follows the rtattr struct in memory, so the payload length is rta_len minus the aligned size of the header. How the content of this payload is interpreted is dictated by rta_type, and possible values are entirely dependent on the receiver implementation and the request being sent.
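The layout of a single attribute can be sketched with the RTA_* macros from linux/rtnetlink.h. The helper write_attr below is hypothetical and only writes into a caller-supplied, 4-byte aligned buffer; it is meant to show how the header, payload and padding relate.

```c
#include <assert.h>
#include <linux/rtnetlink.h>
#include <string.h>

/* Writes one attribute (header + payload) at `buf` and returns the number
 * of bytes it occupies once padded to the 4-byte alignment boundary.
 * The caller must provide at least RTA_SPACE(len) bytes of aligned space. */
size_t write_attr(void *buf, unsigned short type, const void *data, size_t len)
{
    struct rtattr *rta = buf;

    rta->rta_type = type;
    rta->rta_len = (unsigned short)RTA_LENGTH(len);  /* header + payload, unpadded */
    memcpy(RTA_DATA(rta), data, len);                /* payload follows the header */
    return RTA_ALIGN(rta->rta_len);                  /* where the next attribute starts */
}
```

For the 6-byte payload "veth0" (including its terminating NUL), rta_len is 10 (a 4-byte header plus 6 bytes of data) while the attribute occupies 12 bytes in the message because of padding.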
In an attempt to put this all together, let’s see how isolate makes a netlink request to create a veth pair with the following function create_veth that fulfills step 2:
As we can see, we need to be precise about what we send here - we had to encode the message in the exact way it will be interpreted by the kernel implementation, and here it took us 3 nested attributes to do so.
I’m sure this is documented somewhere even though I was unable to find it after some googling - I mostly figured this out via strace and the ip command source code.
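To illustrate the nesting, here is a sketch that only builds such a request in memory, without sending it. The nl_req layout and the helper names (tail, add_attr, nest_begin, nest_end, build_veth_request) are assumptions for this example and may differ from isolate's actual code; the attribute types come from linux/if_link.h and linux/veth.h.

```c
#include <assert.h>
#include <linux/rtnetlink.h>
#include <linux/veth.h>
#include <string.h>
#include <sys/socket.h>

struct nl_req {
    struct nlmsghdr n;   /* Netlink header */
    struct ifinfomsg i;  /* NIS payload header */
    char buf[1024];      /* room for attributes */
};

/* Next free, aligned position at the end of the message. */
static struct rtattr *tail(struct nlmsghdr *n)
{
    return (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
}

/* Appends a flat attribute and grows nlmsg_len accordingly. */
static void add_attr(struct nlmsghdr *n, unsigned short type,
                     const void *data, size_t len)
{
    struct rtattr *rta = tail(n);

    rta->rta_type = type;
    rta->rta_len = (unsigned short)RTA_LENGTH(len);
    if (len > 0)
        memcpy(RTA_DATA(rta), data, len);
    n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

/* Opens a nested attribute; its length is fixed up later by nest_end(). */
static struct rtattr *nest_begin(struct nlmsghdr *n, unsigned short type)
{
    struct rtattr *nest = tail(n);

    add_attr(n, type, NULL, 0);
    return nest;
}

/* Closes a nested attribute so that it spans everything added since. */
static void nest_end(struct nlmsghdr *n, struct rtattr *nest)
{
    nest->rta_len = (unsigned short)((char *)tail(n) - (char *)nest);
}

/* Fills `req` with an RTM_NEWLINK request for a veth pair name <=> peer. */
void build_veth_request(struct nl_req *req, const char *name, const char *peer)
{
    struct rtattr *linkinfo, *infodata, *peerinfo;

    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->n.nlmsg_type = RTM_NEWLINK;
    req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;
    req->i.ifi_family = AF_UNSPEC;

    add_attr(&req->n, IFLA_IFNAME, name, strlen(name) + 1);
    linkinfo = nest_begin(&req->n, IFLA_LINKINFO);
    add_attr(&req->n, IFLA_INFO_KIND, "veth", strlen("veth"));
    infodata = nest_begin(&req->n, IFLA_INFO_DATA);
    peerinfo = nest_begin(&req->n, VETH_INFO_PEER);
    /* The peer attribute starts with its own (zeroed) ifinfomsg. */
    req->n.nlmsg_len += sizeof(struct ifinfomsg);
    add_attr(&req->n, IFLA_IFNAME, peer, strlen(peer) + 1);
    nest_end(&req->n, peerinfo);
    nest_end(&req->n, infodata);
    nest_end(&req->n, linkinfo);
}
```

The three nested attributes are IFLA_LINKINFO, IFLA_INFO_DATA and VETH_INFO_PEER, each wrapping the next. Sending the resulting message over the NETLINK_ROUTE socket and reading back the acknowledgement is left out of this sketch.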
Next, for step 3, is a method that, given an interface name ifname and a network namespace file descriptor netns, moves the device associated with that interface to the specified network namespace.
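A request of this kind can be sketched as an RTM_NEWLINK message that identifies the device by index and carries a single IFLA_NET_NS_FD attribute holding the namespace file descriptor. The nl_req layout and the name build_move_request are assumptions for illustration; in practice the index would typically be obtained with if_nametoindex(ifname), and the message still has to be sent over the netlink socket.

```c
#include <assert.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

struct nl_req {
    struct nlmsghdr n;
    struct ifinfomsg i;
    char buf[1024];
};

/* Fills `req` with a request to move interface `ifindex` into the network
 * namespace referred to by the open file descriptor `netns_fd`. */
void build_move_request(struct nl_req *req, int ifindex, int netns_fd)
{
    struct rtattr *rta;

    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->n.nlmsg_type = RTM_NEWLINK;
    req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req->i.ifi_family = AF_UNSPEC;
    req->i.ifi_index = ifindex;

    /* Single attribute: the target namespace, as a file descriptor. */
    rta = (struct rtattr *)((char *)&req->n + NLMSG_ALIGN(req->n.nlmsg_len));
    rta->rta_type = IFLA_NET_NS_FD;
    rta->rta_len = (unsigned short)RTA_LENGTH(sizeof(netns_fd));
    memcpy(RTA_DATA(rta), &netns_fd, sizeof(netns_fd));
    req->n.nlmsg_len += RTA_ALIGN(rta->rta_len);
}
```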
After creating the veth pair and moving one end to our target network namespace, step 4 has us assigning both end devices IP addresses and bringing their interfaces up.
For that we have a helper function if_up which, given an interface name ifname and an IP address ip, assigns ip to the device ifname and brings it up.
For brevity we do not show those here but they can be found here instead.
Finally, we bring these methods together to prepare our network namespace for our command process.
Then we can call prepare_netns right after we’re done setting up the user namespace.
Let’s try it out!