Overview
In this final post of the series we take a look at network namespaces. As we hinted at during the intro post, a network namespace isolates network-related resources - a process running in a distinct network namespace has its own networking devices, routing tables, firewall rules etc. We can see this in action immediately by inspecting our current network environment.
The ip Command
Since we will be interacting with network devices in this post, we will reinstate the superuser requirements that we relaxed in the previous posts. From now on, we will assume that both `ip` and `isolate` are being run with `sudo`.
The star of the show here is the `ip` command - the Swiss Army Knife for networking in Linux - and we will use it extensively in this post.
Right now we have just run the `link list` subcommand to show us what networking devices are currently available in the system (here we have `lo`, the loopback interface, and `ens33`, an ethernet LAN interface).
As with all other namespaces, the system starts with an initial network namespace to which all processes belong unless specified otherwise. Running this `ip link list` command as-is gives us the networking devices owned by the initial namespace (since our shell and the `ip` command belong to this namespace).
Named Network Namespaces
Let’s create a new network namespace:
Again, we’ve used the `ip` command. Its `netns` subcommand allows us to play with network namespaces - for example we can create new network namespaces using the `add` subcommand of `netns` and use `list` to, well, list them.
You might notice that `list` only returned our newly created namespace - shouldn’t it return at least two, the other one being the initial namespace that we mentioned earlier?
The reason for this is that `ip` creates what is called a named network namespace, which is simply a network namespace that is identifiable by a unique name (in our case `coke`).
Only named network namespaces are shown via `list`, and the initial network namespace isn’t named.
Named network namespaces are easier to get a hold of. For example, a file is created for each named network namespace under the `/var/run/netns` folder and can be used by a process that wants to switch to its namespace. Another property of named network namespaces is that they can exist without having any process as a member - unlike non-named ones that will be deleted once all member processes exit.
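To make the "switch to its namespace" part concrete, here is a minimal sketch of how a process could join a named network namespace through that file. The helper names `netns_path` and `enter_named_netns` are hypothetical (they are not part of `isolate` or `ip`), and the call is only expected to succeed when run as root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define NETNS_DIR "/var/run/netns"

/* Build the path of the file that represents a named network namespace.
 * Returns 0 on success, -1 if the buffer is too small. */
int netns_path(const char *name, char *buf, size_t len)
{
    assert(name != NULL && buf != NULL && len > 0);
    int n = snprintf(buf, len, "%s/%s", NETNS_DIR, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Move the calling process into the named network namespace.
 * Returns 0 on success, -1 on failure (missing file, no privileges, ...). */
int enter_named_netns(const char *name)
{
    char path[256];
    if (netns_path(name, path, sizeof(path)) == -1)
        return -1;

    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* CLONE_NEWNET makes setns fail unless fd really refers to a network namespace. */
    int rc = setns(fd, CLONE_NEWNET);
    close(fd);
    return rc;
}
```

A call such as `enter_named_netns("coke")` would, under these assumptions, leave the calling process seeing only the devices that belong to `coke`.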
Now that we have a child network namespace, we can see networking from its perspective.
We will be using the command prompt `C$` to emphasize a shell running inside a child network namespace.
The `exec $namespace $command` subcommand executes `$command` in the named network namespace `$namespace`.
Here we ran a shell inside the `coke` namespace and listed the network devices available. We can see that at least our `ens33` device has disappeared.
The only device that shows up is loopback and even that interface is down.
We should be used to this by now: the default setup for namespaces is usually very strict. For network namespaces, as we can see, no devices except `loopback` will be present.
We can bring the `loopback` interface up without any paperwork though:
Network Isolation
We’re already starting to see that by running a process in a nested network namespace like `coke`, we can be sure that it is isolated from the rest of the system as far as networking is concerned.
Our shell process running in `coke` can only communicate via `loopback`. This means that it can only communicate with processes that are also members of the `coke` namespace, but currently there are no other member processes (and in the name of isolation, we would like it to remain that way), so it’s a bit lonely.
Let’s try to relax this isolation a bit: we will create a tunnel through which processes in `coke` can communicate with processes in our initial namespace.
Now, any network communication has to go via some network device, and a device can exist in exactly one network namespace at any given time. Communication between any two processes in different namespaces must therefore go via at least two network devices - one in each network namespace.
Veth Devices
We will use a virtual ethernet network device (or `veth` for short) to fulfill our need.
Veth devices are always created as a pair of devices in a tunnel-like fashion so that messages written to the device on one end come out of the device on the other end.
You might guess that we could easily have one end in the initial network namespace and the other in our child network namespace and have all inter-network-namespace communication go via the respective veth end device (and you would be correct).
Our `veth1` device now shows up in the `coke` namespace. But to make the veth pair functional, we need to give both devices IP addresses and bring the interfaces up. We will do this in their respective network namespaces.
We should see that `veth1` is up and has our assigned address `10.1.1.2` - the same should happen for `veth0` in the initial namespace.
Now we should be able to do an inter-namespace ping between two processes, one running in each namespace.
Implementation
The source code for this post can be found here.
As usual, we will now try to replicate what we’ve seen so far in code. Specifically, we will need to do the following:
1. Execute the command within a new network namespace.
2. Create a veth pair (veth0 <=> veth1).
3. Move the veth1 device to the new namespace.
4. Assign IP addresses to both devices and bring them up.
Step 1 is straightforward: we create our command process in a new network namespace by adding the `CLONE_NEWNET` flag to `clone`:
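A minimal sketch of what this could look like follows. The helper names `with_new_netns` and `spawn_in_new_netns` are hypothetical and not taken from `isolate`, and the flags `isolate` already passes for the other namespaces are left out for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

#define STACK_SIZE (1024 * 1024)

/* Add the network namespace flag to whatever clone flags are already in use. */
int with_new_netns(int flags)
{
    return flags | CLONE_NEWNET;
}

/* Run fn(arg) in a child process that starts out in a fresh network namespace.
 * Returns the child's pid, or -1 on failure (e.g. EPERM when not run as root). */
pid_t spawn_in_new_netns(int (*fn)(void *), void *arg)
{
    assert(fn != NULL);

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downwards on most architectures, so we pass its top. */
    pid_t pid = clone(fn, stack + STACK_SIZE, with_new_netns(SIGCHLD), arg);

    /* Without CLONE_VM the child runs on its own copy of our memory,
     * so the parent's copy of the stack can be released right away. */
    free(stack);
    return pid;
}
```

The caller would then `waitpid` on the returned pid as it would for any other child process.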
Netlink
For the remaining steps, we will primarily be using the Netlink interface to communicate with Linux.
Netlink is mainly used for communication between regular applications (like `isolate`) and the Linux kernel.
It exposes an API on top of sockets, based on a protocol that determines message structure and content.
Using this protocol we can send messages that Linux receives and translates to requests - like create a veth pair with names veth0 and veth1.
Let’s start by creating our netlink socket. In it, we specify that we want to use the `NETLINK_ROUTE` protocol - this protocol covers implementations for network routing and device management.
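As a rough sketch, opening such a socket could look like the following. The function name `open_rtnetlink` is our own invention for illustration, and binding with a zeroed `nl_pid` (letting the kernel pick our address) is one reasonable choice rather than the only one.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>
#include <unistd.h>

/* Open a netlink socket that speaks the NETLINK_ROUTE protocol and bind it.
 * Returns the socket file descriptor, or -1 on failure. */
int open_rtnetlink(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd == -1)
        return -1;

    struct sockaddr_nl local;
    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK; /* nl_pid stays 0: the kernel assigns our address */

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Opening the socket itself does not require special privileges; it is the requests we send over it later (creating and moving devices) that need root.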
Netlink Message Format
A Netlink message is a 4-byte aligned block of data containing a header (`struct nlmsghdr`) and a payload. The header format is described here. The Network Interface Service (NIS) Module specifies the format (`struct ifinfomsg`) that a payload related to network interface administration must begin with.
Our request will be represented by the following C struct:
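A minimal sketch of such a struct, assuming a fixed-size buffer for the rest of the payload. The `MAX_PAYLOAD` value and the `nl_req_init` helper are our own choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define MAX_PAYLOAD 1024

struct nl_req {
    struct nlmsghdr n;     /* Netlink message header */
    struct ifinfomsg i;    /* interface info that the payload must begin with */
    char buf[MAX_PAYLOAD]; /* room for the attributes that follow */
};

/* Zero the request and fill in the fields that every request needs. */
void nl_req_init(struct nl_req *req, uint16_t type, uint16_t flags)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    /* So far the message holds the header plus the ifinfomsg, nothing else. */
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(req->i));
    req->n.nlmsg_type = type;
    req->n.nlmsg_flags = NLM_F_REQUEST | flags;
    req->i.ifi_family = AF_UNSPEC;
}
```

Because the three members sit back to back in memory, sending the first `nlmsg_len` bytes of the struct sends a well-formed message, and `nlmsg_len` grows as attributes are appended to `buf`.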
Netlink Attributes
The NIS module requires the payload to be encoded as Netlink attributes.
Attributes provide a way to segment the payload into subsections.
An attribute has a type and a length in addition to a payload containing its actual data.
The Netlink message payload will be encoded as a list of attributes (where any such attribute can in turn have nested attributes) and we will have some helper functions to populate it with attributes.
In code, an attribute is represented by the `rtattr` struct in the `linux/rtnetlink.h` header file as:
`rta_len` is the total length of the attribute in bytes: it covers the `rtattr` header itself plus the payload, which immediately follows the `rtattr` struct in memory.
How the content of this payload is interpreted is dictated by `rta_type`, and the possible values depend entirely on the receiver implementation and the request being sent.
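To illustrate the layout, here is a sketch of a helper that appends one attribute to a message. It is written in the spirit of the helpers in the `ip` command source code, but the exact signature here is an assumption, and it expects the message buffer to have been zeroed beforehand so that padding bytes are clean.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Append one attribute to message n, whose backing buffer holds maxlen bytes.
 * Returns the new attribute so that callers can inspect or nest it. */
struct rtattr *addattr_l(struct nlmsghdr *n, size_t maxlen, unsigned short type,
                         const void *data, unsigned short datalen)
{
    unsigned short attr_len = RTA_LENGTH(datalen); /* rtattr header + payload */
    size_t new_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(attr_len);
    assert(new_len <= maxlen);

    /* The attribute starts at the (aligned) end of what the message holds so far. */
    struct rtattr *rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = attr_len;
    if (datalen > 0)
        memcpy(RTA_DATA(rta), data, datalen);

    /* The message grows by the attribute's length rounded up to 4 bytes. */
    n->nlmsg_len = new_len;
    return rta;
}
```

Note that the padding added by `RTA_ALIGN` is counted in the message length but not in `rta_len`, which is why the two are computed separately.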
In an attempt to put this all together, let’s see how `isolate` makes a netlink request to create a veth pair with the following function `create_veth` that fulfills step 2:
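What follows is a minimal sketch of how such a request could be assembled, not the verbatim `isolate` source. The names `build_create_veth`, `add_attr`, `end_nest` and `tail` are assumptions made for illustration, and the sketch only builds the message, leaving the actual send on the netlink socket to the caller.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/veth.h>

#define MAX_PAYLOAD 1024

struct nl_req {
    struct nlmsghdr n;
    struct ifinfomsg i;
    char buf[MAX_PAYLOAD];
};

/* Address of the first unused byte in the message. */
static struct rtattr *tail(struct nl_req *req)
{
    return (struct rtattr *)((char *)req + NLMSG_ALIGN(req->n.nlmsg_len));
}

/* Append an attribute (possibly with an empty payload) and return it. */
static struct rtattr *add_attr(struct nl_req *req, unsigned short type,
                               const void *data, unsigned short len)
{
    struct rtattr *rta = tail(req);
    assert(NLMSG_ALIGN(req->n.nlmsg_len) + RTA_ALIGN(RTA_LENGTH(len)) <= sizeof(*req));
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    if (len > 0)
        memcpy(RTA_DATA(rta), data, len);
    req->n.nlmsg_len = NLMSG_ALIGN(req->n.nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return rta;
}

/* Close a nested attribute: its length spans everything added since it was opened. */
static void end_nest(struct nl_req *req, struct rtattr *nest)
{
    nest->rta_len = (char *)tail(req) - (char *)nest;
}

/* Fill req with an RTM_NEWLINK message asking for the pair ifname <=> peername. */
void build_create_veth(struct nl_req *req, const char *ifname, const char *peername)
{
    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(req->i));
    req->n.nlmsg_type = RTM_NEWLINK;
    req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
    req->i.ifi_family = AF_UNSPEC;

    /* Name of the first device. */
    add_attr(req, IFLA_IFNAME, ifname, strlen(ifname) + 1);

    /* Nested attribute 1: link info, saying the device kind is "veth". */
    struct rtattr *linkinfo = add_attr(req, IFLA_LINKINFO, NULL, 0);
    add_attr(req, IFLA_INFO_KIND, "veth", 5);

    /* Nested attribute 2: kind-specific data. */
    struct rtattr *infodata = add_attr(req, IFLA_INFO_DATA, NULL, 0);

    /* Nested attribute 3: the peer, which starts with its own (zeroed) ifinfomsg. */
    struct rtattr *peer = add_attr(req, VETH_INFO_PEER, NULL, 0);
    req->n.nlmsg_len += sizeof(struct ifinfomsg);
    add_attr(req, IFLA_IFNAME, peername, strlen(peername) + 1);

    end_nest(req, peer);
    end_nest(req, infodata);
    end_nest(req, linkinfo);
}
```

Once built, the first `req.n.nlmsg_len` bytes of `req` are what would be written to the netlink socket.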
As we can see, we need to be precise about what we send here - we had to encode the message in the exact way it will be interpreted by the kernel implementation and here it took us 3 nested attributes to do so.
I’m sure this is documented somewhere even though I was unable to find it after some googling - I mostly figured this out via strace and the `ip` command source code.
Next, for step 3, is a method that, given an interface name `ifname` and a network namespace file descriptor `netns`, moves the device associated with that interface to the specified network namespace.
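A sketch of the message such a method could send is shown below. The name `build_move_to_netns` is hypothetical, and for simplicity it takes the interface index directly (which a caller could obtain from the name with `if_nametoindex`) instead of the name itself.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define MAX_PAYLOAD 1024

struct nl_req {
    struct nlmsghdr n;
    struct ifinfomsg i;
    char buf[MAX_PAYLOAD];
};

/* Fill req with an RTM_NEWLINK message that moves the device with index
 * ifindex into the network namespace referred to by netns_fd. */
void build_move_to_netns(struct nl_req *req, int ifindex, int netns_fd)
{
    assert(req != NULL && ifindex > 0 && netns_fd >= 0);

    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(req->i));
    req->n.nlmsg_type = RTM_NEWLINK;
    req->n.nlmsg_flags = NLM_F_REQUEST;
    req->i.ifi_family = AF_UNSPEC;
    req->i.ifi_index = ifindex; /* which existing device the request is about */

    /* A single attribute: IFLA_NET_NS_FD carrying the descriptor as a 32-bit value. */
    struct rtattr *rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->n.nlmsg_len));
    rta->rta_type = IFLA_NET_NS_FD;
    rta->rta_len = RTA_LENGTH(sizeof(netns_fd));
    memcpy(RTA_DATA(rta), &netns_fd, sizeof(netns_fd));
    req->n.nlmsg_len = NLMSG_ALIGN(req->n.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```

Compared to creating the veth pair this message is pleasantly flat: no nesting is needed because the namespace descriptor is a plain top-level attribute of the link.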
After creating the veth pair and moving one end to our target network namespace, step 4 has us assigning both end devices IP addresses and bringing their interfaces up.
For that we have a helper function `if_up` which, given an interface name `ifname` and an IP address `ip`, assigns `ip` to the device `ifname` and brings it up.
For brevity we do not show those here but they can be found here instead.
Finally, we bring these methods together to prepare our network namespace for our command process.
Then we can call `prepare_netns` right after we’re done setting up the user namespace.
Let’s try it out!