In this series of posts we will look closely at one of the main ingredients in a container - Namespaces.
In the process, we will create a simpler clone of the
docker run command - our very own program that will take as input a command (along with it’s arguments if any) and spin up a container process to run it, isolated from the rest of the system similar to how you would
docker run it from an image.
What is a namespace?
A Linux namespace is an abstraction over resources in the operating system.
We can think of a namespace as a box. Inside this box are these system resources, which ones exactly depend on the box’s (namespace’s) type.
There are currently 7 types of namespaces
For instance, the
Network namespace encapsulates system resources related to networking such
as network interfaces (e.g
eth0), route tables etc, the
Mount namespace encapsulates files and directories in the system,
PID contains process IDs and so on.
So two instances of a
B (corresponding to two boxes of the same type in our analogy) can contain different resources - maybe
eth0 and a different route table copy.
Namespaces aren’t some addon feature or library that you need to apt install, they are provided by the Linux kernel itself and already are a prerequisite to run any process on the system.
At any given moment, any process
P belongs to exactly one instance of each namespace type - so when it needs to say, update the route table on the system, Linux shows it the copy of the route table of the namespace to which it belongs at that moment.
What is it good for?
Absolutely nothi… just kidding. One good thing with boxes is that you can add and remove stuff from one box and it will not affect the content of other boxes.
That’s the same idea here with namespaces - a process
P can go crazy and
sudo rm -rf / but another process
Q that belongs to a different
Mount namespace will be unaffected since they’re using distinct copies of those files.
Note though that a resource encapsulated within a namespace doesn’t necessarily mean that it’s a unique copy.
In a number of cases, either by design or as a security hole, two or more namespaces will contain the same copy, e.g of the same file, so that changes made to that file in one
Mount namespace will in fact be visible in all other
Mount namespaces that also reference it.
For this reason, we will retire our box analogy here since an item cannot
simultaneously exist in two distinct boxes 😞.
Unsharing is caring
We can see the namespaces that a process belongs to!
In typical Linux fashion, they’re exposed as files under the directory
/proc/$pid/ns for a given process with process id
You can open a second terminal and run the same command and it should give you the exact same output - this is because, as we mentioned earlier, a process must belong to some namespace and unless we explicitly
specify which ones, Linux adds it as a member to the default namespaces.
Let’s meddle in this a bit. In the second terminal we can run something like:
unshare command runs a program (optionally) in a new namespace.
-u flag tells it to run
bash in a new
Notice how our new
bash process points to a different
uts file while all others remain the same.
Creating new namespaces usually requires superuser access. From now on, we will assume that both
unshareor our implementation are run with
One implication of what we just did is that we can now change the
hostname from within our new
bash process and it won’t affect any other process in the system.
You can verify this by running
hostname in the first shell or a new one
and seeing that the hostname hasn’t changed there.
But like, what is a container though?
Hopefully, now you have some idea of what a namespace can do. You might guess that containers are fundamentally ordinary processes with different namespaces from other processes and you’d be correct. In fact a quote, unquote container doesn’t have to belong to a unique namespace for each type - it can share some of them.
For instance, when you
docker run --net=host redis, all you do is tell docker to not create a new
Network namespace for the
redis process, and as we saw, Linux will add that process as a member of the default
Network namespace just like every other regular process.
So the redis process is exactly like everyone else from a networking perspective.
Networking isn’t special here,
docker run let’s you do this customization
for most namespaces.
This begs the question of what even is a container?
Is a process that shares all but one namespace still a container? ¯\_(ツ)_/¯
Usually containers come with the notion of isolation, achieved through namespaces - the smaller the number of namespaces and resources that a process shares, the more isolated the process is and that’s all that really matters.
In the remainder of this post, we will lay the ground work for our
program that we will call
isolate takes a command as arguments and runs that command in a new process isolated from the rest of the system and within its very own namespaces.
In the coming posts, we will look at adding support for individual
isolate spins up the command process.
In terms of scope, we will focus on the
Network namespaces. The rest are relatively trivial to
implement once we’re done (in fact, we add
UTS support in the initial
implementation here) and
Cgroup for example is only interesting from a
perspective that is out of scope of this series (studying
the other ingredient in containers that is used to control how much of a resource a process is allowed to use).
Namespaces can get complex real quick so there are lots of different
paths we can take while studying each namespace but we can’t take them all.
We will only discuss the paths that are relevant to the program that
Each post will start off with some experiments on the namespace in
question within a terminal in an attempt to understand the interactions
involved in setting up that namespace.
After this we will already have an idea of what we want to accomplish and will then follow up with a corresponding implementation in
To avoid bombarding the posts with code, we will not include things like helper functions that are not necessary to understand the implementation. You can find the full source code here on Github.
The source code for this post can be found here.
isolate implementation will initially be a simple program that reads
a command path from stdin and clones a new process that executes the command with the specified arguments.
The cloned command process will run in its own
just like we did with
In later posts, we will see that namespaces do not necessarily work (or even provide isolation) out of the box and we will need to do some setup after creating them (but before executing the actual command) in order for the command to truly run in isolation.
This namespace creation-setup combo will require some co-operation between
isolate process and the child command process.
As a result, part of the ground work here will be to setup a
communication channel between both processes - we will use a Linux pipe
due to its simplicity given our use case.
We have three things to do:
- Create the main
isolateprocess that reads from stdin.
- Clone a new process that will run the command in a new
- Set up a pipe so that the command process begins the command execution only after it receives a signal from the main process that the namespace setup is done.
Here is the main process:
clone_flags that we pass to our
See how dead simple it is to create a new process in its own namespace?
All we have to do is set the flag for the namespace type
CLONE_NEWUTS flag corresponds to the
UTS namespace) and Linux
takes care of the rest.
Next, the command process waits for a signal before exec-ing the command:
Finally we can try it out:
isolate is a little bit more than a program that
just forks off a command (we do have the
UTS thing going for us).
In the next post, we take it a step further by looking at
namespaces and have
isolate run the command in its own
There, we will see that we actually need to do some work in order to
have a usable namespace in which the command can run.