10 KiB
Linux 命名空间
背景
从2.6.24版的内核开始,Linux 就支持6种不同类型的命名空间。它们的出现,使用户创建的进程能够与系统分离得更加彻底,从而不需要考虑太多底层的虚拟化技术。
- CLONE_NEWIPC: 进程间通信(IPC)的命名空间,可以将 SystemV 的 IPC 和 POSIX 的消息队列独立出来。
- CLONE_NEWPID: 进程 ID 的命名空间,进程 ID 独立,意思就是命名空间内的进程 ID 可能会与命名空间外的进程 ID 冲突,于是命名空间内的进程 ID 映射到命名空间外时会使用另外一个进程 ID。比如说,命名空间内 ID 为1的进程,在命名空间外就是指 init 进程。
- CLONE_NEWNET: 网络命名空间,用于隔离网络资源(/proc/net、IP 地址、网卡、路由等)。后台进程可以运行在不同命名空间内的相同端口上,用户还可以虚拟出一块网卡。
- CLONE_NEWNS: 挂载命名空间,进程运行时可以将挂载点与系统分离,使用这个功能时,我们可以达到 chroot 的功能,而在安全性方面比 chroot 更高。
- CLONE_NEWUTS: UTS 命名空间,主要目的是独立出主机名和网络信息服务(NIS)。
- CLONE_NEWUSER: 用户命名空间,同进程 ID 一样,用户 ID 和组 ID 在命名空间内外是不一样的,并且在不同命名空间内可以存在相同的 ID。
本文用 C 语言介绍上述概念,因为演示进程命名空间的时候需要用到 C 语言。下面的测试过程在 Debian 6 和 Debian 7 上执行。首先,在栈内分配一页内存空间,并将指针指向内存页的末尾。这里我们使用 alloca() 函数来分配内存,不要用 malloc() 函数,它会把内存分配在堆上。
void *mem = alloca(sysconf(_SC_PAGESIZE)) + sysconf(_SC_PAGESIZE);
然后使用 clone() 函数创建子进程,传入栈空间的地址 "mem",以及指定命名空间的标记。同时我们还指定“callee”作为子进程运行的函数。
mypid = clone(callee, mem, SIGCHLD | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | CLONE_FILES, NULL);
clone 之后我们要在父进程中等待子进程先退出,否则的话,父进程会继续运行下去,直到进程结束,留下子进程变成孤儿进程:
while (waitpid(mypid, &r, 0) < 0 && errno == EINTR)
{
continue;
}
最后当子进程退出后,我们会回到 shell 界面。
if (WIFEXITED(r))
{
return WEXITSTATUS(r);
}
return EXIT_FAILURE;
上文介绍的 callee 函数功能如下:
static int callee()
{
int ret;
mount("proc", "/proc", "proc", 0, "");
setgid(u);
setgroups(0, NULL);
setuid(u);
ret = execl("/bin/bash", "/bin/bash", NULL);
return ret;
}
程序挂载 /proc 文件系统,设置用户 ID 和组 ID,值都为“u”,然后运行 /bin/bash 程序,LXC 是操作系统级的虚拟化工具,使用 cgroups 和命名空间来完成资源的分离。现在我们把所有代码放在一起,变量“u”的值设为65534,在 Debian 系统中,这是“nobody”和“nogroup”:
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <grp.h>
#include <alloca.h>
#include <errno.h>
#include <sched.h>
static int callee();
const int u = 65534;
int main(int argc, char *argv[])
{
int r;
pid_t mypid;
void *mem = alloca(sysconf(_SC_PAGESIZE)) + sysconf(_SC_PAGESIZE);
mypid = clone(callee, mem, SIGCHLD | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | CLONE_FILES, NULL);
while (waitpid(mypid, &r, 0) < 0 && errno == EINTR)
{
continue;
}
if (WIFEXITED(r))
{
return WEXITSTATUS(r);
}
return EXIT_FAILURE;
}
static int callee()
{
int ret;
mount("proc", "/proc", "proc", 0, "");
setgid(u);
setgroups(0, NULL);
setuid(u);
ret = execl("/bin/bash", "/bin/bash", NULL);
return ret;
}
执行以下命令来运行上面的代码:
root@w:~/pen/tmp# gcc -O -o ns.c -Wall -Werror -ansi -c89 ns.c
root@w:~/pen/tmp# ./ns
nobody@w:~/pen/tmp$ id
uid=65534(nobody) gid=65534(nogroup)
nobody@w:~/pen/tmp$ ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nobody 1 0.0 0.0 4620 1816 pts/1 S 21:21 0:00 /bin/bash
nobody 5 0.0 0.0 2784 1064 pts/1 R+ 21:21 0:00 ps auxw
nobody@w:~/pen/tmp$
Notice that the UID and GID are set to that of nobody and nogroup. Specifically notice that the full ps output shows only two running processes and that their PIDs are 1 and 5 respectively. Now, let's move on to using ip netns to work with network namespaces. First, let's confirm that no namespaces exist currently:
root@w:~# ip netns list
Object "netns" is unknown, try "ip help".
In this case, either ip needs an upgrade, or the kernel does. Assuming you have a kernel newer than 2.6.24, it's most likely ip. After upgrading, ip netns list should by default return nothing. Let's add a new namespace called 'ns1':
root@w:~# ip netns add ns1
root@w:~# ip netns list
ns1
First, let's list the current interfaces:
root@w:~# ip link list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
link/ether 00:0c:29:65:25:9e brd ff:ff:ff:ff:ff:ff
Now to create a new virtual interface, and add it to our new namespace. Virtual interfaces are created in pairs, and are linked to each other - imagine a virtual crossover cable:
root@w:~# ip link add veth0 type veth peer name veth1
root@w:~# ip link list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
link/ether 00:0c:29:65:25:9e brd ff:ff:ff:ff:ff:ff
3: veth1: mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether d2:e9:52:18:19:ab brd ff:ff:ff:ff:ff:ff
4: veth0: mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether f2:f7:5e:e2:22:ac brd ff:ff:ff:ff:ff:ff
ifconfig -a will also now show the addition of both veth0 and veth1.
Great, now to assign our new interfaces to the namespace. Note that ip netns exec is used to execute commands within the namespace:
root@w:~# ip link set veth1 netns ns1
root@w:~# ip netns exec ns1 ip link list
1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: veth1: mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether d2:e9:52:18:19:ab brd ff:ff:ff:ff:ff:ff
ifconfig -a will now only show veth0, as veth1 is in the ns1 namespace.
Should we want to delete veth0/veth1:
ip netns exec ns1 ip link del veth1
We can now assign IP address 192.168.5.5/24 to veth0 on our host:
ifconfig veth0 192.168.5.5/24
And assign veth1 192.168.5.10/24 within ns1:
ip netns exec ns1 ifconfig veth1 192.168.5.10/24 up
To execute ip addr list on both our host and within our namespace:
root@w:~# ip addr list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 00:0c:29:65:25:9e brd ff:ff:ff:ff:ff:ff
inet 192.168.3.122/24 brd 192.168.3.255 scope global eth0
inet6 fe80::20c:29ff:fe65:259e/64 scope link
valid_lft forever preferred_lft forever
6: veth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 86:b2:c7:bd:c9:11 brd ff:ff:ff:ff:ff:ff
inet 192.168.5.5/24 brd 192.168.5.255 scope global veth0
inet6 fe80::84b2:c7ff:febd:c911/64 scope link
valid_lft forever preferred_lft forever
root@w:~# ip netns exec ns1 ip addr list
1: lo: mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth1: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 12:bd:b6:76:a6:eb brd ff:ff:ff:ff:ff:ff
inet 192.168.5.10/24 brd 192.168.5.255 scope global veth1
inet6 fe80::10bd:b6ff:fe76:a6eb/64 scope link
valid_lft forever preferred_lft forever
To view routing tables inside and outside of the namespace:
root@w:~# ip route list
default via 192.168.3.1 dev eth0 proto static
192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.3.122
192.168.5.0/24 dev veth0 proto kernel scope link src 192.168.5.5
root@w:~# ip netns exec ns1 ip route list
192.168.5.0/24 dev veth1 proto kernel scope link src 192.168.5.10
Lastly, to connect our physical and virtual interfaces, we'll require a bridge. Let's bridge eth0 and veth0 on the host, and then use DHCP to gain an IP within the ns1 namespace:
root@w:~# brctl addbr br0
root@w:~# brctl addif br0 eth0
root@w:~# brctl addif br0 veth0
root@w:~# ifconfig eth0 0.0.0.0
root@w:~# ifconfig veth0 0.0.0.0
root@w:~# dhclient br0
root@w:~# ip addr list br0
7: br0: mtu 1500 qdisc noqueue state UP
link/ether 00:0c:29:65:25:9e brd ff:ff:ff:ff:ff:ff
inet 192.168.3.122/24 brd 192.168.3.255 scope global br0
inet6 fe80::20c:29ff:fe65:259e/64 scope link
valid_lft forever preferred_lft forever
br0 has been assigned an IP of 192.168.3.122/24. Now for the namespace:
root@w:~# ip netns exec ns1 dhclient veth1
root@w:~# ip netns exec ns1 ip addr list
1: lo: mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth1: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 12:bd:b6:76:a6:eb brd ff:ff:ff:ff:ff:ff
inet 192.168.3.248/24 brd 192.168.3.255 scope global veth1
inet6 fe80::10bd:b6ff:fe76:a6eb/64 scope link
valid_lft forever preferred_lft forever
Excellent! veth1 has been assigned 192.168.3.248/24
via: http://www.howtoforge.com/linux-namespaces
浣滆€咃細aziods 璇戣€咃細璇戣€匢D 鏍″<E98F8D>锛歔鏍″<E98F8D>鑰匢D](https://github.com/鏍″<EFBFBD>鑰匢D)
鏈<EFBFBD>枃鐢<EFBFBD> LCTT 鍘熷垱缈昏瘧锛孾Linux涓<78>浗](http://linux.cn/) 鑽h獕鎺ㄥ嚭