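Pod Security
Pod Security Standards
- Privileged
  An unrestricted policy. Pods at this level hold broad permissions and are typically system-level or infrastructure-level workloads.
- Baseline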
  A minimally restrictive policy that prevents known privilege escalations.
  - HostProcess (Windows only; stable since v1.26): HostProcess containers give privileged access to the Windows host and must be disallowed.
    - Restricted Fields
      spec.securityContext.windowsOptions.hostProcess
      spec.containers[*].securityContext.windowsOptions.hostProcess
      spec.initContainers[*].securityContext.windowsOptions.hostProcess
      spec.ephemeralContainers[*].securityContext.windowsOptions.hostProcess
    - Allowed Values
      - undefined/nil
      - false
  - Host Namespaces: Sharing the host namespaces must be disallowed.
    - Restricted Fields
      spec.hostNetwork
      spec.hostPID
      spec.hostIPC
    - Allowed Values
      - undefined/nil
      - false
  - Privileged Containers: Privileged Pods disable most security mechanisms and must be disallowed.
    - Restricted Fields
      spec.containers[*].securityContext.privileged
      spec.initContainers[*].securityContext.privileged
      spec.ephemeralContainers[*].securityContext.privileged
    - Allowed Values
      - undefined/nil
      - false
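  - Capabilities: Adding additional capabilities beyond those listed below must be disallowed.
    - Restricted Fields
      spec.containers[*].securityContext.capabilities.add
      spec.initContainers[*].securityContext.capabilities.add
      spec.ephemeralContainers[*].securityContext.capabilities.add
    - Allowed Values
      - undefined/nil
      - AUDIT_WRITE: write records to the kernel audit log
      - CHOWN: change file ownership
      - DAC_OVERRIDE: bypass discretionary access control (DAC) checks, that is, the read, write, and execute permission checks on files
      - FOWNER: bypass permission checks on operations that normally require the process to own the file
      - FSETID: keep the set-user-ID and set-group-ID mode bits when a file is modified
      - KILL: send signals to other processes
      - MKNOD: create special files (device nodes)
      - NET_BIND_SERVICE: bind to ports below 1024
      - SETFCAP: set file capabilities, so that a binary can perform specific privileged operations without running as root
      - SETGID: set the effective group ID of the process
      - SETPCAP: modify the process's own capability sets (used, for example, by container runtimes and sandboxes)
      - SETUID: set the effective user ID of the process
      - SYS_CHROOT: call chroot, which confines a process to a given directory tree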
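As a minimal sketch of the Capabilities control above, the following Pod adds only capabilities that appear on the Baseline allow-list. The Pod name, container name, and image are placeholders chosen for illustration; they do not come from these notes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: baseline-caps-demo      # hypothetical name
spec:
  containers:
    - name: web                 # hypothetical container name
      image: nginx:1.25         # placeholder image
      securityContext:
        capabilities:
          add:
            - NET_BIND_SERVICE  # on the Baseline allow-list
            - CHOWN             # on the Baseline allow-list
            # Adding a capability outside the list (for example NET_ADMIN
            # or SYS_ADMIN) would violate the Baseline policy.
```

Leaving capabilities.add undefined also satisfies the control, since undefined/nil is an allowed value.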
  - HostPath Volumes: HostPath volumes must be forbidden.
    - Restricted Fields
      spec.volumes[*].hostPath
    - Allowed Values
      - undefined/nil
  - Host Ports: HostPorts should be disallowed entirely (recommended) or restricted to a known list.
    - Restricted Fields
      spec.containers[*].ports[*].hostPort
      spec.initContainers[*].ports[*].hostPort
      spec.ephemeralContainers[*].ports[*].hostPort
    - Allowed Values
      - undefined/nil
      - a known list (not supported by the built-in Pod Security Admission controller)
      - 0
  - AppArmor: On supported hosts, the runtime/default AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles.
    - Restricted Fields
      metadata.annotations["container.apparmor.security.beta.kubernetes.io/*"]
    - Allowed Values
      - undefined/nil
      - runtime/default
      - localhost/*
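The annotation form of the AppArmor control above can be illustrated with a short manifest. This is only a sketch: the Pod name, container name, and image are placeholders. The suffix of the annotation key is the name of the container the profile applies to.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-demo           # hypothetical name
  annotations:
    # Key suffix "app" must match a container name in this Pod.
    # Baseline allows runtime/default or localhost/<profile-name> here.
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  containers:
    - name: app
      image: busybox:1.36       # placeholder image
      command: ["sleep", "3600"]
```

A value of unconfined in this annotation would disable the default profile and so fall outside the allowed values.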
  - SELinux: Setting the SELinux type is restricted, and setting a custom SELinux user or role option is forbidden.
    - Restricted Fields
      spec.securityContext.seLinuxOptions.type
      spec.containers[*].securityContext.seLinuxOptions.type
      spec.initContainers[*].securityContext.seLinuxOptions.type
      spec.ephemeralContainers[*].securityContext.seLinuxOptions.type
    - Allowed Values
      - undefined/""
      - container_t: the default SELinux type for containers; processes in the container can access the container's own resources and are confined by SELinux policy
      - container_init_t: the type for containers that run an init system (such as systemd) as their main process, still confined by SELinux policy
      - container_kvm_t: the type for containers that run virtual machines (such as KVM), still confined by SELinux policy
    - Restricted Fields
      spec.securityContext.seLinuxOptions.[user/role]
      spec.containers[*].securityContext.seLinuxOptions.[user/role]
      spec.initContainers[*].securityContext.seLinuxOptions.[user/role]
      spec.ephemeralContainers[*].securityContext.seLinuxOptions.[user/role]
    - Allowed Values
      - undefined/""
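A minimal sketch of a Pod that satisfies the SELinux control above: the type is set to one of the allowed values, and user and role are left unset. Names and image are placeholders, and the manifest only has an effect on nodes where SELinux is enabled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-demo            # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      type: container_t         # one of the allowed types
      # user and role are intentionally omitted: setting either one
      # to a non-empty value violates the Baseline policy.
  containers:
    - name: app
      image: busybox:1.36       # placeholder image
      command: ["sleep", "3600"]
```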
  - /proc Mount Type: The default /proc masks are set up to reduce attack surface, and should be required.
    - Restricted Fields
      spec.containers[*].securityContext.procMount
      spec.initContainers[*].securityContext.procMount
      spec.ephemeralContainers[*].securityContext.procMount
    - Allowed Values
      - undefined/nil
      - Default
  - Seccomp: Seccomp profile must not be explicitly set to Unconfined.
    - Restricted Fields
      spec.securityContext.seccompProfile.type
      spec.containers[*].securityContext.seccompProfile.type
      spec.initContainers[*].securityContext.seccompProfile.type
      spec.ephemeralContainers[*].securityContext.seccompProfile.type
    - Allowed Values
      - undefined/nil
      - RuntimeDefault
      - Localhost
  - Sysctls: Sysctls can disable security mechanisms or affect all containers on a host, and should be disallowed except for an allowed "safe" subset. A sysctl is considered safe if it is namespaced in the container or the Pod, and it is isolated from other Pods or processes on the same Node.
    - Restricted Fields
      spec.securityContext.sysctls[*].name
    - Allowed Values
      - undefined/nil
      - kernel.shm_rmid_forced: whether shared memory segments are forcibly removed once no process is attached to them
      - net.ipv4.ip_local_port_range: the local port range
      - net.ipv4.ip_unprivileged_port_start: the first port that unprivileged users may bind to
      - net.ipv4.tcp_syncookies: whether SYN cookies are enabled to defend against SYN flood attacks
      - net.ipv4.ping_group_range: the range of group IDs allowed to create ICMP Echo (ping) sockets
      - net.ipv4.ip_local_reserved_ports: the local ports that are reserved and excluded from automatic assignment
      - net.ipv4.tcp_keepalive_time: how long a TCP connection may stay idle before keepalive probes start, in seconds
      - net.ipv4.tcp_fin_timeout: the FIN timeout of a TCP connection, in seconds
      - net.ipv4.tcp_keepalive_intvl: the interval between TCP keepalive probes, in seconds
      - net.ipv4.tcp_keepalive_probes: the number of unanswered keepalive probes sent before the connection is considered dead
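To make the Sysctls control concrete, here is a sketch of a Pod that sets two sysctls from the safe subset. Sysctls are set at Pod level, and each value is a string. The Pod name, container name, image, and the chosen values are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo             # hypothetical name
spec:
  securityContext:
    sysctls:
      # Both names are on the Baseline allow-list.
      - name: kernel.shm_rmid_forced
        value: "1"              # values are always strings
      - name: net.ipv4.ip_unprivileged_port_start
        value: "0"              # let unprivileged processes bind any port
  containers:
    - name: app
      image: busybox:1.36       # placeholder image
      command: ["sleep", "3600"]
```

A sysctl outside the list, such as one that is not namespaced, would cause the Pod to violate the Baseline policy.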
- Restricted
  The most restrictive policy. It includes every Baseline control plus the controls below.
  - Volume Types: The restricted policy only permits the following volume types.
    - Restricted Fields
      spec.volumes[*]
    - Allowed Values: every item in spec.volumes[*] must set one of the following fields to a non-null value
      spec.volumes[*].configMap
      spec.volumes[*].csi
      spec.volumes[*].downwardAPI
      spec.volumes[*].emptyDir
      spec.volumes[*].ephemeral
      spec.volumes[*].persistentVolumeClaim
      spec.volumes[*].projected
      spec.volumes[*].secret
  - Privilege Escalation: Privilege escalation (such as via set-user-ID or set-group-ID file mode) should not be allowed. Linux-only policy (spec.os.name != windows).
    - Restricted Fields
      spec.containers[*].securityContext.allowPrivilegeEscalation
      spec.initContainers[*].securityContext.allowPrivilegeEscalation
      spec.ephemeralContainers[*].securityContext.allowPrivilegeEscalation
    - Allowed Values
      - false
  - Running as Non-root
    - Restricted Fields: Containers must be required to run as non-root users.
      spec.securityContext.runAsNonRoot
      spec.containers[*].securityContext.runAsNonRoot
      spec.initContainers[*].securityContext.runAsNonRoot
      spec.ephemeralContainers[*].securityContext.runAsNonRoot
    - Allowed Values
      - true
      The container-level fields may be left undefined/nil if the Pod-level spec.securityContext.runAsNonRoot is set to true.
    - Restricted Fields: Containers must not set runAsUser to 0.
      spec.securityContext.runAsUser
      spec.containers[*].securityContext.runAsUser
      spec.initContainers[*].securityContext.runAsUser
      spec.ephemeralContainers[*].securityContext.runAsUser
    - Allowed Values
      - any non-zero value
      - undefined/null
  - Seccomp: Seccomp profile must be explicitly set to one of the allowed values. Both the Unconfined profile and the absence of a profile are prohibited. Linux-only policy (spec.os.name != windows).
    - Restricted Fields
      spec.securityContext.seccompProfile.type
      spec.containers[*].securityContext.seccompProfile.type
      spec.initContainers[*].securityContext.seccompProfile.type
      spec.ephemeralContainers[*].securityContext.seccompProfile.type
    - Allowed Values
      - RuntimeDefault
      - Localhost
  - Capabilities: Containers must drop ALL capabilities, and are only permitted to add back the NET_BIND_SERVICE capability. Linux-only policy (spec.os.name != windows).
    - Restricted Fields
      spec.containers[*].securityContext.capabilities.drop
      spec.initContainers[*].securityContext.capabilities.drop
      spec.ephemeralContainers[*].securityContext.capabilities.drop
    - Allowed Values
      - Any list of capabilities that includes ALL
    - Restricted Fields
      spec.containers[*].securityContext.capabilities.add
      spec.initContainers[*].securityContext.capabilities.add
      spec.ephemeralContainers[*].securityContext.capabilities.add
    - Allowed Values
      - undefined/nil
      - NET_BIND_SERVICE
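Putting the Restricted controls together, a Pod that is intended to pass the Restricted level might look like the sketch below. The Pod name, container name, image, and UID are placeholders. The image is assumed to be able to run as a non-root user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo                     # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                      # Running as Non-root
    runAsUser: 10001                        # any non-zero UID (illustrative)
    seccompProfile:
      type: RuntimeDefault                  # Seccomp must be set explicitly
  containers:
    - name: app                             # hypothetical container name
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false     # Privilege Escalation
        capabilities:
          drop: ["ALL"]                     # must include ALL
          add: ["NET_BIND_SERVICE"]         # the only capability that may be added back
      volumeMounts:
        - name: scratch
          mountPath: /tmp
  volumes:
    - name: scratch
      emptyDir: {}                          # one of the permitted volume types
```

Setting runAsNonRoot and seccompProfile at Pod level covers every container, so the container-level fields can stay undefined. allowPrivilegeEscalation and capabilities exist only at container level and must be repeated for each container, including init and ephemeral containers.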
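Setting Pod Security Admission labels on a namespace
- enforce: Policy violations cause the Pod to be rejected. Applied to Pod objects.
- audit: Policy violations add an audit annotation to the event recorded in the audit log, but the Pod is still admitted. Also applied to workload (controller) objects such as Deployments and ReplicaSets.
- warn: Policy violations trigger a user-facing warning, but the Pod is still admitted. Also applied to workload (controller) objects such as Deployments and ReplicaSets.
Corresponding labels
pod-security.kubernetes.io/<MODE>: <LEVEL>
  MODE: enforce, audit, or warn
  LEVEL: privileged, baseline, or restricted
pod-security.kubernetes.io/<MODE>-version: <VERSION>
  MODE: enforce, audit, or warn
  VERSION: a valid Kubernetes minor version, or latest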
apiVersion: v1
kind: Namespace
metadata:
  name: my-baseline-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.29
    # These labels are set to the `enforce` level we _want_ to reach
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.29
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.29
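Admission exemptions
- Username: Requests from users with an exempt authenticated (or impersonated) username are ignored.
- RuntimeClassName: Pods and workload resources (Deployments, ReplicaSets, and so on) that specify an exempt runtime class name are ignored.
- Namespace: Pods and workload resources in an exempt namespace are ignored.
NOTE: Exempting a user only exempts Pods that the user creates directly. Workload resources (controllers) created by that user are not exempt. Controller service accounts (such as system:serviceaccount:kube-system:replicaset-controller) should generally not be exempted, because doing so implicitly exempts every user who can create the corresponding workload resource.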
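These notes do not show where exemptions are declared. In upstream Kubernetes they are set statically in the Pod Security admission plugin configuration, which the API server reads from the file given by --admission-control-config-file. The sketch below assumes the v1 configuration API (Kubernetes v1.25 or later); the username and runtime class are hypothetical examples.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      # Cluster-wide defaults, used when a namespace carries no labels.
      defaults:
        enforce: "baseline"
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: ["ci-deployer"]        # hypothetical user
        runtimeClasses: ["sandboxed"]     # hypothetical RuntimeClass
        namespaces: ["kube-system"]       # namespaces to skip entirely
```

Because the configuration is static, changing an exemption requires updating this file and restarting the API server, which is one reason to keep the exemption lists short.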
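Policy checks exempt updates to the following Pod fields. If a Pod update request changes only these fields, it is not rejected even if the Pod violates the current policy level.
- Any metadata update except changes to the seccomp or AppArmor annotations:
  container.apparmor.security.beta.kubernetes.io/*
- Valid updates to .spec.activeDeadlineSeconds
- Valid updates to .spec.tolerations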
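Metrics for Pod security levels
- pod_security_evaluations_total: the number of policy evaluations that occurred, not counting requests that were ignored or exempt during exporting.
- pod_security_exemptions_total: the number of exempt requests, not counting requests that were ignored or out of scope.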
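One way to use these counters is an alert on enforce-mode denials. The Prometheus rule below is only a sketch. It assumes the API server metrics are already scraped, and that pod_security_evaluations_total carries decision, mode, and policy_level labels, which these notes do not list; check the label names against your cluster before relying on it. The group name, alert name, window, and severity are hypothetical.

```yaml
groups:
  - name: pod-security                      # hypothetical group name
    rules:
      - alert: PodSecurityEnforceDenials    # hypothetical alert name
        # Assumed labels: decision, mode, policy_level.
        expr: |
          sum by (policy_level) (
            rate(pod_security_evaluations_total{decision="deny", mode="enforce"}[5m])
          ) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod Security Admission is rejecting Pods at the {{ $labels.policy_level }} level"
```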