Job Summary
Non-parallel Job (the default)
- Usually only one Pod is started, unless that Pod fails
- As soon as the Pod terminates successfully, the Job is considered complete
Key fields:
- spec.completions = 1 (default is 1)
- spec.parallelism = 1 (default is 1)
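As a minimal sketch of this default case (the Job name and image are placeholders, not from the original notes), a non-parallel Job can simply leave both fields out, since they default to 1; they are written explicitly here only for illustration:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-shot            # hypothetical name
spec:
  # completions and parallelism both default to 1 for a non-parallel Job;
  # shown explicitly here only for illustration
  completions: 1
  parallelism: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "echo done"]
```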
Parallel Jobs
- Parallel Jobs with a fixed completion count
  - spec.completions > 0; spec.parallelism may be set or left unset (defaults to 1)
  - The Job is complete when the number of successful Pods (exitCode = 0) reaches spec.completions
  - With spec.completionMode = Indexed, each Pod gets a Job index in the range 0 to spec.completions - 1
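A minimal sketch of a fixed completion count Job (the name, image, and counts are illustrative): the Job is complete once 10 Pods have exited successfully, with at most 3 running at a time.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: fixed-count         # hypothetical name
spec:
  completions: 10     # complete when 10 Pods have exited with code 0
  parallelism: 3      # at most 3 Pods run at the same time
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo one work item"]
```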
- Parallel Jobs with a work queue
  - Leave spec.completions unset (it then defaults to spec.parallelism), and spec.parallelism must be set to a non-negative integer (>= 0)
  - The Pods must coordinate among themselves, or an external service must determine which item(s) each Pod works on
  - Each Pod can tell whether its peer Pods are done, and therefore whether the whole Job is done
  - No new Pods are started once any Pod has terminated successfully
  - The Job is complete once at least one Pod has terminated successfully and all other Pods have terminated
  - Once any Pod has exited successfully, no other Pod should keep working on this task; they should all be in the process of exiting
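A sketch of the work-queue variant, assuming a hypothetical worker image that drains a shared queue and exits 0 when the queue is empty; note that spec.completions is deliberately left unset:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-consumer      # hypothetical name
spec:
  # completions is left unset for a work-queue Job
  parallelism: 5
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: consumer
        # hypothetical image: pulls items from a shared queue and exits 0 once it is empty
        image: registry.example.com/queue-worker:latest
```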
Controlling parallelism:
spec.parallelism >= 0. If it is 0, the Job is effectively paused as soon as it starts. The actual number of Pods running at any time may be slightly higher or lower than spec.parallelism, for the following reasons:
- For a fixed completion count Job, the number of Pods running in parallel does not exceed the number of remaining completions
- For a work queue Job, no new Pods are started after any Pod has succeeded; Pods that are already running are allowed to finish
- If the JobController has not had time to react, or cannot create Pods for any reason (such as insufficient resources, a missing ResourceQuota, or lack of permissions), there may be fewer Pods than requested
- The JobController may throttle new Pod creation because of excessive previous Pod failures in the same Job
- When a Pod is being gracefully terminated, it takes time to stop
Job Completion Mode
spec.completions > 0 && spec.completionMode in (NonIndexed, Indexed)
- NonIndexed (default)
- Indexed: the index value is exposed through four mechanisms
  - Pod annotation batch.kubernetes.io/job-completion-index
  - When the PodIndexLabel feature gate is enabled (enabled by default), the Pod label batch.kubernetes.io/job-completion-index (>= v1.28)
  - Pod hostname, $(job-name)-$(index). When using an Indexed Job in combination with a Service, Pods within the Job can use the deterministic hostnames to address each other via DNS. See "Job with Pod-to-Pod Communication".
  - In the containerized task, the environment variable JOB_COMPLETION_INDEX
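A sketch of an Indexed Job whose container reads the index from the JOB_COMPLETION_INDEX environment variable (name, image, and counts are placeholders); the annotation, label, and hostname mechanisms need no extra configuration:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo        # hypothetical name
spec:
  completions: 5
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        # JOB_COMPLETION_INDEX is injected automatically for Indexed Jobs
        command: ["sh", "-c", "echo processing item $JOB_COMPLETION_INDEX"]
```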
Reference: http://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode
Handling Pod and container failures
- With spec.template.spec.restartPolicy = OnFailure the program must handle being restarted locally, or else set spec.template.spec.restartPolicy = Never
- The user must handle temporary files, locks, and incomplete output themselves
- Each Pod failure is counted towards spec.backoffLimit
- Pod failures can also be counted per index with spec.backoffLimitPerIndex
- Setting spec.parallelism = 1 && spec.completions = 1 && spec.template.spec.restartPolicy = Never cannot guarantee that the program runs exactly once
- The user must handle concurrency when spec.parallelism > 1 && spec.completions > 1
- With the feature gates PodDisruptionConditions and JobPodFailurePolicy enabled and spec.podFailurePolicy set, the JobController does not treat a Pod with metadata.deletionTimestamp as a failed Pod until the Pod has terminated (.status.phase in Failed, Succeeded). Once the Pod has terminated, the JobController evaluates .backoffLimit and .podFailurePolicy for the relevant Job and decides whether this now-terminated Pod counts as failed.
- If none of the above applies, the JobController counts a terminating Pod as an immediate failure, even if that Pod later terminates with phase = Succeeded.
Pod backoff failure policy
Set spec.backoffLimit = X; once the Job has retried X times, the Job is marked as failed.
Default spec.backoffLimit = 6, with an exponential backoff delay (10s, 20s, 40s, ...) capped at 6m.
Failed Pods are counted in two ways:
- the number of Pods with status.phase = Failed
- with restartPolicy = OnFailure, the number of container retries in Pods with status.phase in (Pending, Running)
Official suggestion: restartPolicy = "Never" and use a logging system to record logs.
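A sketch following that suggestion (placeholder name and a deliberately failing command): restartPolicy is Never and the Job gives up after backoffLimit retries.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-demo          # hypothetical name
spec:
  backoffLimit: 4           # marked failed after 4 retries (default is 6)
  template:
    spec:
      restartPolicy: Never  # as suggested; collect container output with a logging system
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "exit 1"]   # always fails, just to exercise the backoff
```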
Backoff limit per index
The feature gate JobBackoffLimitPerIndex should be enabled.
- Set spec.backoffLimitPerIndex to handle retries for Pod failures per index
- Failed indexes are added to status.failedIndexes. Completed indexes are added to status.completedIndexes, regardless of the backoffLimitPerIndex field
- A failing index does not interrupt the execution of other indexes. If any index fails, the overall Indexed Job is marked as failed once all indexes have run.
- By setting spec.maxFailedIndexes, the JobController terminates the entire Job as failed once that many indexes have failed, including the running Pods for that Job.
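A sketch with illustrative numbers, assuming the JobBackoffLimitPerIndex feature gate is on; backoffLimitPerIndex only applies to Indexed Jobs:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: per-index-retries   # hypothetical name
spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed   # backoffLimitPerIndex only applies to Indexed Jobs
  backoffLimitPerIndex: 1   # each index may fail at most once before it is marked failed
  maxFailedIndexes: 5       # once more than 5 indexes have failed, the whole Job is terminated
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo index $JOB_COMPLETION_INDEX"]
```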
Pod failure policy
The feature gate JobPodFailurePolicy must be enabled, and PodDisruptionConditions is recommended; supported in v1.29.
spec.podFailurePolicy enables the cluster to handle Pod failures based on container exit codes and Pod conditions.
It gives better control over handling Pod failures than the Pod backoff failure policy, which is based on spec.backoffLimit:
- it avoids unnecessary Pod restarts
- it can guarantee Job completion by ignoring Pod failures caused by disruptions (e.g. preemption, API-initiated eviction, or taint-based eviction) so that they do not count towards spec.backoffLimit
Note: Because the Pod template specifies restartPolicy: Never, the kubelet does not restart the main container in that particular Pod.
The Ignore action for failed Pods with the DisruptionTarget condition excludes Pod disruptions from being counted towards spec.backoffLimit.
Note: If the Job failed, either by the Pod failure policy or Pod backoff failure policy, and the Job is running multiple Pods, Kubernetes terminates all the Pods in that Job that are still Pending or Running.
API requirements and semantics:
- Must define spec.template.spec.restartPolicy = Never when spec.podFailurePolicy is used
- spec.podFailurePolicy.rules are evaluated in order. Once a rule matches a Pod failure, the remaining rules are ignored.
- spec.podFailurePolicy.rules[*].onExitCodes.containerName applies to both containers and initContainers
- spec.podFailurePolicy.rules[*].action
  - FailJob: the entire Job is marked as failed
  - Ignore: the failure does not count towards spec.backoffLimit
  - Count: the failure is handled in the default way and counts towards spec.backoffLimit
  - FailIndex: used together with the backoff limit per index
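A sketch putting these rules together (the names, exit code, and counts are illustrative), in the style of the upstream example: a specific exit code fails the whole Job, while disruption-caused failures are ignored.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-failure-policy-demo   # hypothetical name
spec:
  completions: 8
  parallelism: 2
  backoffLimit: 6
  template:
    spec:
      restartPolicy: Never        # required when spec.podFailurePolicy is set
      containers:
      - name: main
        image: busybox:1.36
        command: ["sh", "-c", "echo work"]
  podFailurePolicy:
    rules:
    # rules are evaluated in order; the first matching rule wins
    - action: FailJob             # exit code 42 from "main" fails the whole Job immediately
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
    - action: Ignore              # disruptions do not count towards spec.backoffLimit
      onPodConditions:
      - type: DisruptionTarget
```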
Reference: http://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-failure-policy
Job termination and cleanup
- A Job can fail because a Pod fails with restartPolicy = Never, or because a container exits in error with restartPolicy = OnFailure. Once spec.backoffLimit is reached, the entire Job is marked as failed and any running Pods are terminated.
- Once spec.activeDeadlineSeconds is reached, all of its running Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded.
- spec.activeDeadlineSeconds takes precedence over spec.backoffLimit: once the Job reaches the time limit (activeDeadlineSeconds), it is marked as failed even if backoffLimit is not yet reached.
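A sketch of the deadline behaviour (values are illustrative): even though backoffLimit would allow more retries, the Job is failed with reason DeadlineExceeded after 600 seconds.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deadline-demo       # hypothetical name
spec:
  backoffLimit: 6
  # after 600s the Job is failed with reason: DeadlineExceeded,
  # even if backoffLimit has not been reached
  activeDeadlineSeconds: 600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "sleep 3600"]
```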
Cleanup finished jobs automatically (v1.23 stable)
- TTL mechanism: spec.ttlSecondsAfterFinished cleans up finished Jobs (Complete, Failed), including all cascading objects, e.g. Pods.
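A sketch with an illustrative TTL: five minutes after the Job finishes (whether Complete or Failed), the Job and its Pods are removed.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ttl-demo            # hypothetical name
spec:
  ttlSecondsAfterFinished: 300   # delete the Job and its Pods 300s after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "echo done"]
```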
Note: If finished Jobs are not cleaned up, cluster performance can degrade, or in the worst case the cluster can go offline due to this degradation. Using LimitRanges and ResourceQuotas is a better way to avoid this.
Reference: http://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
Some examples:
- Specify this field in the Job manifest
- Manually set this field on existing, already finished Jobs
- Use a mutating admission webhook to set this field dynamically at Job creation time (a cluster administrator use case)
- Use a mutating admission webhook to set this field dynamically after the Job has finished, which requires detecting the status of the Job
- Write your own controller to manage the cleanup TTL for Jobs
Caveats:
- Updating the TTL of finished Jobs: K8s does not guarantee that extending the TTL takes effect once the original TTL has already expired.
- Time skew: clocks aren't always correct, and the TTL mechanism uses timestamps to decide when to clean up, so time skew can trigger cleanup at the wrong time.
Job patterns
Use cases: emails to be sent, notifications to be pushed, frames to be rendered, files to be transcoded, ranges of keys in a NoSQL database to scan, and so on.
There are different patterns for parallel computation, each with strengths and weaknesses. The tradeoffs are:
- One Job object per work item, versus a single Job for all work items; the latter is better for large numbers of items.
- One Pod per work item, versus each Pod processing multiple work items; the latter is better for large numbers of items.
- Several approaches use a work queue.
- When the Job is associated with a headless Service, the Pods within the Job can communicate with each other to collaborate in a computation.
Reference: http://kubernetes.io/docs/concepts/workloads/controllers/job/#job-patterns
Advanced Usage
- Suspending a Job: spec.suspend = true (v1.24 stable)
- Mutable Scheduling Directives (v1.27 stable)
- Specifying your own Pod selector: spec.selector
- Job tracking with finalizers: batch.kubernetes.io/job-tracking (v1.26 stable)
- Elastic Indexed Jobs (v1.27 beta)
  - When the feature gate ElasticIndexedJob is disabled, spec.completions is immutable
  - When it is enabled, an Indexed Job can be scaled up or down by mutating spec.parallelism and spec.completions together, keeping spec.parallelism == spec.completions
- Delayed creation of replacement Pods (v1.29 beta); see the sketch after this list
  - Feature gate JobPodReplacementPolicy enabled by default
  - Set spec.podReplacementPolicy = Failed to delay the creation of replacement Pods until the old Pod reaches status.phase = Failed
  - Without podFailurePolicy set, podReplacementPolicy selects the TerminatingOrFailed replacement policy: the control plane creates replacement Pods immediately upon Pod deletion (as soon as the control plane sees that a Pod for this Job has deletionTimestamp set).
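A sketch of the delayed-replacement setting mentioned above (name, image, and counts are placeholders), assuming the JobPodReplacementPolicy feature gate is enabled:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: replacement-demo    # hypothetical name
spec:
  completions: 5
  parallelism: 5
  # create a replacement only after the old Pod reaches phase Failed,
  # rather than as soon as it starts terminating (TerminatingOrFailed)
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo work"]
```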
Alternatives
- Bare Pods
- Replication Controller
- Single Job starts controller Pod
Job usage conclusion (personally)
To use Jobs safely, users need to handle business logic such as locks, retries, markers, and validation themselves. The JobController does not reduce the development workload, because Pods belonging to a Job can restart or fail for many reasons, such as Node eviction or a failing livenessProbe.
The advantage of the JobController is that the number of parallel tasks can be scaled up and down in a controlled way.