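Series overview
This series of posts is for Dataform users who want to go beyond its core features. It digs into Dataform's powerful API, builds automated CI/CD workflows in GitHub Actions, sets up pipelines to monitor costs, and modifies config {} blocks programmatically.
The Dataform API is the natural starting point for exploring Dataform's powerful but sparsely documented capabilities. Once you understand the API, you can use Dataform as a versatile hub for automating a modern data platform.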
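Dataform API essentials
The API revolves around two objects that are central to any workflow:
- CompilationResult: the compiled state of a Dataform workspace at a point in time (i.e. the DAG as built).
- WorkflowInvocation: a single execution of a CompilationResult (i.e. the DAG as run).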
CompilationResult
from google.cloud import dataform_v1
client = dataform_v1.DataformClient()
project_id = "my-project"
region = "europe-west2"
repository_id = "analytics"
workspace_id = "dev"
# Full resource name of the workspace to compile
workspace = client.workspace_path(
    project_id,
    region,
    repository_id,
    workspace_id,
)
# Parent repository under which the compilation result is created
repository = client.repository_path(
    project_id,
    region,
    repository_id,
)
# Compile the current state of the workspace
compilation_result = dataform_v1.CompilationResult(
    workspace=workspace
)
response = client.create_compilation_result(
    parent=repository,
    compilation_result=compilation_result,
)
# response.name is the server-assigned name of the new compilation result
print(response.name)
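The printed name has the form projects/<project_id>/locations/<region>/repositories/<repository_id>/compilationResults/<compilation_result_id>, where <compilation_result_id> is a UUID v4.
A CompilationResult object is created whenever the code changes, but the UI does not show its ID. The ID only becomes visible on the Executions page after a run.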
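Because the ID appears only as the last segment of that name, it can be handy to split the name into its parts, for example for logging. The sketch below is a plain-Python helper. The function name `parse_compilation_result_name` and the sample UUID are hypothetical, and it assumes names always follow the pattern above exactly.

```python
import re
from typing import NamedTuple


class CompilationResultName(NamedTuple):
    """Components of a compilation result resource name."""
    project_id: str
    region: str
    repository_id: str
    compilation_result_id: str


# Mirrors the pattern described in the text:
# projects/<project_id>/locations/<region>/repositories/<repository_id>/compilationResults/<compilation_result_id>
_PATTERN = re.compile(
    r"^projects/(?P<project_id>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/repositories/(?P<repository_id>[^/]+)"
    r"/compilationResults/(?P<compilation_result_id>[^/]+)$"
)


def parse_compilation_result_name(name: str) -> CompilationResultName:
    """Split a full resource name into its parts; raise ValueError if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a compilation result name: {name!r}")
    return CompilationResultName(**match.groupdict())


parts = parse_compilation_result_name(
    "projects/my-project/locations/europe-west2/repositories/analytics"
    "/compilationResults/0b2f3c4d-1111-4222-8333-444455556666"
)
print(parts.compilation_result_id)  # 0b2f3c4d-1111-4222-8333-444455556666
```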
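A CompilationResult holds information about every action in the workspace. This lets users traverse the DAG programmatically and extract the config {} block of each file. The actions can be iterated over as follows:
# COMPILATION_RESULT is the full compilation result name (response.name
# from the previous snippet)
COMPILATION_RESULT = response.name
request = dataform_v1.QueryCompilationResultActionsRequest(
    name=COMPILATION_RESULT
)
# The call returns a pager; iterating it follows page tokens automatically
for action in client.query_compilation_result_actions(request=request):
    ...  # inspect each action here, e.g. action.target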
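Once the actions have been pulled out, traversing the DAG is ordinary graph work. The sketch below assumes each action has already been reduced to its target name plus the names of the targets it depends on. The sample `actions` mapping is hypothetical. With the real API, these values could come from fields such as `action.target.name` and `action.relation.dependency_targets`. The sketch uses Python's standard `graphlib` to produce a valid execution order, and it uses a small search to find everything downstream of a given table.

```python
from graphlib import TopologicalSorter

# Hypothetical, already-extracted data: target name -> names of its dependencies.
actions = {
    "stg_orders": [],
    "stg_customers": [],
    "dim_customers": ["stg_customers"],
    "fct_orders": ["stg_orders", "dim_customers"],
}


def execution_order(graph: dict[str, list[str]]) -> list[str]:
    """Return target names so that every dependency appears before its dependents."""
    # TopologicalSorter expects {node: predecessors}, which matches this mapping.
    return list(TopologicalSorter(graph).static_order())


def downstream_of(graph: dict[str, list[str]], target: str) -> set[str]:
    """Return every target that directly or transitively depends on `target`."""
    # Invert the edges: dependency -> set of dependents.
    dependents: dict[str, set[str]] = {name: set() for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)
    # Depth-first search over the inverted edges.
    seen: set[str] = set()
    stack = [target]
    while stack:
        for child in dependents.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


print(execution_order(actions))
print(sorted(downstream_of(actions, "stg_customers")))  # ['dim_customers', 'fct_orders']
```

Computing downstream targets like this is one way to see which tables a change would affect before triggering a run.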
WorkflowInvocation
from google.cloud import dataform_v1
client = dataform_v1.DataformClient()
project_id = "my-project"
region = "europe-west2"
repository_id = "analytics"
repository = client.repository_path(
    project_id, region, repository_id
)
# COMPILATION_RESULT is the full compilation result name returned by
# create_compilation_result (response.name in the first snippet); the local
# CompilationResult request object has no name of its own
invocation = dataform_v1.WorkflowInvocation(
    compilation_result=COMPILATION_RESULT
)
response = client.create_workflow_invocation(
    parent=repository,
    workflow_invocation=invocation,
)
print(f'{response.name}: {response.state}')
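A caller that needs the outcome of a run has to poll for it. The helper below is a minimal sketch. The function name, defaults and terminal-state set are assumptions (SUCCEEDED, FAILED and CANCELLED are taken as final). The state is passed in through a callable so that the polling logic does not depend on the client. The commented wiring assumes `client.get_workflow_invocation` and a proto-plus enum `state`.

```python
import time
from typing import Callable

# Assumed terminal WorkflowInvocation states; RUNNING and CANCELING keep polling.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_invocation(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"invocation still {state} after {timeout_seconds}s")
        sleep(poll_seconds)


# Hypothetical wiring to the real client:
# final = wait_for_invocation(
#     lambda: client.get_workflow_invocation(name=response.name).state.name
# )
```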
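A WorkflowInvocation is created each time a run is executed from the Dataform UI. It contains information about every executed action, so users can link each action to its BigQuery job ID and see the objects it created (such as tables or views). The executed actions can be iterated over as follows:
# WORKFLOW_INVOCATION is the full workflow invocation name (response.name
# from the previous snippet)
WORKFLOW_INVOCATION = response.name
request = dataform_v1.QueryWorkflowInvocationActionsRequest(
    name=WORKFLOW_INVOCATION
)
# The call returns a pager; iterating it follows page tokens automatically
for action in client.query_workflow_invocation_actions(request=request):
    ...  # inspect each executed action here, e.g. action.target, action.state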
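In a CI job, the per-action results usually need to be reduced to a pass/fail signal. This sketch assumes the actions have already been turned into `(target name, state name)` pairs. The `results` list is made-up sample data; with the real API, the pairs might come from `action.target.name` and `action.state.name`. The sketch counts actions per state and collects the targets that failed.

```python
from collections import Counter

# Hypothetical, already-extracted (target name, action state) pairs.
results = [
    ("stg_orders", "SUCCEEDED"),
    ("dim_customers", "SUCCEEDED"),
    ("fct_orders", "FAILED"),
    ("report_daily", "SKIPPED"),
]


def summarize(actions: list[tuple[str, str]]) -> tuple[Counter, list[str]]:
    """Count actions per state and list the targets whose action failed."""
    counts = Counter(state for _, state in actions)
    failed = [name for name, state in actions if state == "FAILED"]
    return counts, failed


counts, failed = summarize(results)
print(dict(counts))  # {'SUCCEEDED': 2, 'FAILED': 1, 'SKIPPED': 1}
print(failed)        # ['fct_orders']
```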
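The Dataform API gets its power from combining these two objects. The typical flow is:
- the workspace is updated, either manually or via Git;
- a CompilationResult is created from the workspace;
- a WorkflowInvocation is created from that compilation result;
- BigQuery executes the resulting graph.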
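These two objects are enough to automate building and running Dataform DAGs without relying on the Dataform UI. This makes CI/CD workflows straightforward to set up, as later posts in the series describe in detail.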
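The four-step flow can be sketched as a single function. This is a minimal illustration under stated assumptions, not the CI/CD implementation used in this series. It makes the following choices:
- it passes requests as plain dicts, which the generated client accepts;
- it compiles from a workspace, as the earlier snippets do;
- it treats a non-empty `compilation_errors` list as failure;
- it polls `get_workflow_invocation` until a terminal state is reached.

The function name `run_workspace` and the commented CI entry point are hypothetical.

```python
import time
from typing import Any, Callable

# Assumed terminal WorkflowInvocation states
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_workspace(
    client: Any,
    repository: str,
    workspace: str,
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Compile `workspace`, invoke the result, and return the final state name."""
    # Step 1: compile the current state of the workspace
    compilation = client.create_compilation_result(
        request={"parent": repository, "compilation_result": {"workspace": workspace}}
    )
    # Assumption: compile errors are reported on the result rather than raised
    if compilation.compilation_errors:
        raise RuntimeError(f"compilation failed: {list(compilation.compilation_errors)}")
    # Step 2: execute the compiled DAG
    invocation = client.create_workflow_invocation(
        request={
            "parent": repository,
            "workflow_invocation": {"compilation_result": compilation.name},
        }
    )
    # Step 3: poll until BigQuery has finished running the graph (no timeout here)
    while True:
        state = client.get_workflow_invocation(request={"name": invocation.name}).state.name
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


# Hypothetical CI entry point (requires google-cloud-dataform and credentials):
# import sys
# from google.cloud import dataform_v1
# client = dataform_v1.DataformClient()
# repo = client.repository_path("my-project", "europe-west2", "analytics")
# ws = client.workspace_path("my-project", "europe-west2", "analytics", "dev")
# sys.exit(0 if run_workspace(client, repo, ws) == "SUCCEEDED" else 1)
```

Exiting with a non-zero status on anything other than SUCCEEDED is what lets a CI runner mark the job as failed. A production version would likely also add a timeout.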















