Google Cloud BigQuery Job Step Window

The Step tab is used to specify SQL statements, define step settings, and related options.

Note: Follow Basic roles and permissions to determine the permissions required to correctly perform tasks.

The Execution Flow tab is used to:

  • Define subsequent actions following either successful or failed step completion.

  • Define the maximum number of times a step is allowed to run. You can use this feature to break the execution sequence of a set of steps defined in the Google Cloud BigQuery job.

This table describes each action:

On Success – Upon successful step completion, choose one of these subsequent actions:

  • Go to the next step.

  • Quit the step flow, and report success.

  • Quit the step flow, and report failure.

Failure Action – When a step fails, choose one of these actions:

  • Go to the next step.

  • Quit the step flow, and report success.

  • Quit the step flow, and report failure.

  • Go to the first operation (Operation #1).

Max Allowed Runs – The number of times the Adapter can run a step. When the limit is reached, steps do not run, and the Google Cloud BigQuery job fails with a status of Completed Abnormally.

This table explains the behavior of each action:

Go to the next step – Executes the next step in the set of defined steps. If no next step is specified, the sequence of step execution ends, and the Google Cloud BigQuery job completes with the status Completed Normally or Completed Abnormally, based on the completion status of the last step.

Quit the step flow, report success – Ensures that after the step executes, no other step executes, and the Google Cloud BigQuery job completes with the status Completed Normally.

Quit the step flow, report failure – Ensures that after the step executes, no other step executes, and the Google Cloud BigQuery job completes with the status Completed Abnormally.
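The flow rules above can be sketched as a small simulation. This is an illustrative sketch, not the Adapter's implementation; the step structure, action names, and status strings below are hypothetical stand-ins for the UI concepts described in the tables.

```python
# Hypothetical simulation of the On Success / On Failure actions and
# the Max Allowed Runs limit; not the Adapter's actual code.
NEXT, QUIT_SUCCESS, QUIT_FAILURE, GO_FIRST = range(4)

def run_steps(steps, max_allowed_runs=10):
    """steps: list of dicts with 'run' (callable returning True on success),
    'on_success', and 'on_failure' actions."""
    runs = [0] * len(steps)
    i = 0
    ok = True
    while i < len(steps):
        if runs[i] >= max_allowed_runs:
            # Limit reached: steps do not run, job fails.
            return "Completed Abnormally"
        runs[i] += 1
        ok = steps[i]["run"]()
        action = steps[i]["on_success"] if ok else steps[i]["on_failure"]
        if action == QUIT_SUCCESS:
            return "Completed Normally"
        if action == QUIT_FAILURE:
            return "Completed Abnormally"
        if action == GO_FIRST:
            i = 0          # restart from Operation #1
            continue
        i += 1             # NEXT: falling off the end uses the last step's status
    return "Completed Normally" if ok else "Completed Abnormally"
```

For example, a step that always fails with a Go to the first operation action loops until Max Allowed Runs is reached and the job ends with Completed Abnormally.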

The Query Step Type

The Query step type allows you to execute DML (Data Manipulation Language) statements. You can also execute DDL statements, but we recommend using the Data Definition step type for that purpose. Follow Using Data manipulation language to find more information.

Once you choose this type, these fields are displayed:

  • Dataset – The dataset name. This field is optional. If this field is blank, specify the dataset in the query. For example, SELECT * FROM `testAdapterDataSet.Userdata` LIMIT 1000, where testAdapterDataSet is the dataset name.

    Note: Only one SQL statement can be executed per step.

  • Dry run checkbox – Runs a dry run query to estimate the number of bytes read by the query.

  • SQL tab – The text area to specify the statement to be executed.

    You can provide only one SQL statement. Query results appear as the step output; available formats include Align columns, Inherited (selected by default), CSV, JSON, XML, and Raw.

    If you choose the Align columns format, note the JVM heap memory size. For this format, the data is stored in the JVM heap memory, so the query could potentially run out of memory. To increase the JVM heap memory, update the -Xmx option (for example, -Xmx256m).

  • Parameters tab – Optional parameter name and value pairs. A parameter is defined as “:” followed by the parameter name.

To add a value to a parameter:

  1. Double-click the parameter.

  2. Click Edit and enter a parameter value, or click Variables and choose one to insert a dynamically replaced variable value.

  3. Click OK to save it. The value is displayed in the variable row.

To add variables inside the query, click Variables.

  • Output format – Choose the step output format. Options include Align columns, Inherited (selected by default), CSV, JSON, XML, and Raw.

    Note: If you choose the Align columns format, note the JVM heap memory size. For this format, the data is stored in the JVM heap memory, so the query could potentially run out of memory. To increase the JVM heap memory, update the -Xmx option (for example, -Xmx256m).

  • Delimiter – For the Raw format only, the custom character used to delimit query results. If a delimiter is not specified, the default delimiter '|' is used.

  • Include Header – Displays column headers in the results. This option is available for the Align columns, CSV, and Raw formats.

  • Variables – Specify a variable. This field is enabled only for the Step name and Dataset fields.
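The ":" parameter convention and the Raw output format described above can be sketched as follows. This is a hypothetical illustration under assumed semantics (simple textual substitution, and rows joined by the delimiter), not the Adapter's actual code; bind_params and format_raw are invented names.

```python
import re

def bind_params(sql, params):
    """Replace each :name placeholder with its value from params
    (assumed simple textual substitution)."""
    return re.sub(r":(\w+)", lambda m: str(params[m.group(1)]), sql)

def format_raw(rows, headers, delimiter="|", include_header=True):
    """Render query results in Raw format; '|' is the default delimiter,
    and Include Header controls the leading header line."""
    lines = []
    if include_header:
        lines.append(delimiter.join(headers))
    for row in rows:
        lines.append(delimiter.join(str(v) for v in row))
    return "\n".join(lines)
```

For example, bind_params("SELECT * FROM t LIMIT :n", {"n": 1000}) yields the statement with :n replaced by 1000.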

The Copy Tables Step Type

The Copy Tables step type allows you to run one of these operations: Copy (Merge), Clone, Snapshot, or Restore.

Note: Follow Required permissions to learn which permissions you need to perform these tasks correctly.

Once you choose this type, these fields are displayed:

Note: At least one source table must be specified for any operation type. Only the copy (merge) operation type supports multiple source tables; the clone, snapshot, and restore operation types support only one source table.

  • Operation type – Choose the type from the dropdown list (Clone, Merge, Restore, Snapshot).

  • Source – The location of the source dataset. Click Add to add the Dataset and Table.

    • Dataset – The source dataset name.

    • Table – The source table.

  • Destination – The location of the destination dataset.

    • Dataset – The destination dataset. This field is required.

    • Table – The destination table. This field is required.

      Note: The source and destination tables must have the same data schema.

  • Create Option – You can choose one of these values:

    Create if needed (CREATE_IF_NEEDED). Creates the target table if it does not exist.

    Never create (CREATE_NEVER). Throws an exception if the target table does not exist.

    Note: The snapshot operation type does not support the Never create option.

  • Load Option – Options for loading. Choose one of these values:

    Overwrite (WRITE_TRUNCATE). Overwrites the existing table with the new table.

    Append (WRITE_APPEND). Appends data to the existing table.

    Error if not empty (WRITE_EMPTY). Appends data to the empty table only. Otherwise, errors are generated.

    Note: The snapshot operation type supports only the WRITE_EMPTY load option.

    The restore and clone operation types do not support the WRITE_APPEND load option.

  • KMS encryption key – Optional. Specifies a custom encryption configuration (for example, Cloud KMS keys).

    Note: Only the clone and copy (merge) operations support override encryption.

  • Variables – Specify a variable. This field is disabled for the KMS encryption key field.
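The operation-type constraints above (single- vs. multi-source operations, and the supported create and load options) can be summarized in a small validation sketch. The rule table and validate helper below are assumptions built from the notes in this section, not part of the Adapter's API.

```python
# Hypothetical summary of the Copy Tables constraints; illustrative only.
RULES = {
    "merge":    {"multi_source": True,
                 "create": {"CREATE_IF_NEEDED", "CREATE_NEVER"},
                 "load": {"WRITE_TRUNCATE", "WRITE_APPEND", "WRITE_EMPTY"}},
    "clone":    {"multi_source": False,
                 "create": {"CREATE_IF_NEEDED", "CREATE_NEVER"},
                 "load": {"WRITE_TRUNCATE", "WRITE_EMPTY"}},     # no WRITE_APPEND
    "snapshot": {"multi_source": False,
                 "create": {"CREATE_IF_NEEDED"},                 # no CREATE_NEVER
                 "load": {"WRITE_EMPTY"}},                       # WRITE_EMPTY only
    "restore":  {"multi_source": False,
                 "create": {"CREATE_IF_NEEDED", "CREATE_NEVER"},
                 "load": {"WRITE_TRUNCATE", "WRITE_EMPTY"}},     # no WRITE_APPEND
}

def validate(op, sources, create_option, load_option):
    rule = RULES[op]
    if not sources:
        return "at least one source table is required"
    if len(sources) > 1 and not rule["multi_source"]:
        return f"{op} supports only one source table"
    if create_option not in rule["create"]:
        return f"{op} does not support {create_option}"
    if load_option not in rule["load"]:
        return f"{op} does not support {load_option}"
    return "ok"
```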

The Create Model Step Type

The Create Model step type allows you to execute create model statements. Once you choose this type, these fields are displayed:

  • Dataset – The dataset name. This field is optional. If this field is blank, specify the dataset in the query. For example, CREATE MODEL `testAdapterDataSet.test_model` OPTIONS (…, where testAdapterDataSet is the dataset name.

    Note: Only one SQL statement can be executed per step.

  • Dry run checkbox – Runs a dry run query to estimate the number of bytes read by the query.

  • SQL tab – The text area to specify the statement to be executed.

    Note: You can provide only one SQL statement.

    Query results appear as the step output; available formats include Align columns, Inherited (selected by default), CSV, JSON, XML, and Raw.

  • Parameters tab – Optional parameter name and value pairs. A parameter is defined as “:” followed by the parameter name.

To add a value to a parameter:

  1. Double-click the parameter.

  2. Click Edit and enter a parameter value, or click Variables and choose one to insert a dynamically replaced variable value.

  3. Click OK to save it. The value is displayed in the variable row.

To add variables inside the query, click Variables.

  • Output format – Choose the step output format. Options include Align columns, Inherited (selected by default), CSV, JSON, XML, and Raw formats.

  • Variables – Specify a variable. This field is enabled only for the Step name and Dataset fields.

The Data Definition Step Type

The Data Definition step type allows you to execute DDL (Data Definition Language) statements.

Once you choose this type, these fields are displayed:

  • Dataset – The dataset name. This field is optional. If this field is blank, specify the dataset in the query.

    Note: Only one SQL statement can be executed per step.

  • Dry run checkbox – Runs a dry run query to estimate the number of bytes read by the query.

  • SQL tab – The text area to specify the statement to be executed.

    Note: You can provide only one SQL statement.

    Query results appear as the step output; available formats include Align columns, Inherited (selected by default), CSV, JSON, XML, and Raw.

  • Parameters tab – Optional parameter name and value pairs. A parameter is defined as “:” followed by the parameter name.

    To add a value to the parameter:

    1. Double-click the parameter.

    2. Click Edit and enter a parameter value or click Variables and choose one to insert a dynamically replaced variable value.

    3. Click OK to save it. The value is displayed in the variable row.

      To add variables inside the query, click Variables.

  • Output format – Choose the step output format. Options include Align columns, Inherited (selected by default), CSV, JSON, XML, and Raw formats.

  • Variables – Specify a variable.

The Extract Model to File Step Type

The Extract Model to File step type allows you to export a BigQuery Model to a file.

Once you choose this type, these fields are displayed:

Note: To export a model, you need a set of permissions. Follow the Required Permission chapter to find more information.

  • Dataset – The dataset name. If this field is blank, you should specify the dataset in the query.

  • Model – The BigQuery model.

  • Bucket Name – The name of the Google Cloud Storage container that stores data.

  • Storage Path – The path in Google Cloud Storage where the file is created and stored.

  • Variables – Specify a variable. This field is enabled only for the Step name and Dataset fields.

The Extract Table to File Step Type

The Extract Table to File step type allows you to export a BigQuery table to a file.

Once you choose this type, these fields are displayed:

Note: To extract a table, you need a set of permissions. Follow the Required Permission chapter to find more information.

  • Dataset – The dataset name. If this field is blank, specify the dataset in the query.

  • Table – The Google Cloud BigQuery table.

  • File type – The format of the file for exporting the table: Avro, CSV, JSON, Parquet.

  • Compression Type – The supported compression types: gzip, none, or snappy.

  • Bucket Name – The name of the Google Cloud Storage container that stores data.

  • Storage Directory – The directory in Google Cloud Storage where the output file is stored.

  • Output File Name – The name of the output file.

  • Variables – Specify a variable.
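As an illustration of how the Bucket Name, Storage Directory, and Output File Name fields might combine into a Cloud Storage destination, here is a hypothetical helper; the gs:// path layout shown is an assumption for illustration, not documented Adapter behavior.

```python
def destination_uri(bucket, directory, filename):
    """Assumed layout: gs://<bucket>/<directory>/<filename>,
    with stray slashes trimmed. Hypothetical helper, not the Adapter's API."""
    path = "/".join(p.strip("/") for p in (directory, filename) if p)
    return f"gs://{bucket}/{path}"
```

For example, bucket my-bucket, directory exports/daily, and file users.csv would combine into gs://my-bucket/exports/daily/users.csv under this assumed layout.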

The Load File to BigQuery Step Type

The Load File to BigQuery step type allows you to load data into BigQuery.

Once you choose this type, these fields are displayed:

  • Storage File Path – The path in Google Cloud Storage where the file is created and stored.

  • File Type – The format of the file for importing a table: Avro, CSV, JSON, ORC, Parquet.

  • Dataset – The dataset name. If this field is blank, you should specify the dataset in the query.

  • Table – The Google Cloud BigQuery table.

  • Load Option – Options for loading. You can choose one of these values:

    Overwrite (WRITE_TRUNCATE). Overwrites the existing table with the new one.

    Append (WRITE_APPEND). Appends data to an existing table.

    Error if not empty (WRITE_EMPTY). Appends data only if the table is empty; otherwise, errors are generated.

  • Variables – Specify a variable.
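The three load options correspond to BigQuery's write dispositions, whose behavior can be sketched by modeling a table as a list of rows. This is illustrative only; load_rows is a hypothetical helper, not the Adapter's API or the BigQuery client library.

```python
# Sketch of WRITE_TRUNCATE / WRITE_APPEND / WRITE_EMPTY semantics.
def load_rows(table, new_rows, load_option):
    if load_option == "WRITE_TRUNCATE":
        return list(new_rows)               # overwrite existing data
    if load_option == "WRITE_APPEND":
        return table + list(new_rows)       # append to existing data
    if load_option == "WRITE_EMPTY":
        if table:                           # only an empty table may be written
            raise ValueError("destination table is not empty")
        return list(new_rows)
    raise ValueError(f"unknown load option: {load_option}")
```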

Monitoring a Google Cloud BigQuery Job activity

Use the Job Details dialog to monitor Google Cloud BigQuery jobs.

To monitor job activity:

  1. Click Operations > Job Activity on the Navigation pane to display the Job Activity pane.

  2. Right-click a job and choose Details from the menu.

    The Job Details dialog is displayed. On the Status page, view the current status of the job, its start and end times, its runtime duration, and how the job was scheduled. The status of the last executed step is the final job execution status.

  3. Click the Output tab to view the job output.

  4. Click the GC BigQuery Job tab to view the original request and the variables used when this job was submitted. The GC BigQuery Job tab is an editable copy of the GC BigQuery tab in the Job Definition screen (except the Steps table), where you can specify new configuration data and rerun the job.

    From the dropdown list, choose Start from Step # to change the job run definition and rerun a specific job.

    Fields are disabled while the job runs; overrides are not permitted when the job is running.

    Note: Changes made here apply only to this job instance (the original job definition is not affected).

  5. Click the Run Info tab to view the Google Cloud BigQuery job-related information. This is a read-only copy of the GC BigQuery Job tab in the Job Details screen. It contains resolved variables in the SQL tab of the Job Steps.

    Fields include:

    • Step# – The step number.

    • Name – The name of the step.

    • On Success – The action to occur if the step is completed successfully.

    • On Failure – The action to occur if the step fails.

    • Run# – The number of times that the job has already run.

    • Status – The success or failure status of the job.

    • View Step – This button displays the GC BigQuery Job Step dialog.