Complete Azure DevOps CI/CD guide for your Azure Synapse based Data Platform – part II
I wrote an article recently describing how to implement Continuous Integration and Continuous Deployment (CI/CD) for your Azure Synapse based data platform. At the end of part one we setup the infrastructure required for CI/CD, and we did our first parameterized deployment from our development environment to the production environment.
In part two we are going to focus on specific use cases to make our CI/CD process completer and more robust.
Situation
We have just setup our first Azure DevOps CI/CD pipeline for our Azure Synapse based Data Platform (Preferably with the help of part 1). It does the job well, but we run into some limitations that are that not immediately obvious to solve. Specifically, two uses cases; what to do with our triggers that should behave differently between our development and production environment, and what to do with our serverless SQL databases which is currently not easy to deploy to production.
Dynamically adjust triggers between your development and production environment
It is likely that you want different triggers running your Synapse pipelines in the development and production environment. Azure Synapse has the functionality to start and stop triggers, meaning when a trigger is stopped it will not run the pipeline it is attached to. This gives us the opportunity to start and stop certain triggers in development and production. This is important because of multiple reasons. For example, we do not want to run our development pipelines at the same time as our production pipelines because of workload constraints.
Luckily, we can automate the process of starting and stopping triggers. We can toggle the triggers on or off in our Azure DevOps pipeline, we do this by adding an extra task to our YAML file. The following code will stop all the triggers in our development environment.
- task: toggle-triggers-dev@2
displayName: 'Toggle all dev triggers off'
inputs:
azureSubscription: '${{ parameters.subscriptionDev }}'
ResourceGroupName: '${{ parameters.resourceGroupNameDev }}'
WorkspaceName: '${{ parameters.synapseWorkspaceNameDev}}'
ToggleOn: false
Triggers: '*'
Code snippet 1: Stopping all triggers in dev
It is worth paying attention to the specific value for “Triggers”. In the example above we give it the wildcard value “’*’” which means that all the triggers in development environment will be stopped. We could also give it some hard coded names of triggers that we want to stop, as seen in code snippet 2.
- task: toggle-triggers-dev@2
displayName: 'Toggle specific triggers off'
inputs:
azureSubscription: '${{ parameters.subscriptionDev }}'
ResourceGroupName: '${{ parameters.resourceGroupNameDev }}'
WorkspaceName: '${{ parameters.synapseWorkspaceNameDev}}'
ToggleOn: false
Triggers: 'trigger1,trigger2,trigger3'
Code snippet 2: Stop specific triggers
In this example, only the triggers called trigger1, trigger2, and trigger3 will be stopped. This solves some of our problems, because we can now toggle the production triggers in the production environment by adding them to the list. The same applies to the development environment. However, this would mean that we always need to adjust our Azure DevOps pipeline when we create new triggers, which is not an ideal situation. That is why we need to go one step further and automate this process.
We start with naming conventions for our triggers. We name every trigger in our development environment that needs to be started “_dev” and every trigger in our production environment that’s needs to be started “_prd”. Now we have something that we can use to recognize the environment the trigger is meant for. Unfortunately, at the time of writing this article, it is not possible to use some kind of string recognition in our “Triggers” parameter, we need to give it exact names of the triggers. That is why we need a workaround.
In order the get the correct list of triggers, we are going to use a PowerShell script which we will run in the Azure DevOps pipeline in replacement of the task we described above.
$triggers = az synapse trigger list `
--workspace-name ${{ parameters.synapseWorkspaceNameDev}} `
--query "[].name"
foreach ($trigger in $triggers) {
if ($trigger.Contains("_dev")) {
$trigger = $trigger.Trim() -replace '[\W]', ''
az synapse trigger start `
--workspace-name ${{ parameters.synapseWorkspaceNameDev }} `
--name $trigger
}
}
Code snippet 3: Starting triggers dynamically in development
First, we create a variable named “triggers” which will contain a list of all the triggers in our development environment. We achieve this by using the Azure CLI. Make sure to add “—query “[].name”” to only get the names of the triggers. Next, we are going to loop over this list with a for each loop and check for every trigger in the list if it contains “_dev”. If this is the case, we make sure that there is no whitespace in the name of the trigger and then we run an Azure CLI command to start this trigger. This way all the triggers in with “_dev” in their name will be started.
We run this PowerShell script using an Azure CLI task in our YAML file.
#Start triggers in dev synapse environment
- task: AzureCLI@2
displayName: 'Toggle _dev triggers on'
continueOnError: false
inputs:
azureSubscription: '${{ parameters.subscriptionDev }}'
scriptType: pscore
scriptLocation: inlineScript
inlineScript: |
$triggers = az synapse trigger list `
-–workspace-name ${{ parameters.synapseWorkspaceNameDev}} `
--query "[].name"
foreach ($trigger in $triggers) {
if ($trigger.Contains("_dev")) {
$trigger = $trigger.Trim() -replace '[\W]', ''
az synapse trigger start `
--workspace-name ${{ parameters.synapseWorkspaceNameDev }} `
--name $trigger
}
}
Code snippet 4: Azure DevOps pipeline task to run PowerShell script
The same can be done for our production environment. By using the naming conventions in combination with the PowerShell script we can now automatically and dynamically start and stop triggers in our Azure Synapse environments. We do not need to manually add triggers to our Azure DevOps pipeline, but we only need to stick to our naming conventions, which will hopefully result in less bugs.
How to deal with your serverless SQL pool?
At the moment of writing this blog Microsoft does not have an out of the box solution for automatically deploying our serverless databases and associated external tables from our development to our production environment. Therefore, we use a pragmatic and quite easy solution to solve this limitation.
In this example we are going to focus on creating external tables on our data which we need in our production environment. The external tables are not automatically deployed using the Azure DevOps pipeline, therefore we need a workaround. We can do this by manually creating a migration pipeline in our development Synapse workspace.
Figure 1: Example of migration pipeline
As an example, I have created three Script activities in the pipeline which all contain a SQL script to create external tables on the existing data. We already ran these scripts in our development environment, but we want to also run them in our production environment.
Since we do not want to do this manually, we need to find a way to automate this. We will do this by adding the following task at the end of our Azure DevOps deployment pipeline.
#Trigger migration pipeline
- task: AzureCLI@2
condition: eq('${{ parameters.triggerMigrationPipeline }}', 'true')
displayName: 'Trigger migration pipeline'
inputs:
azureSubscription: '${{ parameters.subscriptionPrd }}'
scriptType: pscore
scriptLocation: inlineScript
inlineScript: |
az synapse pipeline create-run `
--workspace-name ${{ parameters.synapseWorkspaceNamePrd }} `
--name ${{ parameters.migrationPipelineName }}
Code snippet 5: Azure DevOps pipeline task to trigger migration pipeline
After the synapse workspace is fully deployed to our production environment, this task will trigger the migration pipeline we just created. We run the migration pipeline by executing an Azure CLI command “az synapse pipeline create-run” where we specify the Synapse workspace name and the name of the migration pipeline. By running this pipeline in our Synapse production environment, we ensure that the external tables are created in the production serverless SQL pool.
As you can see, we added a condition to the task which states this task will only run if our parameter “triggerMigrationPipeline” is set to “true”. By adding this to our CI/CD pipeline, we can trigger the migration pipeline in our production environment only if we want it to.
In the example above we focused on creating external tables in the serverless SQL database, but the migration pipeline can be used for multiple purposes. For instance, if we have pipelines which need a special initialization to run properly, we can put those initialization activities in our migration pipeline. For example, creating a stored procedure that is used in our pipeline. In summary, we can put all the activities we need to make sure that the triggered pipelines are going to run without any errors in the migration pipeline to make sure everything is initialized.
If you are looking for a more complete version of implementing CI/CD for your serverless SQL pool, you can check out this blog by Kevin Chant, in which he uses a .NET library called DbUp.
Summary
The use cases discussed above will make your CI/CD process more dynamic and robust. By dynamically adjusting triggers between your environments and adding a migration pipeline to migrate your serverless databases, you will have less manual work when deploying to your production environment. Of course, there are more automation possibilities and uses cases to further enhance your CI/CD process for your Azure Synapse based data platform, and with the development of the Microsoft stack, we will get new possibilities to make our lives easier.
If you would like more in-depth code or contribute yourself, check out: https://github.com/atc-net/atc-snippets/tree/main/azure-cli/synapse/Publish.