Azure Synapse Analytics (workspaces): Deploy and Debug - Part 1
Jayendran Arumugam
Posted on August 21, 2020
Disclaimer:
This post is provided "as-is".
Information and views expressed in this post, including URL and other Internet Web site references, may change without notice. Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.
Getting the Definitions Clear
Azure Synapse Analytics
Azure Synapse is an analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale.
Azure Synapse Analytics Workspace (Preview)
Azure Synapse comes with a web-native Studio user experience, called the Synapse Analytics workspace, that provides a single experience and model for management, monitoring, coding, and security.
(As of writing this post, the Azure Synapse Analytics workspace is in preview.)
If you are familiar with the Azure Data Platform, I can sum up the Synapse workspace in a single picture like the one below 😉
Roadmap of Azure Synapse Analytics
What are we going to see in this post?
We can easily create a Synapse workspace, which is in public preview, from the Azure portal. However, there are currently no official docs that give detailed steps for creating a Synapse workspace programmatically using ARM. In this post (part 1), we are going to see how to deploy Azure Synapse from an ARM template using a service principal (SPN), along with the deployment architecture, the different levels of access, and the conditions that apply.
I chose this topic because most of the docs are still under development right now. I ran into a lot of trouble initially, and I even raised a few issues and a PR along the way. So this post is simply a way of sharing what I learned with others 😊
Creating Synapses Workspace through ARM Template using SPN Failing during provisioning "storageRoleDeploymentResource" with BAD Request #60705
Are there any specific permissions needed when deploying a Synapse workspace through ARM with SPN authentication?
The ARM deployment fails at the `storageRoleDeploymentResource` resource provisioning step with a Bad Request error.
Also, the SQL admin was randomly assigned a GUID. Do we need to include any special permissions in the doc to cover SPN deployment?
Doc Link: https://docs.microsoft.com/en-us/azure/synapse-analytics/security/how-to-set-up-access-control
Synapses Creating Big Data Pool validation error: Parameter 'BigDataPoolResourceInfo.node_count' failed to meet validation requirement. #14708
Describe the bug
Running the command to create a big data pool for Azure Synapse fails with the following validation error: Parameter 'BigDataPoolResourceInfo.node_count' failed to meet validation requirement.
To Reproduce
Download the latest az CLI and run the command below, which is the example given in the linked docs:
az synapse spark pool create --name testpool --workspace-name testsynapseworkspace --resource-group rg \
--spark-version 2.4 --node-count 3 --node-size Medium
Expected behavior
It should create a Big data pool
Environment summary
Windows 10
Synapses Data Plane API Audience should remove Trailing #10500
I'm not sure whether this is a bug in the code itself or whether the docs need to be updated. Either way, I'm creating this issue to cover both cases.
The Synapse data plane API audience needs to drop the extra trailing /
As per the docs below:
Set the Authorization header to a JSON Web Token that you obtain from Azure Active Directory. For data-plane operations, be sure to obtain a token for the resource URI / audience claim "https://dev.azuresynapse.net/", NOT "https://management.core.windows.net/" nor "https://management.azure.com/". For more information, see Acquire an access token.
The audience claim https://dev.azuresynapse.net/ from the statement above does not work, and gives an error like:
{
"code": "InvalidTokenAuthenticationAudience",
"message": "Token Authentication failed with SecurityTokenInvalidAudienceException - IDX10214: Audience validation failed. Audiences: '[PII is hidden]'. Did not match: validationParameters.ValidAudience: '[PII is hidden]' or validationParameters.ValidAudiences: '[PII is hidden]'."
}
But once we remove the trailing slash (/) from the end of the URL, giving https://dev.azuresynapse.net, the APIs work.
Please make the necessary fix in either the docs or the API.
Reference: https://docs.microsoft.com/en-us/rest/api/synapse/?source=docs
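The workaround from this issue is easy to capture in code. The snippet below is a minimal sketch, assuming the behaviour reported above still holds (the service rejecting the audience with the trailing slash); the constant and the function name are hypothetical helpers for illustration, not part of any Azure SDK.

```python
# Audience exactly as quoted in the REST API docs above (note the trailing slash).
DOCUMENTED_AUDIENCE = "https://dev.azuresynapse.net/"


def normalize_audience(audience: str) -> str:
    """Return the audience with any trailing slash removed.

    Hypothetical helper: it only applies the workaround reported in the
    issue above and does not talk to Azure in any way.
    """
    return audience.rstrip("/")


if __name__ == "__main__":
    # Use the normalized value as the resource/audience when requesting a token.
    print(normalize_audience(DOCUMENTED_AUDIENCE))
```

Normalizing the value in one place keeps the rest of the token-acquisition code unaffected if the docs or the API are later fixed.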
Fixed the Address/Ip Typo #12622
Getting ARM Template from Azure Portal
We can easily grab the ARM template for the Synapse workspace from the Azure portal itself.
Step 1:
Step 2:
Once downloaded, the ARM template will look like the below:
ARM Parameters:
Architecture of the Synapse Workspace:
Here is a very simple, high-level architecture image to help understand the components of a Synapse workspace.
As you can see, the Synapse workspace itself consists of a storage account (Gen2) and a default on-demand SQL pool. It can be accessed through 3 different roles (NOT RBAC roles):
- Workspace Admin
- SQL Admin
- Spark Admin
I'll explain these roles in part 2. For now, just assume they are the roles a user needs in order to access the workspace.
ARM Graphical Viewer
We can easily understand this ARM template using VS Code with the ARM Template Viewer extension. The final result will look like this:
Great! Now we understand that the Synapse workspace needs a storage account (Gen2) with a container (the Gen2 file system) in it. The ARM template also has some other components, such as roleAssignments, managedIdentitySqlControlSettings, and firewall rules, which are basically there to give the correct permissions to our workspace. We will look at these in more detail in the sections below.
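If you prefer a quick textual summary over the graphical view, a few lines of Python can list the resource types found in a template. This is only a sketch: the function name is hypothetical, and SAMPLE_TEMPLATE is a hand-written, heavily trimmed stand-in (its names and nesting are assumptions), not the real template exported from the portal.

```python
from collections import Counter


def resource_types(template: dict) -> Counter:
    """Count resource types in an ARM template, including inline child
    resources and resources inside nested deployments."""
    counts: Counter = Counter()

    def walk(resources: list) -> None:
        for res in resources:
            counts[res.get("type", "<unknown>")] += 1
            # Child resources declared inline under a parent resource.
            walk(res.get("resources", []))
            # Resources declared inside a nested deployment's template.
            nested = res.get("properties", {}).get("template", {})
            if isinstance(nested, dict):
                walk(nested.get("resources", []))

    walk(template.get("resources", []))
    return counts


# Illustrative stand-in only; load your exported template with json.load instead.
SAMPLE_TEMPLATE = {
    "resources": [
        {"type": "Microsoft.Storage/storageAccounts", "name": "examplestorage"},
        {
            "type": "Microsoft.Synapse/workspaces",
            "name": "exampleworkspace",
            "resources": [{"type": "firewallrules", "name": "allowAll"}],
        },
        {
            # Assumed here to be a nested deployment.
            "type": "Microsoft.Resources/deployments",
            "name": "storageRoleDeploymentResource",
            "properties": {
                "template": {
                    "resources": [
                        {
                            "type": "Microsoft.Storage/storageAccounts/providers/roleAssignments",
                            "name": "example",
                        }
                    ]
                }
            },
        },
    ]
}

if __name__ == "__main__":
    for rtype, count in sorted(resource_types(SAMPLE_TEMPLATE).items()):
        print(f"{count} x {rtype}")
```

To run it against your own export, replace SAMPLE_TEMPLATE with the dictionary returned by json.load on the downloaded template file.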
Analyzing Parameters from ARM
Most of the parameters are self-explanatory; however, some of them depend on high-privilege permissions. Let's look at those:
- setWorkspaceIdentityRbacOnStorageAccount : If true, this assigns the workspace's managed identity (MSI) the Storage Blob Data Contributor role on the existing or new storage account. This needs the Microsoft.Authorization/roleAssignments/write permission, which requires the Owner role or at least User Access Administrator. So make sure you give your SPN Owner or User Access Administrator access if you set this to true.
The table below will help you understand this parameter based on your SPN's access.
- grantWorkspaceIdentityControlForSql : Grants CONTROL to the workspace's managed identity on all SQL pools and SQL on-demand.
- isNewFileSystemOnly : Set this to true when we need to create a new file system, whether the storage account is new or already exists.
- setSbdcRbacOnStorageAccount : Set this to true to assign the user (whose object ID is provided in userObjectId) the Storage Blob Data Contributor role on the storage account (Gen2).
This is a nested task that depends on the setWorkspaceIdentityRbacOnStorageAccount parameter, i.e., it is executed only if setWorkspaceIdentityRbacOnStorageAccount is true.
E.g., if you set setWorkspaceIdentityRbacOnStorageAccount to false, setting setSbdcRbacOnStorageAccount to true has no effect.
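The dependency between these parameters can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the template's actual logic: the WorkspaceParams class and both function names are hypothetical, and the check covers only the roleAssignments/write requirement (the SPN still needs whatever other access the deployment itself requires).

```python
from dataclasses import dataclass

# Built-in roles named above as granting Microsoft.Authorization/roleAssignments/write.
ROLES_WITH_ROLE_ASSIGNMENT_WRITE = {"Owner", "User Access Administrator"}


@dataclass
class WorkspaceParams:
    """Hypothetical container for the two RBAC-related template parameters."""
    set_workspace_identity_rbac_on_storage_account: bool
    set_sbdc_rbac_on_storage_account: bool


def planned_role_assignments(params: WorkspaceParams) -> list:
    """List the principals that would receive a storage role assignment."""
    # The user assignment is nested under the workspace-identity assignment,
    # so nothing happens at all when the outer flag is false.
    if not params.set_workspace_identity_rbac_on_storage_account:
        return []
    planned = ["workspace managed identity"]
    if params.set_sbdc_rbac_on_storage_account:
        planned.append("user (userObjectId)")
    return planned


def role_assignment_permission_ok(params: WorkspaceParams, spn_roles: list) -> bool:
    """Check whether the SPN's roles are enough for the planned role assignments."""
    if not planned_role_assignments(params):
        return True  # no role assignment is attempted, so no extra permission is needed
    return bool(ROLES_WITH_ROLE_ASSIGNMENT_WRITE & set(spn_roles))


if __name__ == "__main__":
    params = WorkspaceParams(
        set_workspace_identity_rbac_on_storage_account=True,
        set_sbdc_rbac_on_storage_account=True,
    )
    print(planned_role_assignments(params))
    print(role_assignment_permission_ok(params, ["Contributor"]))
```

Running a check like this before the deployment makes it obvious when an SPN with only Contributor access is about to attempt a role assignment it cannot perform.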
Conclusion
I hope you got some additional information about the Synapse workspace from this post. I'll explain more about the APIs and security best practices in Part 2.