Azure Synapse Analytics(workspaces): Deploy and Debug - Part 1

jayendran

Jayendran Arumugam

Posted on August 21, 2020

Disclaimer:

This post is provided "as-is".

Information and views expressed in this post, including URL and other Internet Web site references, may change without notice. Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

Getting the Definitions Clear

Azure Synapse Analytics

Azure Synapse is an analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale.

Azure Synapse Analytics Workspace (Preview)

Azure Synapse comes with a web-native Studio user experience that provides a single experience and model for management, monitoring, coding, and security, called the Azure Synapse Analytics workspace.
(At the time of writing this post, the Azure Synapse Analytics workspace is in preview.)

If you are familiar with the Azure data platform, I can sum up the Synapse workspace in a single picture like the one below 😉

[Image: the Synapse workspace in a single picture]

Roadmap of Azure Synapse Analytics

[Image: roadmap of Azure Synapse Analytics]

What are we going to see in this post?

We can easily create a Synapse workspace, which is in public preview, from the Azure portal. However, there are currently no official docs that give detailed steps for creating a Synapse workspace programmatically using ARM. In this post (part 1), we are going to see how to deploy Azure Synapse from an ARM template using a service principal (SPN), along with the deployment architecture, the different levels of access, and the conditions that apply.

I chose this topic because most of the docs are still under development right now. I ran into a lot of trouble initially, and I even created a few issues and a PR. So this post is simply a way of sharing what I have learned with others 😊

Creating Synapses Workspace through ARM Template using SPN Failing during provisioning "storageRoleDeploymentResource" with BAD Request #60705

Are there any specific permissions needed when deploying a Synapse workspace through ARM with SPN authentication?

The ARM deployment fails at the `storageRoleDeploymentResource` resource provisioning state with a Bad Request error.

Also, the SQL admin was randomly assigned a GUID. Does the doc need to include any special permissions to cover SPN deployment?

Doc Link: https://docs.microsoft.com/en-us/azure/synapse-analytics/security/how-to-set-up-access-control


Synapses Creating Big Data Pool validation error: Parameter 'BigDataPoolResourceInfo.node_count' failed to meet validation requirement. #14708

Describe the bug

Running the command to create big data pools for Azure Synapse fails with the following error:

validation error: Parameter 'BigDataPoolResourceInfo.node_count' failed to meet validation requirement.

To Reproduce

Download the latest az CLI and run the command below, which is the example from the linked page:

az synapse spark pool create --name testpool --workspace-name testsynapseworkspace --resource-group rg \
--spark-version 2.4 --node-count 3 --node-size Medium

Expected behavior

It should create a Big data pool

Environment summary

Windows 10

Synapses Data Plane API Audience should remove Trailing #10500

I'm not exactly sure whether this is a bug in the code itself or the docs need to be updated. Either way, I'm creating this issue to cover both cases.

The Synapse data plane API audience needs the extra / removed

As per the docs quoted below:

Set the Authorization header to a JSON Web Token that you obtain from Azure Active Directory. For data-plane operations, be sure to obtain a token for the resource URI / audience claim "https://dev.azuresynapse.net/", NOT "https://management.core.windows.net/" nor "https://management.azure.com/". For more information, see Acquire an access token.

The audience claim https://dev.azuresynapse.net/ from the statement above does not work, and returns an error like this:

{
    "code": "InvalidTokenAuthenticationAudience",
    "message": "Token Authentication failed with SecurityTokenInvalidAudienceException - IDX10214: Audience validation failed. Audiences: '[PII is hidden]'. Did not match: validationParameters.ValidAudience: '[PII is hidden]' or validationParameters.ValidAudiences: '[PII is hidden]'."
}

But once we remove the trailing slash (/) from the end of the URL, giving https://dev.azuresynapse.net, the APIs work.

Please make the necessary fix in either the docs or the API.

Reference: https://docs.microsoft.com/en-us/rest/api/synapse/?source=docs
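
The workaround from this issue can be sketched in a few lines of Python. This is only an illustration of the string handling: the helper name normalize_audience is my own (hypothetical), and the snippet does not request a token. It just shows the audience value with the trailing slash removed, which is the form that worked for me at the time.

```python
# Audience exactly as quoted from the docs in the issue above (with trailing slash).
DOCUMENTED_AUDIENCE = "https://dev.azuresynapse.net/"


def normalize_audience(audience: str) -> str:
    """Strip any trailing slash so the audience matches the form that worked."""
    return audience.rstrip("/")


if __name__ == "__main__":
    # Pass the normalized value as the resource/audience when requesting a token.
    print(normalize_audience(DOCUMENTED_AUDIENCE))
```

A value that already has no trailing slash passes through unchanged, so the helper is safe to apply unconditionally.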

Fixed the Address/Ip Typo #12622

Getting ARM Template from Azure Portal

We can easily grab the ARM template for the Synapse workspace from the Azure portal itself.

Step 1:

[Image: screenshot for step 1]

Step 2:

[Image: screenshot for step 2]

Once retrieved, the ARM template looks like the one below.

ARM parameters:

Architecture of the Synapse Workspace:

Here is a very simple, high-level architecture image to help understand the components of the Synapse workspace.

[Image: high-level architecture of the Synapse workspace]

As you can see, the Synapse workspace itself consists of a Gen2 storage account and a default on-demand SQL pool. It can be accessed through 3 different roles (NOT RBAC roles):

  • Workspace Admin
  • SQL Admin
  • Spark Admin

I'll explain these roles in part 2. For now, just assume they are roles that any user needs in order to access the workspace.

ARM Graphical Viewer

We can easily understand this ARM template using VS Code with the ARM Template Viewer extension. The final result looks like this:

[Image: the ARM template rendered in the graphical viewer]

Great! Now we understand that the Synapse workspace needs a storage account (Gen2) and a container (gen2filessystem) in it. The ARM template also has some other components, such as roleassignments, managedidentitysqlcontrolsettings, and firewall, which are basically there to give the correct permissions to our workspace. We will look at these in more detail in the sections below.

Analyzing Parameters from ARM

Most of the parameters are self-explanatory; however, some of them depend on high-privilege permissions. Let's look at those.

  • setWorkspaceIdentityRbacOnStorageAccount : If true, this assigns the workspace identity (MSI) the Storage Blob Data Contributor role on the existing or new storage account. This needs the Microsoft.Authorization/roleAssignments/write permission, which requires the Owner role or at least User Access Administrator. So make sure your SPN has Owner or User Access Administrator access if you set this to true.

The below table will help you to understand this parameter based on your SPN access.

  • grantWorkspaceIdentityControlForSql :
    Grant CONTROL to the workspace's managed identity on all SQL pools and SQL on-demand

  • isNewFileSystemOnly: Whether the storage account is new or already exists, set this to true when a new filesystem needs to be created.

  • setSbdcRbacOnStorageAccount : If true, this assigns the user (whose object ID is provided in userObjectId) the Storage Blob Data Contributor role on the Gen2 storage account.

This is a nested task that depends on the setWorkspaceIdentityRbacOnStorageAccount parameter, i.e., it is executed only if you set setWorkspaceIdentityRbacOnStorageAccount to true.
E.g., if you set setWorkspaceIdentityRbacOnStorageAccount to false, then setting setSbdcRbacOnStorageAccount to true won't have any effect.
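
To make the dependency between these two flags concrete, here is a minimal Python sketch of the behaviour described above. It is not taken from the ARM template: the function name and the "workspace-identity" and "user" labels are hypothetical, and the role check simply encodes the Owner / User Access Administrator requirement mentioned for setWorkspaceIdentityRbacOnStorageAccount.

```python
# Roles that, per the text above, allow Microsoft.Authorization/roleAssignments/write.
ROLES_ALLOWING_ROLE_ASSIGNMENT = {"Owner", "User Access Administrator"}


def planned_storage_role_assignments(spn_roles, set_workspace_identity_rbac, set_sbdc_rbac):
    """Return the storage role assignments the deployment would attempt.

    "workspace-identity" and "user" are hypothetical labels, not template names.
    """
    if not set_workspace_identity_rbac:
        # The user assignment is nested under this flag, so nothing is attempted.
        return []
    if not ROLES_ALLOWING_ROLE_ASSIGNMENT.intersection(spn_roles):
        raise PermissionError(
            "SPN needs Owner or User Access Administrator to write role assignments"
        )
    assignments = ["workspace-identity"]
    if set_sbdc_rbac:
        assignments.append("user")
    return assignments


if __name__ == "__main__":
    # Outer flag false: the inner flag has no effect.
    print(planned_storage_role_assignments({"Contributor"}, False, True))
    # Both flags true with a sufficiently privileged SPN.
    print(planned_storage_role_assignments({"Owner"}, True, True))
```

The early return for the outer flag comes before the permission check on purpose: an SPN with only Contributor access can still deploy as long as setWorkspaceIdentityRbacOnStorageAccount is false.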

Conclusion

I hope you got some additional information about the Synapse workspace from this post. I'll explain more about the APIs and security best practices in Part 2.
