Access Control to AWS OpenSearch

hassanelferga

Hassan Ibrahim

Posted on August 21, 2022


In this blog post, we show how you can control access to documents in AWS OpenSearch Service. The solution we designed and implemented is based on our specific use cases, but it can be adapted to other use cases.
The technologies we will use in this solution are:

Solution Overview

There were two Elasticsearch domains, used mainly to store application logs: one dedicated to production logs and the other to logs from the remaining environments, such as development and testing. The setup lacked authentication and authorization, so anyone with VPN access could open Kibana and search any document stored in the Elasticsearch domains. The main drivers for upgrading to AWS OpenSearch were controlling access to the data stored in the domain with an RBAC model, and getting newer OpenSearch features such as warm and cold storage, which reduce cost.
The requirements for this solution are:

  1. Authenticate users using Azure AD (AAD).
  2. Allow each team to access the data that belongs to its applications or infrastructure.
  3. Use a single OpenSearch domain for log data from all environments.

This post does not explain how to ship data from your applications or infrastructure to OpenSearch. That will be covered in a separate blog post series.

Before going into the details, I recommend reading about fine-grained access control.

Authentication Implementation

We have three requirements, and we will implement them one by one. The first requirement, authentication, started with a collaboration with the Azure AD team to design the authentication process:

  • Each team is assigned two Azure AD groups: Dev and Lead.
  • The Azure AD team creates an application in AAD.
  • OpenSearch is configured to authenticate against AAD using SAML 2.0.
  • Users access OpenSearch Dashboards (previously Kibana) from myapplications.microsoft.com.
  • The authentication process starts by checking whether the user is allowed to access the AAD application (assignments are managed by the AAD team). The user is then authenticated against AAD and, if successful, a SAML assertion is returned to OpenSearch.

Notice: Before configuring SAML in OpenSearch, log in with your admin user and map your Azure AD group to the predefined all_access role. Otherwise, you will not be able to perform most actions.
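You can also do the mapping step from the notice through the Security REST API as the admin user. The sketch below assumes you have already fetched the current mapping with `GET _plugins/_security/api/rolesmapping/all_access`. Because a `PUT` to the same path replaces the whole mapping, the sketch builds a body that keeps the existing users, hosts and backend roles and appends your AAD group. The helper name is hypothetical and the group ID is a placeholder. With default AAD group claims, the backend role is usually the group's object ID.

```python
import json
from typing import Dict, List


def merge_backend_role(existing_mapping: Dict, group_id: str) -> Dict:
    """Build a PUT body for a role mapping that keeps the current users,
    hosts and backend roles and adds `group_id` as a backend role.

    `existing_mapping` is the inner object of the GET response,
    e.g. response_json["all_access"].
    """
    backend_roles: List[str] = list(existing_mapping.get("backend_roles", []))
    if group_id not in backend_roles:
        backend_roles.append(group_id)
    return {
        "backend_roles": backend_roles,
        "hosts": list(existing_mapping.get("hosts", [])),
        "users": list(existing_mapping.get("users", [])),
    }


if __name__ == "__main__":
    # Placeholder GET response for the all_access mapping
    current = {"all_access": {"reserved": False, "users": ["admin"], "hosts": [], "backend_roles": []}}
    print(json.dumps(merge_backend_role(current["all_access"], "aad-admin-group-object-id")))
```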

In this process, you only need to configure OpenSearch with SAML authentication. The following blog post shows how you can configure SAML for OpenSearch here.
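On AWS OpenSearch Service, you can also apply the SAML settings with the AWS SDK. Below is a sketch that uses boto3's `opensearch` client and `update_domain_config`. The metadata XML, entity ID and group values are placeholders. The roles claim URI is an assumption about how your AAD application emits group claims, so check it against your application's claim configuration. `MasterBackendRole` is optional: it gives the named AAD group admin access once SAML is enabled.

```python
from typing import Dict, Optional

# Assumed claim URI for AAD group membership; verify in your AAD app's claims
AAD_GROUPS_CLAIM = "http://schemas.microsoft.com/ws/2008/06/identity/claims/groups"


def build_saml_options(metadata_xml: str, idp_entity_id: str,
                       master_backend_role: Optional[str] = None,
                       session_timeout_minutes: int = 60) -> Dict:
    """Build the AdvancedSecurityOptions payload that enables SAML."""
    saml = {
        "Enabled": True,
        "Idp": {"MetadataContent": metadata_xml, "EntityId": idp_entity_id},
        # SAML attribute whose values become OpenSearch backend roles
        "RolesKey": AAD_GROUPS_CLAIM,
        "SessionTimeoutMinutes": session_timeout_minutes,
    }
    if master_backend_role:
        saml["MasterBackendRole"] = master_backend_role
    return {"SAMLOptions": saml}


def enable_saml(domain_name: str, options: Dict) -> None:
    # Imported lazily so the builder above has no AWS dependency
    import boto3

    client = boto3.client("opensearch")
    client.update_domain_config(DomainName=domain_name, AdvancedSecurityOptions=options)
```

For AAD, the IdP entity ID typically has the form `https://sts.windows.net/<tenant-id>/`.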

Authorization Implementation

First, let's explain some RBAC concepts in OpenSearch. Permissions in OpenSearch are managed using users, roles, and mappings.

  • User: an entity representing someone who accesses OpenSearch. A user can be internal (stored in and managed by OpenSearch) or external. In our case, users exist in AAD and are managed by an external identity provider.
  • Role: an entity that holds the permissions granted to a user. Permissions can be cluster-level, index-level, document-level, and field-level. OpenSearch comes with predefined roles, some of which we will use.
  • Mapping: an entity that maps users to roles. A role can be mapped either to backend roles (e.g. Azure AD groups or IAM roles) or directly to users.

Back to the second requirement: allow teams to access the data that belongs to their applications or infrastructure. To achieve that, we made the following decisions:

  1. Each team has an OpenSearch role. The role grants the team only the permissions it needs to query data, with no permission to manipulate the cluster or the indices. The applications and infrastructure run in AWS and each team owns a set of AWS accounts, so we apply document-level security on the role based on the accountId field.
  2. Each team has a dedicated tenant, which allows the team to share its work.
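The following is a simplified model of how a user's backend roles select roles. The backend roles here are the AAD groups carried in the SAML assertion. This is a hypothetical function, not OpenSearch's implementation. It ignores host-based and `and_backend_roles` mappings. A user matched by several roles gets the combined permissions of all of them.

```python
from typing import Dict, List, Set


def effective_roles(username: str, backend_roles: List[str],
                    role_mappings: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Return the roles a user ends up with under a simplified mapping model.

    A role matches if the user is listed under "users" or shares at least
    one backend role (e.g. an Azure AD group) with the role's mapping.
    """
    user_groups = set(backend_roles)
    matched = set()
    for role, mapping in role_mappings.items():
        if username in mapping.get("users", []) or user_groups & set(mapping.get("backend_roles", [])):
            matched.add(role)
    return matched
```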

To keep things simple, we will do the above as manual steps first, then we can automate the process using AWS CloudFormation and Lambda.
Create Role

  1. In OpenSearch Dashboards, choose Security, Roles, and Create role.
  2. Provide a name for the role, e.g. Team_A.
  3. Provide the cluster permissions. Our use case required the following permissions:
'kibana_all_write',
'cluster_composite_ops_ro',
'indices:data/write/bulk*',
'cluster:admin/opendistro/reports/definition/list',
'cluster:admin/opendistro/reports/instance/list',
'cluster:admin/opendistro/reports/instance/get',
'cluster:admin/opendistro/reports/definition/create',
'cluster:admin/opendistro/reports/definition/update',
'cluster:admin/opendistro/reports/definition/on_demand',
'cluster:admin/opendistro/reports/definition/delete',
'cluster:admin/opendistro/reports/definition/get',
'cluster:admin/opendistro/reports/menu/download',
'cluster:admin/opendistro/alerting/alerts/get',
'cluster:admin/opendistro/alerting/alerts/ack',
'cluster:admin/opendistro/alerting/monitor/write',
'cluster:admin/opendistro/alerting/monitor/delete',
'cluster:admin/opendistro/alerting/monitor/execute',
'cluster:admin/opendistro/alerting/monitor/get',
'cluster:admin/opendistro/alerting/monitor/search',
'cluster:admin/opendistro/alerting/destination/get',
'cluster:admin/opendistro/alerting/destination/write',
'cluster:admin/opendistro/alerting/destination/delete',
'cluster:admin/opendistro/alerting/destination/email_account/delete',
'cluster:admin/opendistro/alerting/destination/email_account/get',
'cluster:admin/opendistro/ad/detector/search',
'cluster:admin/opendistro/ad/detector/delete',
'cluster:admin/opendistro/ad/detector/info',
'cluster:admin/opendistro/ad/detector/jobmanagement',
'cluster:admin/opendistro/ad/detector/preview',
'cluster:admin/opendistro/ad/detector/run',
'cluster:admin/opendistro/ad/detector/stats',
'cluster:admin/opendistro/ad/detector/write',
'cluster:admin/opendistro/ad/result/search'
  4. Under Index permissions, provide the index patterns, e.g. logs-*.
  5. Under Allowed actions, provide read.
  6. Under Document-level security, provide your query, e.g.:
{
   "terms":{
      "accountId":[
         123,
         456
      ]
   }
}
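You don't have to paste the cluster permission list above into every team role. You can store it once as a cluster action group, and the Lambda later in this post references one through SQUAD_ACTION_GROUP_NAME. Below is a minimal sketch of the request body for `PUT _plugins/_security/api/actiongroups/<name>`; the helper name is hypothetical. It also drops repeated entries while keeping their order, in case the list contains duplicates.

```python
from typing import Dict, List


def build_action_group(permissions: List[str]) -> Dict[str, List[str]]:
    """Request body for PUT _plugins/_security/api/actiongroups/<name>."""
    # dict.fromkeys keeps the first occurrence of each permission, in order;
    # blank entries are skipped
    unique = list(dict.fromkeys(p.strip() for p in permissions if p.strip()))
    return {"allowed_actions": unique}
```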

Create Tenant

  1. Open OpenSearch Dashboards.
  2. Choose Security, Tenants, and Create tenant.
  3. Give the tenant a name and description.
  4. Choose Create.

Role Mapping
After creating a tenant, give a role access to it using OpenSearch Dashboards:

  • Read-write (kibana_all_write) permissions let the role view and modify objects in the tenant.
  • Read-only (kibana_all_read) permissions let the role view objects, but not modify them.
  1. Open OpenSearch Dashboards.
  2. Choose Security, Roles, and a role.
  3. For Tenant permissions, add tenants, press Enter, and give the role read and/or write permissions to it.
  4. Choose the Mapped users tab and Manage mapping.
  5. Specify users or external identities (also known as backend roles). Here, these are our Azure AD groups.
  6. Choose Map.

So far so good, but the manual process is error-prone. Whenever something changes, you have to go through all the roles and apply the change by hand, which is hard to manage at enterprise scale. So let's automate the process.
The automation consists of two parts. The first is an AWS Lambda function, responsible for making REST API calls to OpenSearch, together with an AWS CloudFormation stack responsible for creating or updating that Lambda.
The second is another CloudFormation stack that uses the Lambda function as a custom resource. It takes input values from the user to create the team role and tenant and to do the required mapping.
Let's start with the Lambda definition, which is written in Python.

To allow the Lambda to be called from another CloudFormation stack as a custom resource, we use CfnResource from the crhelper package. These are all the import statements:

import boto3
import requests
from requests_aws4auth import AWS4Auth
import json
from munch import DefaultMunch
import kibanarole
import os
from crhelper import CfnResource

We read some environment variables set by the CloudFormation stack that creates or updates the Lambda:

# Your OpenSearch URL, including https:// and the trailing /
esHost = os.getenv("OpenSearchEndpoint")
region = os.getenv("AwsRegion")  # e.g. us-west-1
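The Lambda builds every URL by concatenation (for example `esHost + GET_ROLES_PATH`). A missing scheme or trailing slash in the environment variable therefore produces a wrong URL without any error. Below is a small guard using a hypothetical helper. It assumes the variable might hold the bare domain endpoint, which may not include `https://`.

```python
def normalize_endpoint(value: str) -> str:
    """Return the endpoint with an https:// scheme and exactly one trailing slash."""
    value = (value or "").strip()
    if not value:
        raise ValueError("OpenSearchEndpoint is not set")
    if not value.startswith(("https://", "http://")):
        value = "https://" + value
    return value.rstrip("/") + "/"
```

In the Lambda, this would be used as `esHost = normalize_endpoint(os.getenv("OpenSearchEndpoint"))`.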

Next, some global variables, such as the OpenSearch REST API paths:

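# Hypothetical value: the post does not show the action group name it uses
SQUAD_ACTION_GROUP_NAME = 'squad_cluster_permissions'
GET_ROLES_PATH = '_opendistro/_security/api/roles'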
GET_TENANTS_PATH = '_opendistro/_security/api/tenants'
CREATE_ROLE_PATH = '_opendistro/_security/api/roles/__ROLE_NAME__'
CREAT_TENANT_PATH = '_opendistro/_security/api/tenants/__TENANT_NAME__'
CREATE_ROLE_MAPPINGS_PATH ='_opendistro/_security/api/rolesmapping/__ROLE_NAME__'
DELETE_TENANT_PATH = '_opendistro/_security/api/tenants/__TENANT_NAME__'
DELETE_ROLE_PATH = '_opendistro/_security/api/roles/__ROLE_NAME__'
GET_ACTION_GROUP_PATH = '_plugins/_security/api/actiongroups/' + SQUAD_ACTION_GROUP_NAME
CREATE_ACTION_GROUP_PATH = '_plugins/_security/api/actiongroups/' + SQUAD_ACTION_GROUP_NAME
PATCH_ACTION_GROUP_PATH = '_plugins/_security/api/actiongroups/' + SQUAD_ACTION_GROUP_NAME
DELETE_ACTION_GROUP_PATH = '_plugins/_security/api/actiongroups/' + SQUAD_ACTION_GROUP_NAME
CREATE_INDEX_PATTERN_PATH = 'saved_objects/index-pattern/__PATTERN_NAME__'
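The post calls create_or_update_action_group() without showing it. Below is one possible shape, with a different, hypothetical signature. The Security REST API's PUT on an action group creates the group when it is missing and replaces it otherwise, so a single PUT covers both cases. The `session` argument is assumed to be a requests-style session, for example a `requests.Session` whose `auth` is the AWS4Auth object. Passing the session in also makes the function easy to test with a stub.

```python
from typing import Any, Dict, List


def create_or_update_action_group(session: Any, es_host: str, name: str,
                                  allowed_actions: List[str]) -> Dict[str, Any]:
    """Create or replace a cluster action group with a single PUT.

    `session` needs a requests-style put(url, json=...) method; with requests,
    passing json= also sets the Content-Type header to application/json.
    """
    url = es_host.rstrip("/") + "/_plugins/_security/api/actiongroups/" + name
    response = session.put(url, json={"allowed_actions": allowed_actions})
    # 201 = created, 200 = replaced an existing action group
    if response.status_code not in (200, 201):
        raise RuntimeError(
            f"Action group {name!r} failed: {response.status_code} {response.text}")
    return response.json()
```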

AWS request signing (SigV4) using the Lambda's credentials from boto3:

# SigV4 service name for AWS OpenSearch Service domains
service = "es"
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = {"Content-Type": "application/json"}

Initialize crhelper and mark the main Python method to be called by CloudFormation on create and update:

helper = CfnResource()

@helper.create
@helper.update
def apply_kibana_security(event, _):
    # Logic here (full implementation below)
    pass

Lambda main method

def handler(event, context):
    # Call method which marked with @helper.create or @helper.update
    helper(event, context)
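crhelper hides the custom resource protocol behind decorators. The simplified dispatcher below makes the decorator mechanics concrete; it is a hypothetical class, not crhelper's code. CloudFormation sends an event whose `RequestType` is Create, Update, or Delete, and the function registered for that type runs. The real `CfnResource` also sends the result back to the pre-signed `ResponseURL` in the event, which this sketch omits. In this sketch, a Delete event does nothing because no delete handler is registered.

```python
from typing import Any, Callable, Dict, Optional


class MiniDispatcher:
    """Hypothetical, simplified stand-in for crhelper's CfnResource."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable] = {}
        self.Data: Dict[str, Any] = {}

    def _register(self, request_type: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            self._handlers[request_type] = func
            return func  # unchanged, so decorators can be stacked
        return decorator

    def create(self, func: Callable) -> Callable:
        return self._register("Create")(func)

    def update(self, func: Callable) -> Callable:
        return self._register("Update")(func)

    def delete(self, func: Callable) -> Callable:
        return self._register("Delete")(func)

    def __call__(self, event: Dict[str, Any], context: Any) -> Optional[Any]:
        func = self._handlers.get(event["RequestType"])
        if func is None:
            return None  # no handler registered for this request type
        return func(event, context)
```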

Now let's implement the apply_kibana_security method. First, we read the parameters passed from CloudFormation using event['ResourceProperties']['PARAM_NAME']:

@helper.create
@helper.update
def apply_kibana_security(event, _):
    props = event['ResourceProperties']
    # Param to decide whether to create a new team or update an existing one
    create_or_update = props['CreateOrUpdate']
    team_name = props['TeamName']
    # Comma-separated Azure AD groups (or any backend roles); the role mapping API expects a list
    backend_roles = [role.strip() for role in props['BackendRoles'].split(',') if role.strip()]
    # Comma-separated accountIds used in Document-Level Security. Could be any field in your documents.
    account_ids = props['AccountIds']
    # Not shown in this post: creates or updates the cluster action group
    create_or_update_action_group()
    if create_or_update == "Create":
        print("Creating team ...")
        response_message = create_team_in_kibana(team_name, account_ids, backend_roles)
        print("response_message: " + response_message)
        # Return the response message to CloudFormation
        helper.Data['Results'] = response_message
    elif create_or_update == "Update":
        # Not shown in this post
        response_message = update_team_in_kibana(team_name, account_ids, backend_roles)
        helper.Data['Results'] = response_message
    else:
        helper.Data['Results'] = "Not supported option"


def create_team_in_kibana(team_name, account_ids, backend_roles):
    tenant_name = team_name + "_tenant"
    tenant_description = "This tenant is assigned to the " + team_name + " team."
    role_name = team_name + "_role"
    # Step one: get the list of tenants and check that the tenant does not exist
    tenant_list = get_kibana_tenants()
    if tenant_name in tenant_list:
        return "Tenant " + tenant_name + " already exists."
    # Step two: get the list of roles and check that the role does not exist
    role_list = get_kibana_roles()
    if role_name in role_list:
        return "Role " + role_name + " already exists."
    # Step three: create the tenant
    response = create_tenant(tenant_name, tenant_description)
    print(response.text)
    if response.status_code != 201:
        return "Can't create tenant for team " + team_name

    # create tenant default index patterns
    # create_index_patterns(tenant_name)

    # Step four: create the role (is_cp_admin is not used in this post, so pass False)
    response = create_kibana_role(role_name, tenant_name, False, account_ids)
    print(response.text)
    if response.status_code != 201:
        # Roll back the tenant creation by deleting the tenant
        print("Rolling back tenant creation")
        response = delete_tenant(tenant_name)
        print(response.text)
        if response.status_code == 200:
            print("Tenant creation rolled back successfully")
            return "Can't create role for team " + team_name
        return "An error occurred, can't roll back"

    # Step five: create the role mapping
    response = create_kibana_role_mappings(role_name, backend_roles)
    print(response.text)
    if response.status_code == 201:
        return "Team has been created successfully."
    # Roll back both the role and the tenant created above
    role_response = delete_role(role_name)
    tenant_response = delete_tenant(tenant_name)
    print(role_response.text, tenant_response.text)
    if role_response.status_code == 200 and tenant_response.status_code == 200:
        print("Role and tenant creation rolled back successfully")
        return "Can't create role mappings for team " + team_name
    return "An error occurred, can't roll back"
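Each additional step in create_team_in_kibana adds another rollback branch. The same "undo what was already created" idea can be written as a list of (name, do, undo) steps. This is a generic sketch, not the author's code, and the function name is hypothetical. In the Lambda, `do` and `undo` would wrap calls such as create_tenant and delete_tenant and return whether the HTTP status was the expected one.

```python
from typing import Callable, List, Tuple

# (name, do, undo): do() and undo() return True on success
Step = Tuple[str, Callable[[], bool], Callable[[], bool]]


def run_with_rollback(steps: List[Step]) -> str:
    """Run steps in order; on the first failure, undo completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        name, do, _undo = step
        if do():
            done.append(step)
            continue
        for undo_name, _do, undo in reversed(done):
            if not undo():
                return f"Step {name!r} failed and rolling back {undo_name!r} also failed"
        return f"Step {name!r} failed; rolled back {len(done)} step(s)"
    return "All steps succeeded"
```

For example, the tenant step could be `("tenant", lambda: create_tenant(t, d).status_code == 201, lambda: delete_tenant(t).status_code == 200)`.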

The following are examples of individual operations against OpenSearch.
Get List of Roles

def get_kibana_roles():
    url = esHost + GET_ROLES_PATH
    response = requests.get(url, auth=awsauth, headers=headers)
    print(response)
    # Raise on HTTP errors instead of silently returning None;
    # crhelper then reports the failure to CloudFormation
    response.raise_for_status()
    roles = response.json()
    # Keep only custom roles; reserved roles are the predefined ones
    return [name for name, role in roles.items() if not role.get('reserved', False)]

Create a Role and map it to the tenant in one step

def create_kibana_role(role_name, tenant_name, is_cp_admin, account_ids):
    # is_cp_admin is kept for compatibility; it is not used in this post
    role_json_template = kibanarole.create_kibana_role_template(tenant_name, account_ids, SQUAD_ACTION_GROUP_NAME)
    url = esHost + CREATE_ROLE_PATH.replace("__ROLE_NAME__", role_name)
    response = requests.put(url, auth=awsauth, data=role_json_template, headers=headers)
    print(response)
    return response

# In kibanarole.py
import json

# Values from the manual role creation above (logs-* and read); adjust to your indices
INDEX_PERMISSION_PATTERN = 'logs-*'
SQUAD_ALLOWED_ACTIONS = 'read'

def create_kibana_role_template(tenant_name, account_ids, action_group_name):
    # "123,456" -> ["123", "456"]
    accounts = [account.strip() for account in account_ids.split(",") if account.strip()]
    # Document-level security query (same shape as the manual example); the API expects it as a JSON string
    dls = json.dumps({"terms": {"accountId": accounts}})
    kibana_role = {
        'cluster_permissions': [
            action_group_name
        ],
        'index_permissions': [{
            'index_patterns': [
                INDEX_PERMISSION_PATTERN
            ],
            'dls': dls,
            'allowed_actions': [
                SQUAD_ALLOWED_ACTIONS
            ]
        }],
        'tenant_permissions': [{
            'tenant_patterns': [
                tenant_name
            ],
            'allowed_actions': [
                'kibana_all_write'
            ]
        }]
    }
    # Return a JSON string because create_kibana_role sends it as the raw request body
    return json.dumps(kibana_role)

Create Tenant

def create_tenant(tenant_name, tenant_description):
    tenant_json_template = kibanarole.create_tenant_template(tenant_description)
    # print(tenant_json_template)
    url = esHost + CREAT_TENANT_PATH.replace("__TENANT_NAME__", tenant_name)
    # print(url)
    response = requests.put(url, auth=awsauth, json=tenant_json_template, headers=headers)
    print(response.text)
    return response
def create_tenant_template(tenant_description):
    tenant_template = {
        'description': tenant_description
    }
    return tenant_template

Create Role Mapping

def create_kibana_role_mappings(role_name, backend_roles):
    url = esHost + CREATE_ROLE_MAPPINGS_PATH.replace("__ROLE_NAME__", role_name)
    role_mapping_json_template = kibanarole.create_role_mappings_template(backend_roles)
    print(json.dumps(role_mapping_json_template))
    response = requests.put(url, auth=awsauth, data=json.dumps(role_mapping_json_template), headers=headers)
    print(response)
    return response

def create_role_mappings_template(backend_roles):
    print(backend_roles)
    # Accept either a list or the comma-separated string passed by CloudFormation
    if isinstance(backend_roles, str):
        backend_roles = [role.strip() for role in backend_roles.split(",") if role.strip()]
    # backend_roles must be a JSON array, so use an empty list rather than ''
    return {'backend_roles': list(backend_roles)}
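The post doesn't show update_team_in_kibana. This sketch assumes an update only needs to change the accountId list in the DLS query and the mapped AAD groups. The Security REST API's PATCH endpoints for a single role and a single role mapping accept JSON Patch operations, and the hypothetical helpers below build those bodies. They assume the team role has exactly one index permission entry, as created by create_kibana_role_template.

```python
import json
from typing import Dict, List


def role_update_patch(account_ids: List[str]) -> List[Dict]:
    """Body for PATCH _plugins/_security/api/roles/<role>: replace the DLS query."""
    dls = json.dumps({"terms": {"accountId": account_ids}})
    return [{"op": "replace", "path": "/index_permissions/0/dls", "value": dls}]


def mapping_update_patch(backend_roles: List[str]) -> List[Dict]:
    """Body for PATCH _plugins/_security/api/rolesmapping/<role>: replace the groups."""
    return [{"op": "replace", "path": "/backend_roles", "value": backend_roles}]
```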

With the Lambda logic in place, let's create the CloudFormation template that deploys the Lambda. I named it CreateLambda.yml:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: 'SAM template to create an AWS Lambda which will be invoked by CloudFormation to apply OpenSearch Dashboards security.'

Parameters:
  OpenSearchDomainArn:
    Description: "The Arn for the OpenSearch Domain which the Lambda will call APIs."
    Type: String
    Default: ""

  OpenSearchDomainEndpoint:
    Description: "The OpenSearch Domain endpoint to apply security on."
    Type: String
    Default: ""

  AWSRegion:
    Description: "The AWS region where the OpenSearch domain resides."
    Type: String
    Default: "us-west-1"
    AllowedValues:
      - "us-west-1"

  VpcSubnetIds:
    Description: "The Subnet Ids where the OpenSearch domain resides."
    Type: CommaDelimitedList
    Default: ""

  SecurityGroupIds:
    Description: "The Security group Ids applied to the Lambda function."
    Type: CommaDelimitedList
    Default: ""

Resources:
  KibanaSecurityLambda:
    Type: AWS::Serverless::Function
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      CodeUri: lambda/
      Handler: kibana-security.handler
      Timeout: 300
      Runtime: python3.7
      Environment:
        Variables:
          OpenSearchEndpoint: !Ref OpenSearchDomainEndpoint
          AwsRegion: !Ref AWSRegion
      VpcConfig:
        SecurityGroupIds: !Ref SecurityGroupIds
        SubnetIds: !Ref VpcSubnetIds
      Policies:
        - AWSLambdaVPCAccessExecutionRole # Allow Lambda to Create VPC ENI to communicate with OpenSearch Domain
        - Statement:
          - Sid: OpenSearchDomainAccess
            Effect: Allow
            Action:
            - es:* # or the narrower es:ESHttp*
            Resource:
              !Join
              - ''
              - - !Ref OpenSearchDomainArn
                - '/*'

Outputs:
  KibanaSecurityLambdaArn:
    Description: "The Lambda Function Arn."
    Value: !GetAtt KibanaSecurityLambda.Arn

This is a very important step. To allow the Lambda to make calls to OpenSearch, we need to map the Lambda's IAM role ARN as a backend role. The easiest way is to map the role ARN to the all_access role in OpenSearch. That is not recommended, because it grants the Lambda full access to all cluster, index, and document operations. The recommended way is to create a custom role in OpenSearch with the minimum permissions the Lambda needs for your use case.

Now for the second CloudFormation template, which takes user inputs and calls the Lambda. I called it apply-kibana-security.yml:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: 'SAM template to apply OpenSearch Dashboards security by calling the custom resource Lambda function.'

Metadata:
  AWS::CloudFormation::Interface: 
    ParameterGroups: 
      - 
        Label: 
          default: "Basic Configurations"
        Parameters: 
          - KibanaSecurityLambdaArn
          - CreateOrUpdate
      - 
        Label: 
          default: "Squad Configurations"
        Parameters: 
          - SquadName
          - AccountIds
          - BackendRoles

Parameters:
  SquadName:
    Description: "The team name; separate words with underscores, e.g. team_a."
    Type: String

  AccountIds:
    Description: "Comma-separated list of AWS accounts that belong to the team, e.g. 123,456"
    Type: String
    Default: ""

  BackendRoles:
    Description: "Comma-separated list of Azure AD groups, e.g. team_a_dev,team_a_lead"
    Type: String
    Default: ""

  CreateOrUpdate:
    Description: "Specify whether to create new team security in OpenSearch or update an existing team."
    Type: String
    Default: Create
    AllowedValues:
      - "Create"
      - "Update"

  KibanaSecurityLambdaArn:
    Description: "The AWS Lambda that will be imported as a custom resource and invoked from this stack."
    Type: String
    Default: ""

Resources:
  ApplyKibanaSecurity:
    Type: "Custom::SecurityApplier"
    Properties:
      ServiceToken: !Ref KibanaSecurityLambdaArn
      CreateOrUpdate: !Ref CreateOrUpdate
      TeamName: !Ref SquadName
      BackendRoles: !Ref BackendRoles
      AccountIds: !Ref AccountIds

Outputs:
  ResponseMessage:
    Description: "The final response message from the Lambda."
    Value: !GetAtt ApplyKibanaSecurity.Results

Alright, now that everything is in place, go ahead and test the solution. Then let's move on to the third requirement.

From two domains to one OpenSearch domain - One domain to rule them all :D

This requirement is a little tricky. Production log data has a different retention period, and filtering data now depends on specific fields, for example when you want to see only data coming from the dev environment. We had two options:

  • Store data from all environments in one index and let users filter on the environment field. However, we then cannot have different retention periods.
  • Store production data in indices separate from the other environments, and create index patterns for each environment.

We picked option two, but that required changes to Logstash and the other processors that ship data to OpenSearch. We will cover this in the post about flowing data to OpenSearch.
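With option two, the index name has to carry the environment. Below is a minimal sketch of one possible convention; the logs-<environment>-<date> scheme and the helper names are hypothetical. In practice you would set this in the Logstash output rather than in Python, but the rule is the same. Each environment then gets its own index pattern, while a team role pattern such as logs-* still covers all of them.

```python
from datetime import date


def index_name(environment: str, day: date, prefix: str = "logs") -> str:
    """Daily index for one environment, e.g. logs-prod-2022.08.21."""
    env = environment.strip().lower()
    if not env:
        raise ValueError("environment is required")
    return f"{prefix}-{env}-{day:%Y.%m.%d}"


def index_pattern(environment: str, prefix: str = "logs") -> str:
    """Index pattern covering every daily index of one environment."""
    return f"{prefix}-{environment.strip().lower()}-*"
```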
We also used OpenSearch's ability to keep data in hot storage for x days, where you can both write to and read from the index, and then move it to warm storage, where the index is read-only. This caused some problems with Logstash, which we resolved using some logic and mutate filters in Logstash.
This part is a big topic and we will have another post for it.
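An Index State Management (ISM) policy is one way to automate the hot-to-warm move. The sketch below builds a policy body for `PUT _plugins/_ism/policies/<policy_id>`. The day counts, the final delete state and the index pattern are placeholders, and the helper name is hypothetical. `warm_migration` is the AWS-specific ISM action that moves an index to UltraWarm, so the domain needs UltraWarm enabled. Production and non-production indices can each get their own policy, which is what gives them different retention periods.

```python
from typing import Dict


def hot_warm_policy(hot_days: int, delete_days: int, index_pattern: str) -> Dict:
    """ISM policy: keep indices hot for `hot_days`, migrate them to UltraWarm,
    then delete them once they are `delete_days` old."""
    if delete_days <= hot_days:
        raise ValueError("delete_days must be greater than hot_days")
    return {
        "policy": {
            "description": f"Hot for {hot_days}d, then warm, delete after {delete_days}d",
            "default_state": "hot",
            "states": [
                {"name": "hot", "actions": [],
                 "transitions": [{"state_name": "warm",
                                  "conditions": {"min_index_age": f"{hot_days}d"}}]},
                {"name": "warm", "actions": [{"warm_migration": {}}],
                 "transitions": [{"state_name": "delete",
                                  "conditions": {"min_index_age": f"{delete_days}d"}}]},
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to new indices matching the pattern
            "ism_template": [{"index_patterns": [index_pattern], "priority": 100}],
        }
    }
```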
That's it. I hope you find this useful.
