Visualizing Akamai DataStream 2 logs with Elasticsearch and Kibana on Linode
Hideki Okamoto
Posted on September 23, 2022
Setting up Elasticsearch and Kibana on Linode to visualize Akamai DataStream 2 logs: every step can be done from a web browser, with no need to log in to a Linux console. The installation procedure takes only about 10 minutes, excluding DataStream 2 activation time.
Akamai DataStream 2
Akamai DataStream 2 is a free feature that streams access logs from the Akamai Intelligent Edge Platform to designated destinations in near real-time. As of November 2023, DataStream 2 can deliver access logs to the following destinations:
- Amazon S3
- Azure Storage
- Custom HTTPS endpoint
- Datadog
- Elasticsearch
- Google Cloud Storage
- Loggly
- New Relic
- Oracle Cloud
- S3-compatible destinations (incl. Linode Object Storage)
- Splunk
- Sumo Logic
See also: DataStream 2 - Stream logs to a destination
Elasticsearch / Kibana
Elasticsearch is a full-text search engine developed by Elastic. Its source code is publicly available under a dual license: the Server Side Public License and the Elastic License. Kibana is data visualization software for Elasticsearch and is offered under the same terms. Combining these two applications with the data collection pipeline software Logstash yields the ELK Stack, which has evolved beyond its original full-text search use and is now popular as a log analysis and data visualization platform.
The goal of this article
I will explain how to use Akamai DataStream 2 to deliver access logs to Elasticsearch running on Linode in near real-time and visualize the logs with Kibana. By following the steps you can create a dashboard like the screenshot below, which graphs a typical selection of the 45 fields included in DataStream 2 logs.
I will deploy the stack to Linode, an IaaS (Infrastructure as a Service) provider acquired by Akamai in February 2022. Linode has a deployment automation feature called StackScripts, which lets you have Elasticsearch and Kibana ready to receive access logs in about 10 minutes. (Apart from configuring Elasticsearch and Kibana, activating DataStream 2 takes about 1.5 hours.)
See also: Linode StackScripts
This article explains how to build an Elasticsearch and Kibana environment from scratch, but if you are already running these environments and only need Elasticsearch Index Mapping, Kibana Data View, Visualization, and Dashboard definition files for Akamai DataStream 2, these are also available on GitHub for download.
Elasticsearch, Kibana definition files
Install Elasticsearch and Kibana on Linode
First, open the StackScript I have prepared from the link below. This StackScript automatically installs Elasticsearch and Kibana. (You must be logged in to your Linode account to access the link.) If you cannot open the StackScript for some reason, its contents are also available on GitHub.
elasticsearch-kibana-for-akamai-datastream2
https://cloud.linode.com/stackscripts/1059555
Click "Deploy New Linode"
StackScripts have a feature called UDF (User Defined Fields) that automatically creates an input form for the parameters required during deployment. This StackScript asks for the login credentials of a non-root user who can SSH into the virtual machine, passwords for the Elasticsearch and Kibana administrative users, and the credentials DataStream 2 will use to feed logs into Elasticsearch. Enter the required parameters in the form. The values entered here will be used later, so keep a note of them.
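If you prefer the command line, the same deployment can be scripted with the Linode CLI. This is a minimal sketch: the --stackscript_data keys are hypothetical placeholders, so check the StackScript's actual UDF definitions for the real names.

```bash
# Minimal sketch: deploy the StackScript with the Linode CLI.
# Region ap-northeast = Tokyo; type g6-dedicated-4 = Dedicated 8 GB.
# The --stackscript_data keys are HYPOTHETICAL placeholders; use the
# actual UDF names defined in StackScript 1059555.
linode-cli linodes create \
  --label es-kibana-datastream2 \
  --region ap-northeast \
  --type g6-dedicated-4 \
  --image linode/ubuntu22.04 \
  --root_pass 'choose-a-strong-root-password' \
  --stackscript_id 1059555 \
  --stackscript_data '{"ssh_user":"admin","ssh_password":"...","es_password":"...","kibana_password":"...","datastream_user":"datastream","datastream_password":"..."}'
```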
Select the region where the virtual machine will be created and the instance type. Choose one with at least 8 GB of memory; Elasticsearch and Kibana will fail to launch with less. Here I select a Dedicated 8 GB Linode in the Tokyo region of Japan. If you intend to use this setup to visualize the logs of a high-traffic website, you may need an even higher-performance instance type. See the "Considerations for production use" section at the bottom of this article for more information.
Name the virtual machine, enter the root password, and click "Create Linode".
The screen transitions to the virtual machine management dashboard. Wait a few minutes until the virtual machine status changes from PROVISIONING to RUNNING. The IP address of the newly created virtual machine is displayed on the same screen; take note of it.
Check and note the Reverse DNS value for the virtual machine from the Network tab of the virtual machine, as it will be needed in the DataStream 2 configuration procedure.
The virtual machine is now booted. The Elasticsearch and Kibana installation proceeds automatically in the background. Wait about 10 minutes for it to complete before proceeding to the next step.
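To confirm the background installation has finished, you can optionally probe both services with curl from any machine (no SSH required). This is a sketch; the elastic password is the one you entered in the UDF form, and response bodies vary by version.

```bash
# Elasticsearch: returns cluster health JSON once the service is up.
curl -s -u elastic:YOUR_PASSWORD \
  'http://[IP address of the virtual machine]:9200/_cluster/health?pretty'

# Kibana: the status API answers once the UI is ready to serve.
curl -s -u elastic:YOUR_PASSWORD \
  'http://[IP address of the virtual machine]:5601/api/status' | head -c 300
```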
Log in to Kibana
Let's make sure you can log in to Kibana by accessing http://[IP address of the virtual machine]:5601/ from your web browser. Enter elastic as the login user and the password you specified when deploying the virtual machine.
Click the hamburger button in the upper left corner to display the menu and click Analytics -> Dashboard.
A dashboard named "Akamai" was automatically created by the StackScript, so open it.
If the dashboard appears as shown above, you have completed the steps correctly so far. At this point there is no data, because DataStream 2 has not yet been set up.
Configure DataStream 2
You need to set up DataStream 2 from the Akamai Control Center as well. Click the hamburger button in the upper left corner to display the menu, then click COMMON SERVICES -> DataStream, and follow the steps below to create a stream.
Name the stream and mark the checkboxes for the delivery properties for which you want to enable log delivery via DataStream 2.
A screen appears for selecting which access log fields to send; as an example, check Include all for every category. As of September 2022, this selects a total of 45 fields. At the bottom of the configuration screen, select JSON as the log format.
Please select the fields of the log to be collected taking into account the laws and regulations regarding the protection of PII.
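For reference, with JSON selected each delivered log line is a single JSON object. The sketch below is abridged and its values are made up; the field names shown are among the documented DataStream 2 fields, but value types may differ depending on your stream configuration.

```json
{
  "cliIP": "203.0.113.10",
  "reqHost": "www.example.com",
  "reqMethod": "GET",
  "reqPath": "/index.html",
  "statusCode": 200,
  "UA": "Mozilla/5.0 ...",
  "country": "JP",
  "turnAroundTimeMSec": 12,
  "customField": "ctt:42"
}
```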
Next, set the destination for DataStream 2: choose Elasticsearch as the Destination, any name for Display name, and http://[Reverse DNS hostname]:9200/_bulk as the Endpoint, using the Reverse DNS hostname you noted when creating the virtual machine. Enter datastream2 as the Index name, along with the Username and Password you entered when deploying the virtual machine. Also mark the Send compressed data checkbox and click the "Validate & Save" button in the lower right corner of the screen. If all the values are correct, the message "Destination details are valid" appears in the lower right corner and the screen advances.
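If validation fails, you can exercise the same endpoint by hand. DataStream 2 posts newline-delimited JSON to Elasticsearch's _bulk API, so a minimal hand-rolled request (illustrative document body; the username shown as datastream is whatever you chose in the UDF form) tests the exact path and index it will use:

```bash
# Send one test document to the datastream2 index via the _bulk API.
# The body is NDJSON: an action line, then a source line, and it must
# end with a trailing newline.
curl -s -u datastream:YOUR_PASSWORD \
  -H 'Content-Type: application/x-ndjson' \
  'http://[Reverse DNS hostname]:9200/_bulk' \
  --data-binary $'{"index":{"_index":"datastream2"}}\n{"reqHost":"test.example.com","statusCode":200}\n'
```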
Finally, a summary of the settings is displayed so you can confirm they are correct. Check the "Activate stream upon saving" checkbox. It takes about 1.5 hours for DataStream 2 to begin log streaming. If you would like an email notification when log delivery starts, check "Receive an email once activation is complete." and enter your email address. When the DataStream 2 activation process starts, a message indicates that DataStream 2 must also be configured in Property Manager. Note the name of the stream you have just set up, then click "Proceed to Property Manager".
Enable DataStream 2 in Akamai delivery properties
This section assumes that you understand the basic operations of the Akamai Property Manager.
Delivering logs through DataStream 2 requires adding the "DataStream" behavior to the property. After completing the DataStream 2 setup steps described above, create a new version of the property, add the two behaviors "DataStream" and "Log Request Details" to the default rule, and configure them as in the example below. This can be done in parallel while waiting for DataStream 2 to be activated.
| Setting | Value |
|---|---|
| Stream version | DataStream 2 |
| Stream names | Name specified during the DataStream 2 setup steps |
| Sampling rate | Percentage of logs to be sent (100 means all logs) |
| Log *** Header | Whether to log the corresponding header in the request |
| Cookie Mode | Whether to log cookies in the request |
| Include Custom Log Field | Whether to log the custom log field |
| Custom Log Field | Value to populate the custom log field (I include the TLS cipher suite used as an example, but it can be left blank) |
| Log Akamai Edge Server IP Address | Whether to log the IP address of the edge server that processed the request (this option must be On) |
Please select the fields of the log to be collected taking into account the laws and regulations regarding the protection of PII.
Once the configuration is finished, save and activate the property. Access logs will begin to appear in Kibana after both DataStream 2 and the property are activated.
Congratulations! Now you can see access logs from Akamai in near real-time!
Advanced Usage
Conditional access log delivery
The typical usage is to stream all access logs to the log analysis infrastructure through DataStream 2 and then filter them as needed on the analysis side. It is also possible to have DataStream 2 deliver only logs that meet certain conditions. This is especially useful when the volume of logs is huge and you want to reduce the load on the log analysis infrastructure, or when you are not interested in successful requests and only want to see error logs. As an implementation example, if the "DataStream" behavior is removed from the property's default rule and instead enabled under the following conditions, logs are sent only when the request path is under /foo/bar/ and the response code is none of 200, 206, or 304. For comparison, the equivalent analysis-side filter is sketched below.
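A minimal sketch of the same filter applied on the Elasticsearch side, assuming reqPath is indexed as a keyword and statusCode as a numeric type (check the index template actually in use):

```bash
# Search the datastream2 index for requests under /foo/bar/ whose
# response code is not 200, 206, or 304.
curl -s -u elastic:YOUR_PASSWORD \
  -H 'Content-Type: application/json' \
  'http://[IP address of the virtual machine]:9200/datastream2/_search?pretty' -d '{
  "query": {
    "bool": {
      "filter":   [ { "prefix": { "reqPath": "/foo/bar/" } } ],
      "must_not": [ { "terms":  { "statusCode": [200, 206, 304] } } ]
    }
  }
}'
```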
Using the custom field
A field called the custom field exists in the access log sent from DataStream 2. You can set any string of up to 1,000 bytes here, including built-in edge server variables or property user variables.
See also: Built-in variables
In the example below, the custom field is set to the transfer time the edge server took to return the response to the client.
This value can be parsed as a Runtime field in Kibana's Data View to turn it into a new field.
```painless
// Kibana runtime field (Painless): extract the number that follows
// "ctt:" in the customField string and emit it as an integer.
String ctt = dissect('ctt:%{ctt_val}').extract(doc["customField"].value)?.ctt_val;
if (ctt != null) emit(Integer.parseInt(ctt));
```
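In Kibana, this script goes into a new field created from the Data View's "Add field" panel with the field type set to long (menu labels may differ slightly between Kibana versions). As an illustrative example, a document whose customField contains ctt:42 yields a runtime field value of 42.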
Akamai's edge computing platform, EdgeWorkers, allows you to set user variables from JavaScript code. A custom field can carry debugging information for EdgeWorkers applications or values of interest to the business logic implemented in EdgeWorkers, which can then be aggregated in the log analysis infrastructure; a minimal sketch follows the link below. Note that as of June 2023, the custom field is limited to 1,000 bytes in length.
See also: Request Object setVariable()
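A minimal EdgeWorkers sketch under these assumptions: a property user variable named PMUSER_EW_DEBUG exists and the Custom Log Field references it as {{user.PMUSER_EW_DEBUG}} — both names are hypothetical placeholders, not from the original setup.

```javascript
// main.js — record a debug value in a user variable so that it
// surfaces in the DataStream 2 customField via the Custom Log Field.
export function onClientRequest(request) {
  // PMUSER_EW_DEBUG is a hypothetical property user variable.
  request.setVariable('PMUSER_EW_DEBUG', `path=${request.path}`);
}
```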
Debug EdgeWorkers
Although EdgeWorkers terminates the script when various limitations (CPU time, memory usage, execution time, etc.) are exceeded, you may encounter situations where it only works correctly under certain conditions that depend on the content of the request. Error statistics can be viewed from the Akamai Control Center, but the details of each request at the time of the error are not available. With DataStream 2, not only the request information when an error occurs, but also the detailed operation status of the EdgeWorkers runtime is available in the fields named ewExecutionInfo and ewUsageInfo, which can be useful for troubleshooting.
See also: DataStream 2 log details
Monitoring of Web Application Firewall activities
Since Akamai's Web Application Firewall (WAF) comes with an advanced log analysis feature called Web Security Analytics (WSA), most cyber attack analyses can be completed within WSA. On the other hand, DataStream 2 also includes a summary of the WAF's detection results in its log fields, so you can use fields that WSA does not expose for supplementary analysis, or take advantage of advanced features of your analysis tool, such as machine learning. The following screenshot shows DataStream 2 data revealing that the WAF has detected a directory traversal attack.
Visualize Common Media Client Data (CMCD)
Common Media Client Data (CMCD) is a standardized data format for sending various metrics collected by video players to servers, such as CDNs. The CTA WAVE project published the CMCD specification in September 2020.
Web Application Video Ecosystem - Common Media Client Data
Video players that support CMCD send various information to the server as HTTP request headers or query parameters. You can add CMCD to the DataStream 2 logs of Akamai Adaptive Media Delivery (AMD), so you can visualize video playback quality and related information with the Elasticsearch + Kibana stack built in this article. This lets you correlate CDN logs with quality metrics in addition to existing video QoS & QoE measurement tools. For more information on the benefits of using CMCD, see the following article; an illustrative request carrying CMCD is shown after the link.
Get More from Your Player Analytics and CDN Logs with CMCD
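For a sense of the data involved, here is an illustrative segment request carrying CMCD as a query parameter. The URL is hypothetical; the keys br, bl, ot, and sid are defined in CTA-5004 as encoded bitrate (kbps), buffer length (ms), object type, and session ID.

```bash
# URL-decoded, the CMCD value reads: br=3200,bl=12000,ot=v,sid="6e2fb550-..."
https://media.example.com/video/seg_0012.m4s?CMCD=br%3D3200%2Cbl%3D12000%2Cot%3Dv%2Csid%3D%226e2fb550-c457-11e9-bb97-0800200c9a66%22
```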
Most video players send CMCD as query parameters, but if you want to send CMCD data via request headers instead, you need to add CORS header settings in AMD. Please refer to the following documentation to change the CORS headers.
On April 19, 2023, I updated the StackScript referenced in this article. Elasticsearch + Kibana installations performed with the StackScript after this date create index templates, data views, and dashboards that support CMCD.
Considerations for production use
For the sake of simplicity, this article leaves out some points that should be taken into account for production use. At the very least, consider the following:
- Configure Datastream - Upload Failures alert
- Use a higher-performance instance type to accommodate the volume of logs
- Enable HTTPS on Elasticsearch API endpoints
- Make Elasticsearch nodes redundant
- Allocate additional storage for virtual machines using Linode Block Storage
- Design the lifecycle management of access logs
The instance type you need depends on the volume of logs coming from DataStream 2. Refer to Benchmarking and sizing your Elasticsearch cluster for logs and metrics as a good starting point for selecting a proper instance type.
This article disables HTTPS for Elasticsearch to skip the SSL certificate issuance procedure. For production use, obtain an SSL server certificate issued by a certificate authority and enable HTTPS for communication between DataStream 2 and Elasticsearch; DataStream 2 does not accept self-signed certificates. You can find related information by searching for keywords such as "Elasticsearch Let's Encrypt".
Since the StackScript rewrites the Elasticsearch configuration file to disable HTTPS, to re-enable it you need to set xpack.security.http.ssl.enabled to true in /etc/elasticsearch/elasticsearch.yml after deploying the SSL server certificate. Then change the destination endpoint of DataStream 2 from http:// to https://. A configuration sketch follows.
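A minimal sketch of the relevant elasticsearch.yml lines, assuming the certificate and key files have already been deployed at the paths shown (the paths are illustrative):

```yaml
# /etc/elasticsearch/elasticsearch.yml — re-enable HTTPS
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/fullchain.pem
xpack.security.http.ssl.key: /etc/elasticsearch/certs/privkey.pem
# Restart the service afterwards: systemctl restart elasticsearch
```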