The ABCs of serverless

Danny Chan

Posted on January 15, 2024

AWS services for serverless

API Gateway
API management service for hosting REST, HTTP, and WebSocket APIs. Can be used to trigger Lambda functions.

Bucket (S3)
Serverless object storage for files, images and other binary data. Can be used with Lambda.

CloudWatch
Monitoring service for serverless applications, functions and resources.

DynamoDB
Serverless NoSQL database to store and retrieve data without capacity planning.

EventBridge
Fully managed event bus to help build event-driven architectures.

Lambda
Serverless compute service that lets you run code without provisioning or managing servers.

Step Functions
Coordinates multiple Lambda functions in a visual workflow.

Queue (SQS)
Queue service to decouple components and asynchronously process workloads. Can be used with Lambda.

Pub/sub (SNS)
Pub/sub service to deliver messages/notifications to subscribers like Lambda.

Serverless Application Model (SAM)
Simplifies building and deploying serverless applications on AWS.

X-Ray
Service for tracing and debugging serverless microservices and applications.



Summary

  • Asynchronous messaging
  • Backward compatibility
  • Compensation logic
  • Data-driven architecture
  • Eventual consistency
  • Filtering events to appropriate functions using rules
  • Game backend architecture
  • Hub/publish pattern
  • Idempotent replay of events to allow duplicate delivery
  • JSON data structures
  • Keeping events small and avoiding duplication
  • Leveraging event sourcing for audit log of state changes
  • Minimum data needed for event consumers
  • No single point of failure
  • Orchestrator function
  • Parallel processing
  • Queues for publishing events
  • Real-time or batch processing of state changes
  • Stateless and ephemeral
  • Triggering Functions through Step Functions state machines
  • Unlimited scalability based on available cloud provider capacity
  • Validating pre-conditions before emitting new events downstream
  • Worker Lambda functions
  • X-Ray
  • You build it, you run it
  • Zero trust



Asynchronous messaging

  • Asynchronous processing allows services to proceed without waiting for immediate responses.
  • Asynchronous messaging ensures that failures in one service do not impede the entire application.



Backward compatibility

  • Backwards compatibility of event schemas is achieved through event versioning, which helps to differentiate breaking changes.
  • The EventBridge schema registry is used to define and version event schemas.
  • Producers specify the schema version when emitting events. Consumers target specific versions or the latest when subscribing to events.
  • Producers can emit new versions of events while still supporting deprecated versions during the migration period.
  • This approach enables independent upgrades and ensures fault tolerance.
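
The versioning idea above can be sketched as a consumer that supports two schema versions during a migration period. This is a minimal sketch: the event shape, the `version` field, and the handler names are hypothetical and are not part of the EventBridge schema registry API.

```python
from typing import Any, Callable, Dict


def handle_order_v1(detail: Dict[str, Any]) -> Dict[str, Any]:
    # v1 events carry a single "name" field.
    return {"customer": detail["name"], "total": detail["total"]}


def handle_order_v2(detail: Dict[str, Any]) -> Dict[str, Any]:
    # v2 splits the name into two fields (a breaking change, hence a new version).
    full_name = f'{detail["first_name"]} {detail["last_name"]}'
    return {"customer": full_name, "total": detail["total"]}


# Map each supported schema version to its handler.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "1": handle_order_v1,
    "2": handle_order_v2,
}


def consume(event: Dict[str, Any]) -> Dict[str, Any]:
    """Route an event to the handler for its declared schema version."""
    version = str(event.get("version", "1"))  # assumption: unversioned events are v1
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported schema version: {version}")
    return handler(event["detail"])
```

Once every producer has migrated, the deprecated handler can be removed from `HANDLERS` without touching the rest of the consumer.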



Compensation logic

  • Compensation logic is used to define steps that can reverse or compensate for partial work in a transaction in the event of a failure.
  • It is particularly useful for long-running workflows that involve multiple services, as it ensures consistency.
  • When a failure is detected, compensating functions are triggered to reverse any prior changes made.
  • Workflows can be modeled as sequences of events that trigger functions to execute discrete steps.
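
One way to picture compensation logic is a small in-process saga runner. This is a sketch under stated assumptions: the step functions (`reserve_stock`, `charge_card`, and their compensations) are hypothetical, and in a real system each step would typically be a separate function invoked by an orchestrator.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], state: dict) -> bool:
    """Run each action in order; on failure, undo completed steps in reverse."""
    completed: List[Callable[[dict], None]] = []
    for action, compensate in steps:
        try:
            action(state)
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo(state)
            return False
    return True


def reserve_stock(state: dict) -> None:
    state["stock_reserved"] = True


def release_stock(state: dict) -> None:
    state["stock_reserved"] = False


def charge_card(state: dict) -> None:
    raise RuntimeError("payment declined")  # simulated downstream failure


def refund_card(state: dict) -> None:
    state["charged"] = False
```

Here a failed `charge_card` triggers `release_stock`, but not `refund_card`, because only steps that completed are compensated.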



Data-driven architecture

  • In a data-driven architecture, events are designed based on business actions and domain concepts, helping to isolate domain and business logic.
  • Rich domain events are utilized to capture business-level changes. These events are self-describing and contain all the necessary data to notify downstream consumers.
  • Persistent data should be stored in a database like DynamoDB, and accessed through data access functions or layers instead of directly within the domain logic.



Eventual consistency

  • Storing data in databases such as DynamoDB is recommended.
  • Because serverless architectures are distributed, functions may not have a consistent view of the data at any given moment.
  • It is advisable to access data through separate data access functions rather than directly in business logic functions.
  • Routing reads and writes through a single access layer helps functions operate on the most up-to-date data available.



Filtering events to appropriate functions using rules

  • EventBridge serves as a central hub for connecting various event sources and targets, allowing events to be filtered and routed to the appropriate functions based on simple rules defined within the system.
  • An event bus operates on a push-based model: event sources publish events to the bus, and EventBridge delivers each event to the targets whose rules match it.
  • EventBridge is responsible for routing events from event sources like API Gateway, S3, and others to the corresponding Lambda functions that handle those events.
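
The rule-based routing described above can be illustrated with a simplified matcher. This is only a sketch of the matching idea: it handles exact-value lists and nested fields, whereas real EventBridge event patterns support more operators. The rule names and event fields are hypothetical.

```python
from typing import Any, Dict, List


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if every field in the pattern matches the event.

    Simplified semantics: a list means "any of these exact values";
    a dict means "match these nested fields".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(rules: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> List[str]:
    """Return the names of all targets whose rule pattern matches the event."""
    return [target for target, pattern in rules.items() if matches(pattern, event)]


# Hypothetical rules: target name -> event pattern.
RULES = {
    "send_receipt": {"source": ["shop.orders"], "detail-type": ["OrderPaid"]},
    "flag_large_order": {"source": ["shop.orders"], "detail": {"tier": ["gold", "platinum"]}},
}
```

A single event can match several rules, which is how one event reaches multiple functions without the producer knowing about any of them.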



Game backend architecture

  • Game backend architecture is serverless-based, enabling scalability of throughput according to workload.
  • It supports real-time applications by pushing updates to clients, for example through AppSync subscriptions.
  • The databases used in this architecture can auto-scale, ensuring efficient management of resources. DynamoDB is a prime example of such a database.
  • This architecture is designed to handle real-time systems, where various services are responsive to events as they happen, whether they originate from other services within the system or external systems.



Hub/publish pattern

  • Listeners and handlers are implemented, including retry logic within functions, to effectively handle transient errors that may arise from downstream services.
  • The architecture allows for independent auto-scaling of services, ensuring scalability based on their specific needs.
  • Functions within the system subscribe to relevant events and execute discrete tasks, rather than directly modifying the state.
  • The addition of new event consumers does not have an impact on existing services, making it easy to extend the functionality of the system.
  • Any outages or failures in one service do not block the entire application, ensuring resilience and continued operation.



Idempotent replay of events to allow duplicate or out-of-order delivery

  • The architecture incorporates idempotent replay of events, enabling handling of duplicate or out-of-order delivery. This allows for the reconstruction of past application states by replaying the event sequence from the beginning.
  • Retry logic is automatically applied to functions that encounter failures due to transient issues, such as spikes in workload.
  • Services like SQS utilize dead-letter queues (DLQs) to capture messages from functions that permanently fail after exhausting all retries.
  • Instead of directly modifying existing records, the architecture follows the approach of creating new record versions, ensuring data integrity.
  • Event sourcing is implemented to capture all changes as a sequence of events, providing a comprehensive historical record of the system's state.
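
A minimal sketch of an idempotent consumer, assuming every event carries a unique `id`. The in-memory set is a stand-in for a durable store; in a deployed function the "already processed" check would need to live somewhere persistent, such as a database table.

```python
from typing import Dict, Set


class IdempotentCounter:
    """Applies each event at most once, keyed by event ID."""

    def __init__(self) -> None:
        self.processed_ids: Set[str] = set()  # stand-in for a durable store
        self.balance = 0

    def handle(self, event: Dict) -> bool:
        """Apply the event; return False if it was a duplicate and skipped."""
        event_id = event["id"]
        if event_id in self.processed_ids:
            return False
        self.balance += event["amount"]
        self.processed_ids.add(event_id)
        return True
```

Because duplicates are skipped and addition is order-independent, replaying the same events again, or in a different order, leaves `balance` unchanged.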



JSON data structures

  • These data structures are designed as immutable records, ensuring that data cannot be altered by concurrent processes.
  • The use of immutable records allows for making backwards-compatible changes to the data structure over time, ensuring compatibility with existing systems or processes.



Keeping events small and avoiding duplication

  • The architecture emphasizes keeping events small and avoiding duplication.
  • Relationships and references between entities are modeled using identifiers instead of embedding related data.
  • To maintain small, focused, and immutable events, only the necessary data is included in each event.
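
As an illustration, a small immutable event might look like the following. The `OrderShipped` event and its fields are hypothetical; the point is that related entities are referenced by ID rather than embedded.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderShipped:
    """References related entities by ID instead of embedding them."""

    event_id: str
    order_id: str
    customer_id: str
    shipped_at: str  # ISO 8601 timestamp


def to_json(event: OrderShipped) -> str:
    """Serialize the event to a compact JSON payload with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)
```

A consumer that needs the customer's address looks it up by `customer_id`, so the event stays small and never carries a stale copy of customer data.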



Leveraging event sourcing for audit log of state changes

  • Event sourcing is utilized to create an audit log of state changes.
  • AWS Lambda is employed to process events received from EventBridge and store them as an audit log in a DynamoDB table.
  • The audit log table should include essential information such as the event ID, timestamp, state before the change, and state after the change.
  • By reading the audit log table and sequentially applying the state changes, events can be replayed and the system can be brought to a specific state.
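
Replaying the audit log can be sketched as follows. The entry fields mirror the ones listed above (event ID, timestamp, before, after); the sample data and the `up_to` parameter are assumptions for illustration, and the list stands in for rows read from the audit table.

```python
from typing import Dict, List, Optional


def replay(audit_log: List[Dict], up_to: Optional[str] = None) -> Dict:
    """Rebuild state by applying each entry's "after" state in timestamp order.

    If up_to is given, entries with a later timestamp are not applied.
    Timestamps are assumed to share one ISO 8601 format, so they sort as strings.
    """
    state: Dict = {}
    for entry in sorted(audit_log, key=lambda e: e["timestamp"]):
        if up_to is not None and entry["timestamp"] > up_to:
            break
        state.update(entry["after"])
    return state


# Sample entries, deliberately stored out of order.
AUDIT_LOG = [
    {"event_id": "e2", "timestamp": "2024-01-15T10:05:00Z",
     "before": {"status": "new"}, "after": {"status": "paid"}},
    {"event_id": "e1", "timestamp": "2024-01-15T10:00:00Z",
     "before": {}, "after": {"status": "new", "total": 30}},
    {"event_id": "e3", "timestamp": "2024-01-15T10:09:00Z",
     "before": {"status": "paid"}, "after": {"status": "shipped"}},
]
```

Calling `replay(AUDIT_LOG, up_to="2024-01-15T10:05:00Z")` reconstructs the state as it was right after the payment, before the order shipped.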



Minimum data needed for event consumers

  • The architecture focuses on providing event consumers with the minimum data necessary.
  • Immutable records are used to maintain consistency for downstream processes.
  • Events are designed to capture only the essential data required to identify the entity and describe the action.



No single point of failure

  • Circuit breakers are used in distributed systems to mitigate failures and latency issues.
  • Circuit breakers are utilized to monitor failure thresholds and trigger an open state when there is an excessive number of failing requests. This prevents further calls to dependent functions or services, halting the propagation of failures and avoiding a cascade effect.
  • Timeouts are set on functions to prevent failures caused by infinite loops or long-running processes.
  • Automatic retries and dead-letter queues (DLQs) are implemented to enhance the reliability, scalability, and resilience of serverless applications operating on distributed systems.
  • Monitor circuit breaker health and trigger alerts if error rates increase, so remedial action can be taken before the dependent service fails completely.
  • The architecture becomes fault tolerant, with no single component whose failure can cause the system to stop working.
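
The circuit breaker behavior described above can be sketched in a few lines. This is a simplified, in-memory version with hypothetical names and thresholds; in a serverless function the breaker state would usually need to be kept in an external store, since in-memory state does not reliably survive between invocations.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable, *args, **kwargs):
        """Invoke func unless the circuit is open; track consecutive failures."""
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit
        return result
```

While the circuit is open, callers fail fast with `CircuitOpenError` instead of piling more requests onto a dependency that is already struggling.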



Orchestrator function

  • Step Functions acts as a central orchestrator.
  • Sagas are employed to coordinate long-running transactions that involve multiple services.
  • By implementing error handling centrally in the orchestrator function, consistency is maintained instead of having it distributed across multiple functions.
  • Step Functions handle retries and errors across a series of defined steps, which helps keep multi-step transactions consistent even though the steps are not atomic.



Parallel processing

  • Parallel processing is achieved through the utilization of SNS topics.
  • SNS topics allow for the fan-out of events to multiple Lambda functions that are subscribed to the topic.
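
The fan-out idea can be simulated locally. The `Topic` class below is an in-memory stand-in written for illustration, not the SNS API, and the two subscriber functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class Topic:
    """In-memory stand-in for a pub/sub topic that fans out to subscribers."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[Dict], str]] = []

    def subscribe(self, handler: Callable[[Dict], str]) -> None:
        self.subscribers.append(handler)

    def publish(self, message: Dict) -> List[str]:
        """Deliver the message to every subscriber concurrently."""
        workers = max(1, len(self.subscribers))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(handler, message) for handler in self.subscribers]
            return [future.result() for future in futures]


def resize_image(message: Dict) -> str:
    return f"resized {message['key']}"


def index_metadata(message: Dict) -> str:
    return f"indexed {message['key']}"


topic = Topic()
topic.subscribe(resize_image)
topic.subscribe(index_metadata)
```

Each subscriber receives its own copy of the message and runs independently, so a slow subscriber does not delay the others from starting.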



Queues for publishing events

  • The architecture employs queues for publishing events, which promotes loose coupling between services, allowing for easy replacement or rewriting of services without impacting others.
  • Application events are stored in queues, and functions pull and process messages from these queues.
  • When a subscriber is temporarily unavailable, messages are stored in queues, specifically leveraging SQS as buffers between functions.
  • The use of queues ensures that services can replace underlying implementations without affecting consumers.
  • In the event of a consumer failure, messages are redelivered from the queue, guaranteeing reliability.
  • Consumers can poll the queue for messages without requiring knowledge of the producers.
  • SQS queues are utilized to decouple event handling from downstream processing.
  • Either Kinesis or SQS can be used to reliably decouple producers and consumers, eliminating dependencies between them.
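
A sketch of a Lambda handler consuming an SQS batch. It relies on the partial batch response format (`batchItemFailures`), which only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping. The `process` function and the order payload are hypothetical.

```python
import json
from typing import Any, Dict, List


def process(order: Dict[str, Any]) -> None:
    """Hypothetical business logic; rejects orders without an ID."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, List[Dict[str, str]]]:
    """Process an SQS batch and report only the failed messages for redelivery."""
    failures: List[Dict[str, str]] = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message returns to the queue; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages that keep failing are redelivered until the queue's maximum receive count is reached, at which point a configured dead-letter queue can capture them.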



Real-time or batch processing of state changes

  • Kinesis and AWS Lambda are utilized to process streaming data in real-time, ensuring efficient and timely processing of the data.
  • The architecture enables real-time tracking of state changes, supporting applications that utilize push notifications or websockets to handle high volumes of concurrent connections.
  • Event streams are employed to capture state changes as they happen, enabling both real-time and batch processing of these changes.
  • The architecture facilitates the development of real-time applications where services can react immediately to events as they occur.
  • It also supports the creation of real-time systems, allowing services to asynchronously react to events from other services or external systems as they happen.



Stateless and ephemeral

  • The architecture promotes a stateless and ephemeral approach, shifting data management responsibilities to external services such as databases.
  • Event payloads are designed to include all relevant context data, eliminating the need to rely on external state. This enables stateless handling during event replay.
  • Compensation logic is implemented to handle long-running transactions and revert partial work if necessary.
  • The architecture emphasizes the use of pure functions without side effects, ensuring predictable and reliable behavior.



Triggering Functions through Step Functions state machines

  • Functions are triggered as tasks within state machines using Step Functions.
  • Step Functions enable the creation of stateful or long-running workflows that progress through different states, invoking Lambda functions accordingly.
  • Step Functions are utilized to orchestrate multiple Lambdas and AWS Fargate, providing a comprehensive workflow management solution.
  • Step Functions serve as a coordination mechanism for workflows across Lambdas, allowing for the seamless execution of multi-step processes.
  • In case of a failure in any step, Step Functions trigger rollback functions to revert prior state changes, ensuring data consistency.
  • Step Functions handle retries and errors across a series of defined steps, which helps keep multi-step transactions consistent even though the steps are not atomic.
  • Step Functions offer the capability to define state machines, allowing for the modeling of sagas and processes in serverless applications.
  • Branching and error handling between steps are managed by Step Functions to ensure transactional integrity.
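
A state machine with a retry and a rollback step might be defined roughly as follows. The definition uses Amazon States Language fields (`Retry`, `Catch`, `Next`, `End`), built here as a Python dictionary; the state names, the account ID, and the function ARNs are placeholders, and `undefined_targets` is a hypothetical helper for a quick local sanity check.

```python
import json


def lambda_arn(name: str) -> str:
    # Placeholder region and account ID; replace with real values.
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"


STATE_MACHINE = {
    "Comment": "Order workflow with a retry and a compensating step",
    "StartAt": "ChargePayment",
    "States": {
        "ChargePayment": {
            "Type": "Task",
            "Resource": lambda_arn("charge-payment"),
            # Retry transient task failures with exponential backoff.
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2,
                       "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "ShipOrder",
        },
        "ShipOrder": {
            "Type": "Task",
            "Resource": lambda_arn("ship-order"),
            # On any error, route to the rollback step.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RefundPayment"}],
            "End": True,
        },
        "RefundPayment": {
            "Type": "Task",
            "Resource": lambda_arn("refund-payment"),
            "Next": "OrderFailed",
        },
        "OrderFailed": {"Type": "Fail", "Error": "OrderFailed",
                        "Cause": "Shipping failed; payment refunded"},
    },
}


def undefined_targets(definition: dict) -> set:
    """Return state names referenced by StartAt, Next, or Catch but never defined."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            referenced.add(state["Next"])
        for catcher in state.get("Catch", []):
            referenced.add(catcher["Next"])
    return referenced - set(states)


DEFINITION_JSON = json.dumps(STATE_MACHINE, indent=2)
```

The rollback here is explicit: `RefundPayment` runs only because the `Catch` on `ShipOrder` points to it, which keeps the compensation path visible in the workflow definition.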



Unlimited scalability

  • The architecture provides effectively unlimited scalability, bounded in practice by the cloud provider's available capacity and account-level quotas.
  • It handles unpredictable traffic spikes more effectively than fixed-capacity servers.



Validating pre-conditions before emitting new events downstream

  • Validate pre-conditions before emitting new events downstream, so that downstream systems receive accurate and valid events.
  • Centralize the documentation of event types and schemas for easy discoverability.
  • Automated validation processes aid in detecting errors early on.
  • Compensation logic is necessary for long-running transactions to reverse partial work if validation fails during event replay.
  • If validation checks fail in later steps, reverse the initial database updates performed.
  • Define compensating functions that undo the effects of prior steps if validation fails.
  • Incorporate validation checks or conditional logic before committing state changes.
  • Perform validation checks after state changes to identify errors. If errors are found, employ rollback functions to delete the affected records.
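
A minimal sketch of validating before emitting. The validation rules, the `OrderPlaced` event type, and the `emit` callback are assumptions for illustration; in practice `emit` would wrap whatever client publishes the event.

```python
from typing import Any, Callable, Dict, List


def validate_order(order: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the order is valid."""
    errors = []
    if not order.get("order_id"):
        errors.append("order_id is required")
    total = order.get("total")
    if not isinstance(total, (int, float)) or total <= 0:
        errors.append("total must be a positive number")
    return errors


def emit_if_valid(order: Dict[str, Any], emit: Callable[[Dict[str, Any]], None]) -> List[str]:
    """Emit an OrderPlaced event only when all pre-conditions hold."""
    errors = validate_order(order)
    if not errors:
        emit({"detail-type": "OrderPlaced", "detail": order})
    return errors
```

Returning the error list instead of raising lets the caller decide whether to reject the request, log it, or trigger compensation.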



Worker Lambda functions

  • Worker Lambda functions encapsulate the core application or business logic.
  • Lambda functions are used to create modular and isolated units of code, with each function dedicated to handling a single task or type of event.
  • This approach ensures separation of concerns, keeping different functionalities or responsibilities distinct within the architecture.



X-Ray

  • X-Ray is used to trace and monitor event handling and application code separately, which helps pinpoint where errors occur.
  • It allows for the independent monitoring and tracing of event handling processes and application logic.
  • X-Ray provides a centralized view of requests as they flow through the architecture, giving a comprehensive picture of the system's behavior and performance.



You build it, you run it

  • "You build it, you run it" is a concept that emphasizes taking responsibility for operating the software in production that your team builds.
  • It entails that the developers who write the code are also accountable for deploying their code changes to production and monitoring the application once it goes live.
  • This approach encourages developers to prioritize operational excellence right from the beginning of the development process.
  • Furthermore, it fosters collaboration between development and operations teams, promoting a shared sense of responsibility and teamwork.



Zero trust

  • Zero trust is based on the principle that everything can go wrong.
  • It acknowledges that anyone can make mistakes and that every infrastructure has the potential to fail.
  • It recognizes that events can be unpredictable and that even well-designed logic can contain bugs.
  • Zero trust also considers the possibility that marketplace providers may go out of business.
  • Additionally, it acknowledges that security issues can occur and should be taken seriously.