Chaos faults for AWS | Harness Developer Hub

NLB AZ down

NLB AZ down takes down access to the availability zones (AZs) on a target network load balancer for a specific duration. This fault:

Restricts access to certain availability zones for a specific duration.
Tests the application's ability to handle the loss of availability zones and maintain uninterrupted traffic flow.

ECS Fargate Memory Hog

ECS Fargate Memory Hog generates high memory consumption on a specific container in an ECS Fargate task. This fault:

Simulates a scenario where a task container consumes excessive memory, causing memory pressure and potential out-of-memory errors.
Tests how the ECS Fargate cluster copes with the resulting slowness and how it allocates resources.

Resource Access Restrict

Resource Access Restrict restricts access to a specific AWS resource for a specific duration. This fault:

Tests the application's resiliency and error handling when access to a critical AWS resource is restricted.
Validates the application's ability to handle and recover from temporary resource unavailability.

ECS Container Volume Detach

ECS Container Volume Detach detaches a volume from a specific container running in an ECS task. This fault:

Simulates the detachment of a volume from a task container to test the application's resilience and data management capabilities.
Validates the application's ability to handle volume detachment scenarios and recover gracefully.

ECS Fargate CPU Hog

ECS Fargate CPU Hog generates high CPU load on a specific task running in an ECS service. This fault:

Simulates a scenario where a task consumes excessive CPU resources, impacting the performance of the main container in the task.
Tests how the ECS Fargate task copes with the resulting slowness and how it allocates resources.

ALB AZ down

ALB AZ down takes down the availability zones (AZs) on a target application load balancer for a specific duration. This fault:

Restricts access to certain availability zones for a specific duration.
Tests the sanity, availability, and recovery workflows of the application pod attached to the load balancer.

CLB AZ down

CLB AZ down takes down the availability zones (AZs) on a target classic load balancer (CLB) for a specific duration. This fault:

Restricts access to certain availability zones for a specific duration.
Tests the sanity, availability, and recovery workflows of the application pod attached to the load balancer.

EBS loss by ID

EBS loss by ID disrupts the state of an EBS volume by detaching it from the node (or EC2 instance) using the volume ID for a certain duration.

In the case of EBS persistent volumes, the volumes can self-attach and the re-attachment step can be skipped.
It tests the deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod.
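
The detach, wait, and re-attach cycle described above can be sketched as follows. This is a minimal illustration, not the fault's actual implementation: the helper name `detach_and_reattach` and its parameters are hypothetical, and `ec2` is assumed to be a boto3 EC2 client (or any object exposing `describe_volumes`, `detach_volume`, and `attach_volume`). A real run would also need to wait for the volume to reach the `available` state before re-attaching.

```python
import time


def detach_and_reattach(ec2, volume_id, duration_s, reattach=True, sleep=time.sleep):
    """Detach an EBS volume, wait for the chaos window, then optionally re-attach it.

    Hypothetical helper for illustration only. `ec2` is assumed to behave like
    a boto3 EC2 client.
    """
    # Remember where the volume is attached so it can be restored afterwards.
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    attachments = volume.get("Attachments", [])
    if not attachments:
        raise RuntimeError(f"{volume_id} is not attached to any instance")
    instance_id = attachments[0]["InstanceId"]
    device = attachments[0]["Device"]

    ec2.detach_volume(VolumeId=volume_id)
    sleep(duration_s)  # chaos window

    # Persistent volumes that self-attach can skip this step (reattach=False).
    if reattach:
        ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return instance_id, device
```

With real credentials, the client would typically come from `boto3.client("ec2")`; passing `reattach=False` mirrors the case where a persistent volume re-attaches on its own.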

EBS loss by tag

EBS loss by tag disrupts the state of an EBS volume by detaching it from the node (or EC2 instance) using the volume tag for a certain duration.

In the case of EBS persistent volumes, the volumes can self-attach and the re-attachment step can be skipped.
It tests the deployment sanity (replica availability and uninterrupted service) and recovery workflows of the application pod.

EC2 CPU hog

EC2 CPU hog disrupts the state of infrastructure resources. It induces stress on the AWS EC2 instance using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.

It causes CPU chaos on the EC2 instance for a specific duration.

EC2 DNS chaos

EC2 DNS chaos causes DNS errors on the specified EC2 instance for a specific duration.

It determines the performance of the application (or process) running on the EC2 instance(s).

EC2 HTTP latency

EC2 HTTP latency disrupts the state of infrastructure resources. This fault induces HTTP chaos on an AWS EC2 instance using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.

It injects HTTP response latency into the service whose port is specified using the TARGET_SERVICE_PORT environment variable by starting a proxy server and redirecting the traffic through it.
It introduces HTTP latency chaos on the EC2 instance using an SSM document for a certain chaos duration.

EC2 HTTP modify body

EC2 HTTP modify body injects HTTP chaos that affects the request (or response) by modifying the status code, body, or headers. It does this by starting a proxy server and redirecting the traffic through it.

It tests the application's resilience to an erroneous (or incorrect) HTTP response body.

EC2 HTTP modify header

EC2 HTTP modify header injects HTTP chaos that affects the request (or response) by modifying the status code, body, or headers. It does this by starting a proxy server and redirecting the traffic through it.

It modifies the headers of requests and responses of the service.
This can be used to test the resilience of the application to incorrect (or incomplete) headers.

EC2 HTTP reset peer

EC2 HTTP reset peer injects HTTP reset on the service whose port is specified using the TARGET_SERVICE_PORT environment variable.

It stops the outgoing HTTP requests by resetting the TCP connection for the requests.
It determines the application's resilience to a lossy (or flaky) HTTP connection.

EC2 HTTP status code

EC2 HTTP status code injects HTTP chaos that affects the request (or response) by modifying the status code, body, or headers. It does this by starting a proxy server and redirecting the traffic through it.

It tests the application's resilience to erroneous HTTP status codes in responses from the application server.

EC2 IO stress

EC2 IO stress disrupts the state of infrastructure resources.

The fault induces stress on the AWS EC2 instance using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.
It causes IO stress on the EC2 instance for a certain duration.

EC2 memory hog

EC2 memory hog disrupts the state of infrastructure resources.

The fault induces stress on the AWS EC2 instance using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.
It causes memory exhaustion on the EC2 instance for a specific duration.

EC2 network latency

EC2 network latency causes flaky access to the application (or services) by injecting network packet latency to EC2 instance(s).

It determines the performance of the application (or process) running on the EC2 instances.

EC2 network loss

EC2 network loss causes flaky access to the application (or services) by injecting network packet loss to EC2 instance(s).

It checks the performance of the application (or process) running on the EC2 instances.

EC2 process kill

EC2 process kill fault kills the target processes running on an EC2 instance.

It checks the performance of the application (or process) running on the EC2 instance(s).

EC2 stop by ID

EC2 stop by ID stops an EC2 instance using the provided instance ID or list of instance IDs.

It brings back the instance after a specific duration.
It checks the performance of the application (or process) running on the EC2 instance.
When the MANAGED_NODEGROUP environment variable is enabled, the fault will not try to start the instance after chaos. Instead, it checks for the addition of a new node instance to the cluster.
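
The stop, wait, and recover flow, including the MANAGED_NODEGROUP branch, can be sketched roughly as below. The function name, its parameters, and the returned strings are hypothetical and only illustrate the decision; `ec2` is assumed to be a boto3 EC2 client (or an object exposing `stop_instances` and `start_instances`).

```python
import time


def stop_instances_for(ec2, instance_ids, duration_s, managed_nodegroup=False, sleep=time.sleep):
    """Stop EC2 instances for a chaos window, then recover.

    Hypothetical sketch. The returned strings are illustrative labels,
    not values produced by the actual fault.
    """
    ids = list(instance_ids)
    ec2.stop_instances(InstanceIds=ids)
    sleep(duration_s)  # chaos window

    if managed_nodegroup:
        # The node group is expected to bring up a replacement node, so the
        # stopped instances are not started again here. A real check would
        # poll the cluster for the new node.
        return "await-replacement-node"

    ec2.start_instances(InstanceIds=ids)
    return "restarted"
```

The same flow would apply to EC2 stop by tag once the tag has been resolved to a list of instance IDs.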

EC2 stop by tag

EC2 stop by tag stops an EC2 instance using the provided tag.

It brings back the instance after a specific duration.
It checks the performance of the application (or process) running on the EC2 instance.
When the MANAGED_NODEGROUP environment variable is enabled, the fault will not try to start the instance after chaos. Instead, it checks for the addition of a new node instance to the cluster.

ECS agent stop

ECS agent stop disrupts the state of infrastructure resources.

The fault induces agent stop chaos on AWS ECS using the Amazon SSM Run command, carried out through an SSM document that is built into the fault for the given chaos scenario.
It stops the agent container on the ECS cluster specified by the CLUSTER_NAME environment variable, using an SSM document, for a specific duration.

ECS container CPU hog

ECS container CPU hog disrupts the state of infrastructure resources. It induces stress on the AWS ECS container using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.

It causes CPU chaos on the containers of the ECS task using the given CLUSTER_NAME environment variable for a specific duration.
To select the Task Under Chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service will be selected as chaos targets.
It tests the ECS task sanity (service availability) and recovery of the task containers subject to CPU stress.
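
Selecting every task of a service as a chaos target can be sketched as follows. This is an illustration under assumptions, not the fault's own selection code: `select_target_tasks` is a hypothetical helper, and `ecs` is assumed to be a boto3 ECS client, whose `list_tasks` call accepts `cluster`, `serviceName`, and `nextToken` and returns `taskArns`.

```python
def select_target_tasks(ecs, cluster_name, service_name):
    """Return the ARNs of all tasks that belong to a service in a cluster.

    Hypothetical helper. `ecs` is assumed to behave like a boto3 ECS client.
    """
    task_arns = []
    kwargs = {"cluster": cluster_name, "serviceName": service_name}
    while True:
        page = ecs.list_tasks(**kwargs)
        task_arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            return task_arns
        # Results are paginated, so keep requesting pages until none remain.
        kwargs["nextToken"] = token
```

The same selection rule is described for the other ECS container faults (IO stress, memory hog, network latency, and network loss).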

ECS container IO stress

ECS container IO stress disrupts the state of infrastructure resources. It induces stress on the AWS ECS container using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.

It causes I/O stress on the containers of the ECS task using the given CLUSTER_NAME environment variable for a specific duration.
To select the Task Under Chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service will be selected as chaos targets.
It tests the ECS task sanity (service availability) and recovery of the task containers subject to I/O stress.

ECS container memory hog

ECS container memory hog disrupts the state of infrastructure resources. It induces stress on the AWS ECS container using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.

It causes memory stress on the containers of the ECS task using the given CLUSTER_NAME environment variable for a specific duration.
To select the Task Under Chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service will be selected as chaos targets.
It tests the ECS task sanity (service availability) and recovery of the task containers subject to memory stress.

ECS container network latency

ECS container network latency disrupts the state of infrastructure resources. It introduces network delay on the AWS ECS container using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.

It causes network stress on the containers of the ECS task using the given CLUSTER_NAME environment variable for a specific duration.
To select the Task Under Chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service will be selected as chaos targets.
It tests the ECS task sanity (service availability) and recovery of the task containers subject to network stress.

ECS container network loss

ECS container network loss disrupts the state of infrastructure resources.

The fault induces chaos on the AWS ECS container using the Amazon SSM Run command, carried out through an SSM document that is built into the fault.
It causes network disruption on the containers of the ECS task in the cluster specified by the CLUSTER_NAME environment variable.
To select the Task Under Chaos (TUC), use the service name associated with the task. If you provide the service name along with the cluster name, all the tasks associated with the given service will be selected as chaos targets.
It tests the ECS task sanity (service availability) and recovery of the task containers subjected to network chaos.

ECS instance stop

ECS instance stop induces stress on an AWS ECS cluster. It derives the instance under chaos from the ECS cluster.

It causes the EC2 instance to stop and be removed from the ECS cluster for a specific duration.

ECS task stop

ECS task stop is an AWS fault that injects chaos to stop the ECS tasks based on the services or task replica ID and checks the task availability.

This fault results in the unavailability of the application running on the tasks.

ECS container HTTP latency

ECS container HTTP latency induces HTTP chaos on containers running in an Amazon ECS (Elastic Container Service) task. This fault introduces latency in the HTTP responses of containers of a specific service using a proxy server, simulating delays in network connectivity or slow responses from the dependent services.

ECS container HTTP modify body

ECS container HTTP modify body injects HTTP chaos that affects the request or response by modifying the status code, body, or headers. This is achieved by starting a proxy server and redirecting the traffic through the proxy server.

ECS container HTTP reset peer

ECS container HTTP reset peer injects HTTP reset on the service whose port is specified using the TARGET_SERVICE_PORT environment variable.

It stops the outgoing HTTP requests by resetting the TCP connection for the requests.

ECS container HTTP status code

ECS container HTTP status code injects HTTP chaos that affects the request (or response) by modifying the status code, body, or headers. It does this by starting a proxy server on the target ECS containers and redirecting the traffic through it.

ECS invalid container image

ECS invalid container image allows you to update the Docker image used by a container in an Amazon ECS (Elastic Container Service) task.

ECS network restrict

ECS network restrict allows you to restrict the network connectivity of containers in an Amazon ECS (Elastic Container Service) task by modifying the container security rules.

ECS update container resource limit

ECS update container resource limit allows you to modify the CPU and memory resources of containers in an Amazon ECS (Elastic Container Service) task.

ECS update container timeout

ECS update container timeout modifies the start and stop timeouts for ECS containers in Amazon ECS clusters. It allows you to specify how long the containers are allowed to take to start or stop before they are considered failed.

ECS update task role

ECS update task role allows you to modify the IAM task role associated with an Amazon ECS (Elastic Container Service) task.

Lambda delete event source mapping

Lambda delete event source mapping removes the event source mapping from an AWS Lambda function for a specific duration.

It checks the performance of the application (or service) without the event source mapping, which may cause missing entries in a database.

Lambda toggle event mapping state

Lambda toggle event mapping state toggles (or sets) the event source mapping state to disabled for a Lambda function for a specific duration.

It checks the performance of the running application (or service) when the event source mapping is not enabled, which may cause missing entries in a database.
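
Disabling a mapping for a fixed window and then restoring it can be sketched as below. The helper name and its parameters are hypothetical, `lambda_client` is assumed to be a boto3 Lambda client (whose `update_event_source_mapping` call accepts `UUID` and `Enabled`), and the sketch assumes the mapping was enabled before the experiment started.

```python
import time


def disable_mapping_for(lambda_client, mapping_uuid, duration_s, sleep=time.sleep):
    """Disable an event source mapping for a chaos window, then re-enable it.

    Hypothetical sketch. Assumes the mapping was enabled beforehand.
    """
    lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=False)
    try:
        sleep(duration_s)  # chaos window: events from the source are not processed
    finally:
        # Restore the mapping even if the wait is interrupted.
        lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=True)
```

The `finally` block is a design choice for the sketch: it keeps an interrupted experiment from leaving the mapping disabled.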

Lambda update function memory

Lambda update function memory causes the memory of a Lambda function to be updated to a specified value for a certain duration.

It checks the performance of the application (or service) running with a new memory limit.
It helps determine a safe overall memory limit value for the function.
The smaller the memory limit, the longer the Lambda function takes under load.
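
Applying a temporary memory limit and reverting it afterwards can be sketched as follows. The helper is hypothetical and only illustrates the set-then-restore pattern; `lambda_client` is assumed to be a boto3 Lambda client exposing `get_function_configuration` and `update_function_configuration`.

```python
import time


def with_temporary_memory(lambda_client, function_name, memory_mb, duration_s, sleep=time.sleep):
    """Run a Lambda function with a different memory size for a chaos window.

    Hypothetical sketch. Returns the original memory size in MB.
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    original_mb = config["MemorySize"]

    lambda_client.update_function_configuration(FunctionName=function_name, MemorySize=memory_mb)
    try:
        sleep(duration_s)  # chaos window: observe latency and errors under the new limit
    finally:
        # Put the original limit back even if the wait is interrupted.
        lambda_client.update_function_configuration(FunctionName=function_name, MemorySize=original_mb)
    return original_mb
```

The same pattern would fit a temporary timeout change by swapping `MemorySize` for `Timeout`.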

Lambda update function timeout

Lambda update function timeout causes the timeout of a Lambda function to be updated to a specified value for a certain duration.

It checks the performance of the application (or service) running with a new timeout.
It also helps determine a safe overall timeout value for the function.

Lambda update role permission

Lambda update role permission is an AWS fault that modifies the role policies associated with a Lambda function.

It verifies the handling mechanism for function failures.
It can also be used to update the role attached to a Lambda function.
It checks the performance of the running Lambda application when it does not have enough permissions.

Lambda delete function concurrency

Lambda delete function concurrency is an AWS fault that deletes the Lambda function's reserved concurrency, thereby ensuring that the function has adequate unreserved concurrency to run.

It examines the performance of the running Lambda application when the Lambda function lacks sufficient concurrency.
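
Removing the reserved concurrency and putting it back after the experiment can be sketched as below. The helper name and the restore step are assumptions made for illustration; `lambda_client` is assumed to be a boto3 Lambda client exposing `get_function_concurrency`, `delete_function_concurrency`, and `put_function_concurrency`.

```python
import time


def remove_reserved_concurrency_for(lambda_client, function_name, duration_s, sleep=time.sleep):
    """Delete a function's reserved concurrency for a chaos window, then restore it.

    Hypothetical sketch. Returns the original reserved value, or None when the
    function had no reserved concurrency to delete.
    """
    response = lambda_client.get_function_concurrency(FunctionName=function_name)
    reserved = response.get("ReservedConcurrentExecutions")
    if reserved is None:
        return None  # nothing reserved, so there is nothing to remove

    lambda_client.delete_function_concurrency(FunctionName=function_name)
    try:
        sleep(duration_s)  # chaos window: the function competes for unreserved concurrency
    finally:
        lambda_client.put_function_concurrency(
            FunctionName=function_name,
            ReservedConcurrentExecutions=reserved,
        )
    return reserved
```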

RDS instance delete

RDS instance delete removes an instance from an AWS RDS cluster.

This makes the cluster unavailable for a specific duration.
It determines how quickly an application can recover from an unexpected cluster deletion.

RDS instance reboot

RDS instance reboot induces instance reboot chaos on an AWS RDS cluster. It derives the instance under chaos from the RDS cluster.
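
Deriving instances from a cluster and rebooting them can be sketched as follows. This is only an illustration: the helper name is hypothetical, `rds` is assumed to be a boto3 RDS client exposing `describe_db_clusters` and `reboot_db_instance`, and rebooting every cluster member is an assumption, since the text above does not say how many instances the fault targets.

```python
def reboot_cluster_instances(rds, cluster_id):
    """Reboot each DB instance that is a member of an RDS cluster.

    Hypothetical sketch. Rebooting all members is an assumption made for
    illustration.
    """
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    # Derive the instances under chaos from the cluster membership.
    instance_ids = [member["DBInstanceIdentifier"] for member in cluster["DBClusterMembers"]]
    for instance_id in instance_ids:
        rds.reboot_db_instance(DBInstanceIdentifier=instance_id)
    return instance_ids
```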

Windows EC2 blackhole chaos

Windows EC2 blackhole chaos results in access loss to the given target hosts or IPs by injecting firewall rules.

Windows EC2 CPU hog

Windows EC2 CPU hog induces CPU stress on the target AWS Windows EC2 instances using the Amazon SSM Run command.

Windows EC2 memory hog

Windows EC2 memory hog induces memory stress on the target AWS Windows EC2 instance using the Amazon SSM Run command.
