Fault Injection Testing with API Gateway

💁 This blog post describes how an API Gateway like Apache APISIX is useful for testing the robustness and resilience of microservice APIs.

Explore distributed system stability 💪

Distributed systems such as microservices have increased the complexity of the systems we work with. It is difficult to have full confidence in this architecture when there are many components and “a lot of moving parts” that could potentially fail. It is critical to handle failures in service-to-service calls gracefully. We also want to be sure that the resilience mechanisms we have in place actually work when a failure happens. These include error-handling code, circuit breakers, health checks, retries, fallbacks, and redundant instances. We can verify this with a testing method called Fault Injection 💉.

Throughout this post, we get to know the types of failures we can inject with the Fault Injection Plugin 🔌. We then simulate them on our existing Product backend service (built with ASP.NET Core Web API).

Here is a quick overview of what we cover 👇

  • ✅ Software Fault Injection.
  • ✅ Fault injection testing (FIT) with API Gateway.
  • ✅ Apache APISIX Fault Injection Plugin.
  • ✅ Types of failures to inject.
  • ✅ Experimenting with the Fault Injection Plugin.

An application is correct if it behaves as specified. It is robust if it can withstand high load before it goes down. It is resilient if it can return to normal after a disruption.

Software Fault Injection 💻💉

Among the many ways to perform fault injection, Software Fault Injection is becoming especially popular among companies that manage large, complex, distributed systems. In this software testing technique, a special piece of code associated with the system under test simulates faults. It is usually performed before deployment to identify potential flaws in the running software 😱. Fault injection can also help identify the nature and cause of production failures.
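
Here is a minimal Python sketch of the idea. A hypothetical inject_fault decorator wraps a function so that a configurable fraction of calls raise an error instead of running. The names inject_fault, InjectedFault, and get_products are illustrative and not from any library. A seeded random generator keeps runs repeatable.

```python
import functools
import random


class InjectedFault(Exception):
    """Raised in place of the real call when a fault is injected."""


def inject_fault(probability, rng=None):
    """Make roughly `probability` (0.0-1.0) of calls to the wrapped function fail."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    rng = rng if rng is not None else random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # random() returns a float in [0.0, 1.0), so a probability of 1.0
            # always fails and 0.0 never does.
            if rng.random() < probability:
                raise InjectedFault(f"injected fault in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical service-layer call; the fixed seed makes runs repeatable.
@inject_fault(probability=0.3, rng=random.Random(42))
def get_products():
    return [{"name": "Macbook Pro", "price": 1500.9}]


if __name__ == "__main__":
    failures = 0
    for _ in range(1000):
        try:
            get_products()
        except InjectedFault:
            failures += 1  # the caller's error handling would run here
    print(f"{failures} of 1000 calls failed")
```

The fault lives next to the code under test, which is what distinguishes software fault injection from injecting faults at the network or gateway level.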

Fault Injection Testing with API Gateway

Fault injection at the API Gateway level can be used to test the resiliency of applications and microservice APIs against various forms of failure. This builds confidence in how they will behave in production. The technique can inject delays and abort requests with user-specified error codes. That makes it possible to stage different failure scenarios, such as service failures, service overloads, high network latency, and network partitions. Fault injection can be limited to a specific set of requests based on the (destination) upstream cluster of a request and/or a set of predefined request headers.
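
Scoping can be pictured as a predicate the gateway evaluates before it injects anything. The Python below is illustrative only. FaultScope, in_scope, and the X-Fault-Test header are hypothetical names, not part of APISIX, whose real matching rules are configured differently. The sketch assumes that every listed header must match and that an unset upstream matches any upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class FaultScope:
    """Which requests are eligible for injected faults (hypothetical rule)."""
    upstream: Optional[str] = None  # e.g. "productapi:80"; None matches any upstream
    headers: Dict[str, str] = field(default_factory=dict)  # every pair must match


def in_scope(scope: FaultScope, upstream: str, request_headers: Mapping[str, str]) -> bool:
    """Return True if this request should be considered for fault injection."""
    if scope.upstream is not None and scope.upstream != upstream:
        return False
    # HTTP header names are case-insensitive, so compare them in lowercase.
    received = {name.lower(): value for name, value in request_headers.items()}
    return all(received.get(name.lower()) == value
               for name, value in scope.headers.items())


# Only tagged test traffic to the Product service gets faults.
scope = FaultScope(upstream="productapi:80", headers={"X-Fault-Test": "true"})
print(in_scope(scope, "productapi:80", {"x-fault-test": "true"}))  # True
print(in_scope(scope, "productapi:80", {}))                         # False
```

Tying faults to a test-only header means real users never see them, even when the experiment runs against a shared environment.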

For a streaming giant like Netflix, the migration to a complex cloud-based microservices architecture would not have been possible without a revolutionary testing method known as fault injection 👊. Chaos engineering, a well-known strategy, uses fault injection to make systems more reliable. Netflix teams built their own chaos engineering tool, called Chaos Monkey.

Apache APISIX Fault Injection Plugin ๐Ÿ”Œ

The Apache APISIX Fault Injection Plugin offers a mechanism to inject errors into our APIs so that we can verify that our resilience measures are effective.

The plugin works in two different modes, both configured through the fault-injection plugin attributes ⤵️:

  1. Delays: Delays are timing failures. They simulate increased network latency or an overloaded upstream service.

  2. Aborts: Aborts are crash failures. They mimic failures in upstream services. Aborts usually manifest in the form of HTTP error codes or TCP connection failures.
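
The difference between the two modes can be sketched in a few lines of Python. This is not APISIX's implementation. with_faults, Delay, and Abort are hypothetical names; only the field names duration, http_status, and body mirror the plugin attributes used in the route configurations in this post. In this sketch, a delay still lets the request through, while an abort returns an error without ever calling the upstream.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status, body)


@dataclass
class Delay:
    duration: float  # seconds to wait before forwarding the request


@dataclass
class Abort:
    http_status: int  # status returned instead of calling the upstream
    body: str = ""


def with_faults(upstream: Callable[[], Response],
                delay: Optional[Delay] = None,
                abort: Optional[Abort] = None) -> Callable[[], Response]:
    """Wrap an upstream call with a timing failure and/or a crash failure."""
    def handle() -> Response:
        if delay is not None:
            time.sleep(delay.duration)  # slow, but the request still goes through
        if abort is not None:
            return abort.http_status, abort.body  # the upstream is never called
        return upstream()
    return handle


def product_api() -> Response:
    # Stand-in for the Product service used in this post.
    return 200, '[{"name":"Macbook Pro","price":1500.9}]'


print(with_faults(product_api, abort=Abort(503, "unavailable"))())  # (503, 'unavailable')
```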

For detailed instructions on how to configure delays and aborts, see the Fault Injection plugin documentation. You can also try API7 Cloud ☁️, a centralized platform, to use more advanced API Gateway features. API7 Cloud provides a fully managed chaos engineering service with a dashboard that makes it easy to configure the Fault Injection policy 👍🏻.

Experiment with the Fault Injection Plugin 🔬

This part shows you how to inject faults to test the resiliency of your application.

Before you begin 🙅

☝️ Familiarize yourself with the fault injection concept.
☝️ If you haven't already, read the previous blog post, Manage .NET Microservices APIs with Apache APISIX API Gateway. Complete its steps to set up APISIX, etcd, and the ASP.NET Web API before continuing with the demo. Alternatively, see the complete source code on GitHub and the instructions on how to build a multi-container APISIX setup via Docker CLI.

Understand the demo scenario

I assume that you have the demo project apisix-dotnet-docker up and running. In the ASP.NET Core project, the ProductsController.cs file contains a simple API that gets the list of all products from the service layer.

Let’s suppose that we have an online shopping sample application made up of many microservices, such as Catalog, Product, and Order. When we retrieve data about the products that belong to a specific catalog, the Catalog and Product services interact with each other. Something in that service-to-service call can go wrong for any number of reasons.

FIT with Apache APISIX

To test the shopping application’s microservices for resiliency, we are going to simulate a misbehaving Product service:

  • By adding a delay to the HTTP request.
  • By aborting the HTTP requests and returning a custom status code.

Injecting an HTTP delay fault

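In the first example, we introduce a 5-second delay for every request to the Product service. This tests whether the Catalog service sets an appropriate timeout for its calls to the Product service. Because the gateway accepts the connection and only then holds the request, this delay exercises the caller's response (read) timeout rather than its connection timeout.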
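
On the caller side, you want to see a timeout that is shorter than the injected delay fire. Below is a minimal Python sketch of a hypothetical Catalog-side client that uses only the standard library. fetch_products and ProductServiceTimeout are illustrative names, and the 2-second default is an assumption, not a recommendation. The timeout passed to urlopen applies to each blocking socket operation, so it covers both connecting and waiting for the response.

```python
import json
import socket
import urllib.error
import urllib.request

GATEWAY_URL = "http://127.0.0.1:9080/api/products"  # route configured in this post


class ProductServiceTimeout(Exception):
    """The Product service did not answer within the Catalog's time budget."""


def fetch_products(url=GATEWAY_URL, timeout=2.0):
    """Call the Product API, giving up after `timeout` seconds per socket operation."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except socket.timeout as err:
        # Raised directly when the response (headers or body) is too slow.
        raise ProductServiceTimeout(url) from err
    except urllib.error.URLError as err:
        # A timeout while connecting arrives wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            raise ProductServiceTimeout(url) from err
        raise
```

With the delay route active, fetch_products(timeout=2.0) should raise ProductServiceTimeout after about 2 seconds instead of waiting the full 5. That gives the Catalog service a chance to fall back or retry.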

You can also control what share of requests is delayed with the percentage attribute. For example, a value of 10 means that 10% of all requests are delayed. In this demo, we set it to 100 so that every request is delayed, which makes the delay easy to observe.
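
One simple way to turn a percentage into a per-request decision is to draw a number from 1 to 100 and inject the fault when it is at or below the configured value. The Python sketch below uses a hypothetical sample_hit helper to show that a setting of 10 selects roughly one request in ten. The exact sampling inside APISIX may differ in detail.

```python
import random


def sample_hit(percentage: int, rng: random.Random) -> bool:
    """True for about `percentage` out of every 100 calls (0 = never, 100 = always)."""
    # randint(1, 100) is uniform over 100 values, and exactly `percentage`
    # of them are <= percentage.
    return rng.randint(1, 100) <= percentage


rng = random.Random(7)  # fixed seed so the run is repeatable
requests = 10_000
delayed = sum(sample_hit(10, rng) for _ in range(requests))
print(f"{delayed} of {requests} requests delayed ({delayed / requests:.1%})")
```

Because each request is sampled independently, a low percentage will not produce an exact 1-in-10 pattern over a handful of requests. It only averages out over many.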

The following route configuration example creates a new upstream for our backend service (productapi), which runs on port 80. It also registers a route with the fault-injection plugin enabled. Notice the delay settings in the plugin configuration: duration is the delay in seconds, and percentage is the share of requests to delay:

curl http://127.0.0.1:9080/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
  "name": "Route for Fault Injection with the delay",
  "methods": [
    "GET"
  ],
  "uri": "/api/products",
  "plugins": {
    "fault-injection": {
      "delay": {
        "duration": 5,
        "percentage": 100
      }
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "productapi:80": 1
    }
  }
}'
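
If you prefer to script this step, for example in a test suite that switches faults on and off, the same request can be sent with Python's standard library. This is a sketch. fault_injection_route and put_route are hypothetical helpers. The Admin API address and key are the demo values from the curl command above and should be changed outside a local setup. For simplicity, the builder applies one percentage to both fault types.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

# Demo values from this post's curl command; change them outside a local setup.
ADMIN_URL = "http://127.0.0.1:9080/apisix/admin/routes/1"
ADMIN_KEY = "edd1c9f034335f136f87ad84b625c8f1"


def fault_injection_route(delay_seconds: Optional[float] = None,
                          abort_status: Optional[int] = None,
                          abort_body: str = "",
                          percentage: int = 100) -> Dict[str, Any]:
    """Build the /api/products route body with a delay and/or an abort fault."""
    fault: Dict[str, Any] = {}
    if delay_seconds is not None:
        fault["delay"] = {"duration": delay_seconds, "percentage": percentage}
    if abort_status is not None:
        fault["abort"] = {"http_status": abort_status, "body": abort_body,
                          "percentage": percentage}
    if not fault:
        raise ValueError("configure a delay, an abort, or both")
    return {
        "name": "Route for Fault Injection",
        "methods": ["GET"],
        "uri": "/api/products",
        "plugins": {"fault-injection": fault},
        "upstream": {"type": "roundrobin", "nodes": {"productapi:80": 1}},
    }


def put_route(route: Dict[str, Any], url: str = ADMIN_URL,
              key: str = ADMIN_KEY) -> Tuple[int, str]:
    """PUT the route to the Admin API and return (status code, response body)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(route).encode("utf-8"),
        method="PUT",
        headers={"X-API-KEY": key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status, response.read().decode("utf-8")
```

With APISIX running, put_route(fault_injection_route(delay_seconds=5)) recreates the delay route. put_route(fault_injection_route(abort_status=503, abort_body="The product service is currently unavailable.")) switches the same route to an abort fault instead.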

Next, we confirm that the rule works by sending a request to the route and measuring the response time with the time command:

time curl http://127.0.0.1:9080/api/products -i

After you run the command, you will see that a delay of about 5 seconds was introduced:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive

[{"name":"Macbook Pro","price":1500.9},{"name":"SurfaceBook 3","price":1599.9}]
real    0m5.004s
user    0m0.004s
sys     0m0.000s

The fault injection result is as expected. 👍
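
To turn this manual time check into a repeatable test, you can time the request in code and fail when the latency falls outside an expected window. The Python helpers below, measure_seconds and assert_delayed, are hypothetical names. The 5-to-6-second window in the usage note is an assumption that leaves headroom for network overhead.

```python
import time
from typing import Any, Callable, Tuple


def measure_seconds(call: Callable[[], Any]) -> Tuple[Any, float]:
    """Run call() and return its result with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def assert_delayed(call: Callable[[], Any], min_seconds: float, max_seconds: float) -> Any:
    """Raise AssertionError unless call() takes between min_seconds and max_seconds."""
    result, elapsed = measure_seconds(call)
    if not min_seconds <= elapsed <= max_seconds:
        raise AssertionError(
            f"expected {min_seconds}-{max_seconds}s, took {elapsed:.3f}s")
    return result
```

Against the delayed route, the call could be lambda: urllib.request.urlopen("http://127.0.0.1:9080/api/products", timeout=10).read(), with min_seconds=5 and max_seconds=6. The helper raises AssertionError explicitly rather than using an assert statement, so the check still runs under python -O.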

Injecting an HTTP abort fault

In the following example, we introduce an HTTP abort to the Product microservice. This checks how our imaginary Catalog service reacts to failures in the service it depends on. Let’s say that when the Product service fails, we expect an HTTP error with the message “The product service is currently unavailable.”

Let’s see it in action. We enable abort injection with the following route settings:

curl http://127.0.0.1:9080/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
  "name": "Route for Fault Injection with the abort",
  "methods": [
    "GET"
  ],
  "uri": "/api/products",
  "plugins": {
    "fault-injection": {
      "abort": {
        "http_status": 503,
        "body": "The product service is currently unavailable.",
        "percentage": 100
      }
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "productapi:80": 1
    }
  }
}'

If you now send a request to the APISIX route with curl, it immediately responds with an HTTP 503 error. This makes it easy to test how the Catalog service reacts to this kind of server error from the services it calls.

curl  http://127.0.0.1:9080/api/products -i

HTTP/1.1 503 Service Temporarily Unavailable
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: APISIX/2.13.1
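
On the Catalog side, the goal is to degrade gracefully instead of failing the whole page. Below is a minimal Python sketch of one option, assuming the Catalog service calls the route through the gateway. get_catalog_products is a hypothetical helper that returns the products plus an optional error message. It treats any 5xx response as a temporary outage and lets 4xx errors propagate.

```python
import json
import urllib.error
import urllib.request
from typing import Any, List, Optional, Tuple

GATEWAY_URL = "http://127.0.0.1:9080/api/products"  # route configured in this post


def get_catalog_products(url: str = GATEWAY_URL,
                         timeout: float = 2.0) -> Tuple[List[Any], Optional[str]]:
    """Return (products, error). On a 5xx, return no products and the error text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")), None
    except urllib.error.HTTPError as err:
        if 500 <= err.code < 600:
            # The upstream body, e.g. the injected abort message, can be
            # logged or shown to the user.
            return [], err.read().decode("utf-8", errors="replace")
        raise  # 4xx errors usually mean a bug in the caller, so surface them
```

Returning an empty list with a message lets the Catalog page render the rest of its content. Whether to show a message, serve cached data, or retry is a product decision that the abort fault lets you exercise safely.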

That completes our demo.

Summary

As we have seen, fault injection helps engineers build better, more stable systems. Open-source projects like Apache APISIX make fault injection testing techniques more accessible. They also help you plan for unknown failures in a distributed architecture.

➔ Implementing resilient applications with API Gateway (Circuit breaker).

➔ Implementing resilient applications with API Gateway (Health Check).

➔ Watch Video Tutorial:

➔ Read the blog posts:

Community ⤵️

🙋 Join the Apache APISIX Community
🐦 Follow us on Twitter
📍 Find us on Slack
📧 Email us with your questions.
