Network resilience and testing

1. Network resilience and testing

In this section, you explore how to improve the resilience of your services by configuring request timeouts, retries, circuit breakers, and fault injection, as well as by using traffic mirroring.

A timeout sets the amount of time that an Envoy proxy waits for replies from a given service. This ensures that services don't wait for replies indefinitely and that calls succeed or fail within a predictable time frame. If set too long, timeouts cause excessive latency. If set too short, calls may fail unnecessarily. This example specifies a 10-second timeout for calls to the service-b service. VirtualServices let you adjust timeouts dynamically on a per-service basis without having to edit your service code. Envoy timeouts for HTTP requests are disabled by default.

You can specify the maximum number of times an Envoy proxy attempts to connect to a service with the retries setting. Retries can improve service availability and application performance by ensuring that temporary errors don't cause permanent call failures. The interval between retries, 25 milliseconds or more, is variable and determined automatically by Cloud Service Mesh to avoid overwhelming the called service with requests. By default, HTTP requests retry twice before returning an error. This example configures a maximum of three retries to connect to the service subset after an initial call failure, each with a two-second timeout.

Circuit breakers increase the resilience of microservice-based applications. In a circuit breaker, you set limits on calls to individual hosts within a service, such as the number of concurrent connections or the number of failed calls to the host. Once a limit has been reached, the circuit breaker trips and stops further connections to that host. Tripping the breaker makes calls fail quickly and keeps clients from trying to connect to an overloaded or failing host. Circuit breaker thresholds are defined in DestinationRules, and settings are applied to each individual host in the service.
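The timeout and retry settings described earlier in this section are normally set on a VirtualService route. The manifest below is a minimal sketch, not the course's own example: the resource name is hypothetical, the `networking.istio.io/v1beta1` API version is an assumption about the mesh in use, and the 10-second timeout and three-retry policy are combined on a single route to `service-b` only for brevity.

```yaml
# Sketch only: the resource name and API version are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-b-resilience   # hypothetical name
spec:
  hosts:
  - service-b
  http:
  - route:
    - destination:
        host: service-b
    timeout: 10s          # overall deadline for the request, retries included
    retries:
      attempts: 3         # up to three retries after the initial call fails
      perTryTimeout: 2s   # each attempt may take at most two seconds
```

Because the route timeout covers the whole request, the retries have to fit inside the 10-second budget. Three retries at two seconds each leave room for the initial attempt.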
In this example, the number of concurrent connections for the reviews service workloads of the v1 subset is limited to 100. You can use outlier detection settings to detect and evict unhealthy hosts from the load-balancing pool. In this example, if a host of the reviews service returns three consecutive 500 errors in a one-second interval, that host is ejected for two minutes. Up to 50% of the hosts can be evicted in this example.

Fault injections like delay and abort simulate scenarios such as network delays, service overloads, and HTTP errors to test how an application responds to failure. A delay fault holds requests before forwarding them, emulating failures like network issues or an overloaded upstream service. This example introduces a five-second delay in 1 out of every 1,000 requests. The fixedDelay field indicates the amount of delay in seconds. The optional percentage field can be used to delay only a certain percentage of requests. If left unspecified, all requests are delayed. Delay and abort faults are independent of one another, even if both are specified simultaneously.

Traffic mirroring, also known as traffic shadowing, minimizes risk when pushing changes to production. Mirroring sends a copy of live traffic to a mirrored service. The mirrored traffic happens out of band of the critical request path for the primary service. Traffic analysis is another interesting use case: you can send a copy of your live traffic to an analytics service to detect network intrusions, discover unauthorized traffic paths, or identify unsecured plaintext traffic. In this example, 100% of the traffic going to v1 is mirrored to v2.
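The circuit breaker and outlier detection limits described above could be written as a DestinationRule along the following lines. This is a sketch under stated assumptions: the resource name is hypothetical, the `version: v1` label is an assumed workload label, and both policies are placed on the v1 subset for simplicity, although the outlier detection example may have been defined for the whole reviews service.

```yaml
# Sketch only: resource name, labels, and policy placement are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker   # hypothetical name
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1                 # assumed workload label
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100     # cap on concurrent connections per host
      outlierDetection:
        consecutive5xxErrors: 3   # counts any 5xx response, not only 500
        interval: 1s              # how often hosts are evaluated
        baseEjectionTime: 2m      # minimum time an ejected host stays out
        maxEjectionPercent: 50    # never eject more than half of the hosts
```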
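The delay fault described above, a five-second delay on 1 in 1,000 requests, might look like the following VirtualService. The source does not say which service receives the fault, so the `reviews` host and the resource name here are hypothetical, and the API version is an assumption.

```yaml
# Sketch only: host, resource name, and API version are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-delay-fault   # hypothetical name
spec:
  hosts:
  - reviews                   # hypothetical target service
  http:
  - fault:
      delay:
        percentage:
          value: 0.1          # 0.1% of requests, that is 1 in 1,000
        fixedDelay: 5s        # delay injected before the request is forwarded
    route:
    - destination:
        host: reviews
```

Omitting the `percentage` block would apply the delay to every request on this route.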
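The mirroring setup described above, where all traffic routed to v1 is also copied to v2, can be sketched as follows. The `reviews` host and the resource name are hypothetical, and the `v1` and `v2` subsets are assumed to be defined in a separate DestinationRule.

```yaml
# Sketch only: host, resource name, and subsets are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-mirror   # hypothetical name
spec:
  hosts:
  - reviews              # hypothetical service
  http:
  - route:
    - destination:
        host: reviews
        subset: v1       # live traffic is still served by v1
      weight: 100
    mirror:
      host: reviews
      subset: v2         # receives a copy of each request
    mirrorPercentage:
      value: 100.0       # mirror all requests
```

Responses from the mirrored destination are discarded, so v2 can be observed under real load without affecting what clients receive from v1.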

2. Let's practice!
