New Tool Simplifies Microservices Performance Analysis
A visual analytics tool enhances the analysis of microservices performance.
Analyzing the performance of microservices is a complex job. Microservices are small, independent services that work together, often calling each other to complete a request. Each time a request passes through the system, it may trigger several calls to other services deployed on different servers or containers. These interactions make performance difficult to understand.
Current tools for tracing and analyzing microservices mainly rely on visualizations called swimlanes. These show individual requests moving through the system, helping users understand their performance. However, they fall short when the goal is to grasp overall performance trends across many requests.
To address this issue, we present VAMP, a new visual analytics tool that supports the performance analysis of multiple requests at the same time. The tool offers a variety of interactive visualizations that highlight recurrent request features and how they relate to overall performance.
By testing our tool on 33 datasets from an established open-source microservices system, we show that it can help identify significant execution-time deviations in remote calls that affect performance. Furthermore, it can help uncover meaningful structural patterns in requests and their connection to microservice performance.
The Challenges of Microservices Performance
Microservices have changed how software is developed and deployed. Each service operates independently, with separate teams managing its lifecycle. This setup allows for faster updates, which is a vital advantage in today’s market. However, it also introduces challenges in maintaining consistent performance.
One major issue is the complexity of these systems. Performance assurance methods, like testing before release, often become hard to implement due to limited time and resources. The pressure to deliver updates quickly can push teams to skip essential performance checks.
Moreover, the performance of microservices can vary based on real-time usage patterns, making it hard to predict performance issues before they occur. Frequent changes to the system and unpredictable workloads can lead to performance regressions that catch teams off guard.
Because of these challenges, the concept of observability has gained interest. Observability allows teams to analyze logs, traces, and metrics to get a complete view of system performance. Distributed tracing tools are widely used today to improve observability in microservices systems. These tools track requests as they move through the system, providing visual assistance for analyzing performance.
However, many current distributed tracing tools have received criticism for not being effective in analyzing overall performance. They often require users to switch between different tools for various analyses. This switch can become cumbersome and time-consuming, making it harder to get a quick understanding of the system's performance patterns.
Introducing the Visual Analytics Tool
Our new tool focuses on simplifying performance analysis for microservices systems. It builds on previous ideas and offers an easy way to see how different request characteristics relate to overall performance.
The tool includes two main visual components: a tree and a histogram. The tree shows how requests flow through the various remote procedure calls (RPCs), while the histogram presents the distribution of end-to-end response times for the requests.
Users can interact with these components to discover the important features of specific execution paths and how they impact performance.
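Both components are driven by the same underlying trace data. As a rough mental model, each request yields a trace made up of spans, one per RPC. The sketch below assumes a minimal, hypothetical schema; the field names are illustrative, not VAMP's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of a request
    rpc_name: str             # e.g. "orders.GetOrder" (illustrative name)
    duration_ms: float        # execution time of this RPC

@dataclass
class Trace:
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    @property
    def response_time_ms(self) -> float:
        # End-to-end response time: the duration of the root span.
        root = next(s for s in self.spans if s.parent_id is None)
        return root.duration_ms
```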
Visual Components Explained
The Tree Visualization
The tree is designed to give a detailed view of the workflows involved in the requests. Each node in the tree represents an RPC call, with edges showing the caller-callee relationships between calls.
For example, if a request involves a series of RPCs, it will display in the tree based on the connections between those calls. This tree structure allows for an aggregated view of multiple requests, making it easier to see which paths are frequently used and how they relate to performance.
To highlight RPCs worth investigating, the tool uses color coding to show the variability of execution times and the frequency of calls. Paths with higher variability may indicate a greater impact on overall response time.
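A minimal sketch of how such an aggregation could be computed, reusing the Span and Trace records from the earlier sketch; the coefficient of variation is one plausible variability metric, not necessarily the one the tool actually uses:

```python
from collections import defaultdict
from statistics import mean, stdev

def per_path_durations(traces):
    """Map each root-to-node RPC path to the execution times observed
    for it across all traces."""
    durations = defaultdict(list)
    for trace in traces:
        by_id = {s.span_id: s for s in trace.spans}
        for span in trace.spans:
            path, cur = [], span
            while cur is not None:               # walk up to the root span
                path.append(cur.rpc_name)
                cur = by_id.get(cur.parent_id)
            durations[tuple(reversed(path))].append(span.duration_ms)
    return durations

def aggregate_call_tree(traces):
    """Per-path call count and coefficient of variation of execution
    times: raw material for coloring the tree's nodes."""
    stats = {}
    for path, times in per_path_durations(traces).items():
        cv = stdev(times) / mean(times) if len(times) > 1 else 0.0
        stats[path] = {"count": len(times), "cv": cv}
    return stats
```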
The Histogram Component
The histogram displays the distribution of end-to-end response times, helping users see the overall performance picture. It allows for visual identification of performance patterns, such as modes that indicate recurring behaviors.
Users can select ranges in the histogram to focus on specific response times. This selection will update the tree component to show how RPC execution paths relate to those selected times, highlighting any discrepancies in response times based on the selected ranges.
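Conceptually, brushing a range on the histogram is just a filter over traces, after which the tree is re-aggregated over the selection. A sketch under the same assumed data model:

```python
def select_range(traces, lo_ms, hi_ms):
    """Keep the traces whose end-to-end response time falls inside the
    brushed histogram range [lo_ms, hi_ms]."""
    return [t for t in traces if lo_ms <= t.response_time_ms <= hi_ms]

# Re-aggregate the tree over a 200-400 ms selection, for example:
# selected_stats = aggregate_call_tree(select_range(traces, 200.0, 400.0))
```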
User Interaction with the Tool
The tool supports two types of analysis: forward analysis and backward analysis.
Forward Analysis
In forward analysis, users start with the tree visualization. By examining the nodes, they can identify RPC paths that show significant variability in their execution times or invocation frequency. Users can then click on these nodes to explore how these attributes relate to overall response times.
When a user selects an RPC path, the tool generates a bar chart that displays the range of execution times for that path. This visualization allows users to see how specific execution time behaviors correlate with end-to-end performance, helping to uncover potential performance issues.
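One plausible way to derive the data behind that bar chart, reusing per_path_durations from the earlier sketch; the equal-width bucketing scheme is an assumption made for illustration:

```python
from statistics import mean

def forward_analysis(traces, target_path, n_buckets=5):
    """Bucket the chosen path's execution times and report the mean
    end-to-end response time per bucket."""
    pairs = []  # (execution time on target_path, end-to-end response time)
    for t in traces:
        times = per_path_durations([t]).get(target_path)
        if times:
            pairs.append((sum(times), t.response_time_ms))
    if not pairs:
        return []
    lo = min(d for d, _ in pairs)
    hi = max(d for d, _ in pairs)
    width = (hi - lo) / n_buckets or 1.0  # guard against zero-width buckets
    buckets = [[] for _ in range(n_buckets)]
    for d, rt in pairs:
        i = min(int((d - lo) / width), n_buckets - 1)
        buckets[i].append(rt)
    return [mean(b) if b else None for b in buckets]
```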
Backward Analysis
In backward analysis, users can start from the histogram. They choose a specific range of end-to-end response times they want to analyze, which then updates the tree to show discrepancies in execution time and frequency. Nodes that show significant differences in these attributes compared to others will be highlighted.
Clicking on these highlighted nodes will display histograms that compare the execution times for the selected requests against other requests. This approach makes it easy to see how different execution behaviors are linked to specific response times.
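A simple stand-in for that highlighting logic could compare per-path mean execution times between the selected requests and the rest; the 25% relative threshold below is an arbitrary illustration, not the tool's actual criterion:

```python
from statistics import mean

def backward_analysis(all_traces, selected, threshold=0.25):
    """Flag RPC paths whose mean execution time in the selected traces
    deviates from the remaining traces by more than the threshold."""
    selected_ids = {t.trace_id for t in selected}
    rest = [t for t in all_traces if t.trace_id not in selected_ids]
    sel, oth = per_path_durations(selected), per_path_durations(rest)
    flagged = []
    for path in sel.keys() & oth.keys():
        m_sel, m_oth = mean(sel[path]), mean(oth[path])
        if abs(m_sel - m_oth) / max(m_oth, 1e-9) > threshold:
            flagged.append((path, m_sel, m_oth))
    return flagged
```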
Tool Architecture and Implementation
The underlying architecture of the tool involves several components:
- Trace Collector: gathers traces from the microservices system and stores them in a trace storage.
- Preprocessing Step: cleans and organizes the collected traces, preparing them for efficient analysis in the tool.
- Dashboard App: connects to the preprocessed data and generates the visualizations for users.
The tool supports trace data exported by common distributed tracing tools and is built using web technologies to render its visualizations.
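As a sketch of what the preprocessing step might do, the loader below folds a generic JSON span export into the Trace records used in the earlier sketches. The field names are placeholders; real formats such as Jaeger, Zipkin, or OpenTelemetry each have their own schema and would need dedicated adapters.

```python
import json

def load_traces(path):
    """Parse a generic JSON span export into Trace records, grouping
    spans by trace id (placeholder schema, not a real tool's format)."""
    with open(path) as f:
        raw = json.load(f)
    traces = {}
    for s in raw["spans"]:
        t = traces.setdefault(s["traceId"], Trace(trace_id=s["traceId"]))
        t.spans.append(Span(
            span_id=s["spanId"],
            parent_id=s.get("parentId"),        # absent for root spans
            rpc_name=s["operationName"],
            duration_ms=s["durationUs"] / 1000.0,
        ))
    return list(traces.values())
```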
Evaluating the Tool
To understand how effective the tool is, we conducted a thorough evaluation using datasets derived from a complex microservices system. Our goal was to see if the tool could effectively reveal the relationships between RPC attributes and overall response times.
Dataset Generation
The datasets came from a well-known open-source microservices system that provides a booking service. We generated 33 distinct datasets, each mimicking a scenario that produced different performance characteristics.
We used two main approaches for creating the datasets:
- Injecting Performance Issues: introducing artificial delays into specific RPCs let us observe how these changes impacted response times (a minimal sketch of this idea follows the list).
- Changing Workloads: we varied the load to simulate different types of user interactions, leading to performance fluctuations.
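The paper's exact injection mechanism is not described here, but the idea behind the first approach can be illustrated with a decorator that adds artificial latency to a hypothetical RPC handler:

```python
import random
import time
from functools import wraps

def inject_delay(p=0.5, delay_ms=300):
    """With probability p, sleep for delay_ms before running the
    wrapped RPC handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            if random.random() < p:
                time.sleep(delay_ms / 1000.0)
            return handler(*args, **kwargs)
        return wrapper
    return decorator

@inject_delay(p=0.3, delay_ms=250)
def get_booking(booking_id):
    """Hypothetical RPC handler in the booking system."""
    ...
```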
Manual Analysis
Two authors manually analyzed the datasets using the tool. They focused on how well it helped in understanding the relationship between request attributes and response times. Both authors had prior knowledge of the microservices system but were unaware of the specific changes made to the datasets.
Overall, the tool successfully made analyzing requests straightforward in most cases. It effectively highlighted the relationships between execution times and response times across several datasets.
Results from the Evaluation
Through our evaluation, we found that the tool provided valuable insights into the performance of microservices. In the majority of tested datasets, it was easy to identify affected RPCs and their correlations with overall response times.
However, in a couple of datasets, the analysis proved more challenging, requiring more interaction with the tool.
Conclusion
The visual analytics tool presented provides a promising solution for analyzing microservices performance. It simplifies the process of identifying and understanding the relationships between request attributes and overall response times while supporting various analysis types.
Future improvements will focus on enhancing the tool's efficiency and validating it with real-world data from larger microservices systems. The aim is to make this tool a go-to option for teams seeking to improve their microservices performance monitoring and analysis capabilities.
Title: VAMP: Visual Analytics for Microservices Performance
Abstract: Analysis of microservices' performance is a considerably challenging task due to the multifaceted nature of these systems. Each request to a microservices system might raise several Remote Procedure Calls (RPCs) to services deployed on different servers and/or containers. Existing distributed tracing tools leverage swimlane visualizations as the primary means to support performance analysis of microservices. These visualizations are particularly effective when it is needed to investigate individual end-to-end requests' performance behaviors. Still, they are substantially limited when more complex analyses are required, as when understanding the system-wide performance trends is needed. To overcome this limitation, we introduce VAMP, an innovative visual analytics tool that enables, at once, the performance analysis of multiple end-to-end requests of a microservices system. VAMP was built around the idea that having a wide set of interactive visualizations facilitates the analyses of the recurrent characteristics of requests and their relation w.r.t. the end-to-end performance behavior. Through an evaluation of 33 datasets from an established open-source microservices system, we demonstrate how VAMP aids in identifying RPC execution time deviations with significant impact on end-to-end performance. Additionally, we show that VAMP can support in pinpointing meaningful structural patterns in end-to-end requests and their relationship with microservice performance behaviors.
Authors: Luca Traini, Jessica Leone, Giovanni Stilo, Antinisca Di Marco
Last Update: 2024-04-22 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2404.14273
Source PDF: https://arxiv.org/pdf/2404.14273
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.