%!TEX root=../../DML.tex
This technical report introduces the \acrfull{dml}, a new architecture-level modeling language for modeling Quality-of-Service (QoS) and resource management related aspects of modern dynamic IT systems, infrastructures and services. \gls{dml} is designed to serve as a basis for \emph{self-aware} resource management\footnote{The interpretation of the term ``self-aware'' is described in detail in Sec.~\ref{Sec:Self-Awareness}.}~\cite{KoBrHuRe2010-SCC-Towards,Ko2011-SE-DescartesResearch} during operation, ensuring that system quality-of-service requirements are continuously satisfied while infrastructure resources are utilized as efficiently as possible. The term Quality-of-Service (QoS) refers to non-functional system properties including performance (considering classical metrics such as response time, throughput, scalability and efficiency) and dependability (additionally considering availability, reliability and security aspects). The current version of \gls{dml} is focused on performance and availability, including capacity, responsiveness and resource efficiency aspects; however, work is underway to provide support for modeling further QoS properties. The meta-model itself is designed in a generic fashion and is intended to eventually support the full spectrum of QoS properties mentioned above. Given that the initial version of \gls{dml} is focused on performance, in the rest of this document we mostly speak of performance instead of QoS in general. Information on the latest developments around the \acrfull{dml} can be found at \url{http://www.descartes.tools}.
%architecture-level performance model to describe the service behavior and the resource landscape of modern distributed virtualized data centers. 

%   1. Delimiting the research area (In which topic area is the work situated? How does it relate to the topic of the conference/journal?)
%   2. Description of the problem to be solved in this work (What is the problem and why is it important to solve it?)
%   3. Shortcomings of existing work regarding the problem (Why is it still a problem, even though others have already addressed the same topic?)
%   4. Own solution approach (Which approach is used in this work to solve the problem? What is the contribution of this article?)
%   5. Type of validation + results (How was it demonstrated that the work really delivers the promised improvements (case study, experiment, etc.)? What were the results of the validation (ideally the percentage of improvement)?)

\section{Motivation}
%   1. Delimiting the research area (In which topic area is the work situated? How does it relate to the topic of the conference/journal?)

Modern IT systems have increasingly complex and dynamic architectures composed of loosely-coupled distributed components and services that operate and evolve independently. Managing system resources in such environments to ensure acceptable end-to-end application QoS while at the same time optimizing resource utilization and energy efficiency is a challenge~\cite{durkee2010,lohr2011_nytimes,brooks2011_searchcloudcomputing}. The adoption of virtualization and cloud computing technologies, such as Software-as-a-Service~(SaaS), Platform-as-a-Service~(PaaS) and Infrastructure-as-a-Service~(IaaS), comes at the cost of increased system complexity and dynamicity.

The increased complexity is caused by the introduction of virtual resources and the resulting gap between logical and physical resource allocations. The increased dynamicity is caused by the complex interactions between the applications and workloads sharing the physical infrastructure. The inability to predict such interactions and adapt the system accordingly makes it hard to provide QoS guarantees in terms of availability and responsiveness, as well as resilience to attacks and operational failures~\cite{KoReJoBrBr2011-ResilAssessment}. Moreover, the consolidation of workloads translates into higher utilization of physical resources which makes systems much more vulnerable to threats resulting from unforeseen load fluctuations, hardware failures and network attacks. 

%The increased flexibility and efficiency gained through the adoption of technologies like cloud computing and virtualization, or paradigms like service-oriented architecture, comes at the cost of higher system complexity and dynamicity. This poses some challenges in providing QoS guarantees in terms of availability and performance, as well as resilience to attacks, operational failures and load spikes [A13,A12].

%   2. Beschreibung des Problems, das in dieser Arbeit gelöst werden soll (Was ist das Problem und warum ist es wichtig dies zu lösen?)
% copy from SE 2011 DokSymp Paper and SE Samuels Paper
System administrators and service providers are often faced with questions such as: 
\begin{itemi} 
\item What QoS would a new service or application deployed on the virtualized infrastructure exhibit and how many resources should be allocated to it?
\item How should the workloads of the new service/application and existing services be partitioned among the available resources so that QoS requirements are satisfied and resources are utilized efficiently?
\item What would be the effect of adding a new component or upgrading an existing component as services and applications evolve?
\item If an application experiences a load spike or a change of its workload profile, how would this affect the system QoS? Which parts of the system architecture would require additional resources? %How much resources would be required and how long should they be held after the load decreases? 
\item At what granularity and at what rate should resources be provisioned / released as workloads fluctuate (e.g., CPU time, virtual cores, virtual machines, physical servers, clusters, data centers)?
\item What would be the effect of\forget{ changing resource allocations and/or} migrating a service or an application component from one physical server to another?
\item How should the system configuration (e.g., component deployment, resource allocations) be adapted to avoid inefficient system operation arising from evolving application workloads?
\end{itemi}
\forget{How much resources need to be allocated to ensure that both the new service and existing services satisfy their performance requirements?} 

Answering such questions requires the ability to predict at \emph{run-time} how the QoS of running applications and services would be affected if application workloads change and/or the system deployment and configuration is changed. We refer to this as \emph{online QoS prediction}. Given that the initial version of \gls{dml} is focused on performance, hereafter we will speak of \emph{online performance prediction}~\cite{KoBrHuRe2010-SCC-Towards,Ko2011-SE-DescartesResearch}. 

\begin{figure}[!htb]
  \centering
  \includegraphics[width=0.80\linewidth]{SOA-Env.pdf}
  \caption{Degrees-of-Freedom and performance-influencing factors in a modern IT system.}
  \label{Fig:SOA-Env}
\end{figure}

Predicting the performance of a modern IT system, however, even in an offline scenario is a challenging task. Consider the architecture of a typical modern IT system as depicted in Figure~\ref{Fig:SOA-Env}.\shorten{ A SOA application comprises a set of \emph{services} each implementing a specific business activity. Services are accessed according to specified workflows representing business processes. Each service is implemented using a set of software components deployed in one or more application servers. The application servers are often deployed in a virtualized environment. A physical machine may host multiple VMs managed by a hypervisor. Depending on the role of a given VM, the software it runs can further be divided into multiple layers including operating system, Java virtual machine and application server middleware.\shorten{ The situation is further complicated by the fact that different services may run on heterogeneous platforms which means that platform-specific techniques cannot be used for end-to-end performance management.}} For a given set of hardware and software platforms at each layer of the architecture, Figure~\ref{Fig:SOA-Env} shows some examples of the degrees-of-freedom at each layer and the factors that may affect the performance of the system. Predicting the performance of a service\shorten{ (e.g., the time required to execute a service workflow)} requires taking these factors into account as well as the dependencies among them. For example, the input parameters passed to a service may have direct impact on the set of software components involved in executing the service, as well as their internal behavior (e.g., flow of control, number of loop iterations, return parameters) and resource demands (e.g., CPU, disk and network service times).\Shorten{ Consider for instance an online translation service. The time needed to process a translation request and the specific system components involved would depend on the size of the document passed as input, the format in which the document is provided, as well as the source and target languages. Thus, in order to predict the service response time, the effects of input parameters have to be traced through the complex chain of components and resources involved. Moreover, the configuration parameters at the different layers of the execution environment, as well as resource contention due to concurrently executed requests, must be taken into account.} Therefore, a detailed \emph{performance model} capturing the performance-relevant aspects of both the software architecture and the multi-layered execution environment is needed. 
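To make such parameter dependencies more concrete, the following purely illustrative sketch (all names and coefficients are invented) shows how the CPU demand of an online translation service might depend on the document size, format, and language pair passed as input; in \gls{dml}, such dependencies are captured declaratively as model annotations rather than as code:
\begin{verbatim}
# Illustrative sketch (invented names and coefficients): the CPU demand
# of a translation request as a function of its input parameters.
def cpu_demand_ms(doc_size_kb, doc_format, lang_pair):
    parse_cost = {"txt": 0.1, "pdf": 0.8, "docx": 0.5}   # ms per KB
    translate_cost = {"en-de": 1.2, "en-zh": 2.0}        # ms per KB
    return doc_size_kb * (parse_cost[doc_format]
                          + translate_cost[lang_pair])

print(cpu_demand_ms(200, "pdf", "en-zh"))   # 560.0 ms for this request
\end{verbatim}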

%   3. Mängel an existierenden Arbeiten bzgl. des Problems (Warum ist es ein Problem, obwohl sich schon andere mit dem gleichen Thema beschäftigt haben?)
Existing approaches to online performance prediction (e.g., \cite{MeAl2000-Scaling_for_eBusiness, nou2008a, LiChinneckWoodside2009Performanc, jung2010a}) are based on stochastic performance models such as queueing networks, stochastic Petri nets and variants thereof, e.g., layered queueing networks or queueing Petri nets. Such models, often referred to as \emph{predictive} performance models, normally abstract the system at a high level without explicitly taking into account its software architecture (e.g., flow of control and dependencies between software components), its execution environment and configuration (e.g., resource allocations at the virtualization layer). Services are typically modeled as black boxes, and many restrictive assumptions are often imposed, such as a single workload class, single-threaded components, homogeneous servers or exponential request inter-arrival times. Detailed models that explicitly capture the software architecture, execution environment and configuration exist in the literature; however, such models are intended for offline use at system design time (e.g., \cite{becker2008a, GrMiSa2007-KLAPER,SmLlCoDiWi2005-XML_based_SPE_with_SPMIF, omg2006-UML_MARTE}). Models in this area are \emph{descriptive} in nature, e.g., software architecture models based on UML, annotated with descriptions of the system's performance-relevant behavior. Such models, often referred to as \emph{architecture-level} performance models, are built during system development and are used at design and/or deployment time to evaluate alternative system designs and/or predict the system performance for capacity planning purposes.

While architecture-level performance models provide a powerful tool for performance prediction, they are typically expensive to build and provide limited support for reusability and customization, which renders them impractical for use at run-time. Recent efforts in the area of \emph{component-based performance engineering}~\cite{koziolek2009a} have contributed significantly to facilitating model reusability; however, there is still much work to be done on further parameterizing performance models before they can be used for online performance prediction.\forget{ In particular, current techniques do not provide means to model the layers of the component execution environment (e.g., the virtualization layer) explicitly~\cite{1574296}. The performance influences of the individual layers, the dependencies among them and the resource allocations at each layer should be captured as part of the models. This is necessary in order to be able to predict at run-time how a change in the execution environment (e.g., modifying resource allocations at the VM level) would affect the system performance.}

\section{Design-time vs. Run-Time Models}
\label{chap:introduction:sec:designvsruntime}

We argue that there are some fundamental differences between offline and online scenarios for performance prediction leading to different requirements on the underlying performance abstractions of the system architecture and the respective performance prediction techniques suitable for use at design-time vs. run-time. In the following, we summarize the main differences in terms of goals and underlying assumptions driving the evolution of design-time vs. run-time models.

\paragraph{Goal: Evaluate Design Alternatives vs. Evaluate Impact of Dynamic Changes}
At system design-time, the main goal of performance modeling and prediction is to evaluate and compare different design alternatives in terms of their performance properties.

In contrast, at run-time, the system design (i.e., architecture) is relatively stable and the main goal of online performance prediction is to predict the impact of dynamic changes in the environment (e.g., changing workloads, system deployment, resource allocations, deployment of new services).

\paragraph{Model Structure Aligned with Developer Roles vs. System Layers}

Given the goal to evaluate and compare different design alternatives, design-time models are typically structured around the various developer roles involved in the software development process (e.g., component developer, system architect, system deployer, domain expert), i.e., a separate sub-meta-model is defined for each role. In line with the component-based software engineering paradigm, the assumption is that each developer with a given role can work independently from other developers and does not have to understand the details of sub-meta-models that are outside of their domain, i.e., there is a clear separation of concerns. Sub-meta-models are parameterized with explicitly defined interfaces to capture their context dependencies. Performance prediction is performed by composing the various sub-meta-models involved in a given system design. To summarize, at design-time, model composition and parameterization is aligned with the software development processes and developer roles.

At run-time, the complete system exists, and a strict separation and encapsulation of concerns according to the developer roles is less relevant. However, given the dynamics of modern systems, it is more important to be able to distinguish between static and dynamic parts of the models. The software architecture is usually stable; however, the system configuration (e.g., deployment, resource allocations) at the various layers of the execution environment (virtualization, middleware) may change frequently during operation. Thus, in this setting, it is more important to explicitly distinguish between the system layers and their dynamic deployment and configuration aspects than between the developer roles. Given that performance prediction is typically done to predict the impact of dynamic system adaptation, models should be structured around the system layers and parameterized according to their dynamic adaptation aspects.

\paragraph{Type and Amount of Data Available for Model Parameterization and Calibration}

Performance models typically have multiple parameters such as workload profile parameters (workload mix and workload intensity), resource demands, branch probabilities and loop iteration frequencies. The type and amount of data available as a basis for model parameterization and calibration at design-time vs. run-time greatly differs.\medskip
At design-time, model parameters are often estimated based on analytical models or measurements if implementations of the system components exist. On the one hand, there is more flexibility since, in a controlled testing environment, one can conduct arbitrary experiments under different settings to evaluate parameter dependencies. On the other hand, the possibilities for experimentation are limited since often not all system components are implemented yet, or some of them might only be available as prototypes. Moreover, even if stable implementations exist, measurements are conducted in a testing environment that is usually much smaller than and may differ significantly from the target production environment. Thus, while at design-time one has complete flexibility to run experiments, parameter estimation is limited by the unavailability of a realistic production-like testing environment and the typical lack of complete implementations of all system components.

At run-time, all system components are implemented and deployed in the target production environment. This makes it possible to obtain much more accurate estimates of the various model parameters, taking into account the real execution environment. Moreover, model parameters can be continuously calibrated to iteratively refine their accuracy. Furthermore, performance-relevant information can be monitored and described at the component instance level, not only at the type level as is typical for design-time models. However, during operation, we do not have the possibility to run arbitrary experiments since the system is in production and is used by real customers placing requests. In such a setting, monitoring has to be handled with care, keeping the monitoring overhead within limits (non-intrusive approach) such that system operation is not disturbed. Thus, at run-time, while theoretically much more accurate estimates of model parameters can be obtained, one has less control over the system to run experiments, and monitoring must be performed with care in a non-intrusive manner.
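For illustration, one widely used way to estimate resource demands from non-intrusive run-time measurements is the service demand law of operational analysis: if $U_i$ denotes the measured utilization of resource~$i$ and $X$ the measured system throughput over the same interval, the average demand a request places on resource~$i$ is
\[
  D_i = \frac{U_i}{X}.
\]
Recomputing such estimates over successive monitoring windows allows model parameters to be calibrated continuously without running dedicated experiments.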

\paragraph{Trade-off Between Prediction Accuracy and Overhead}

Normally, the same model can be analyzed (solved) using multiple alternative techniques such as exact analytical techniques, numerical approximation techniques, simulation and bounding techniques. Different techniques offer different trade-offs between the accuracy of the provided results and the overhead for the analysis in terms of elapsed time and computational resources.

At design-time, there is normally plenty of time to analyze (solve) the model. Therefore, one can afford to run detailed time-intensive simulations providing accurate results.

At run-time, depending on the scenario, the model may have to be solved within seconds, minutes, hours, or days. Therefore, flexibility in trading off accuracy against overhead is crucially important. The same model is typically used in multiple different scenarios with different requirements for prediction accuracy and analysis overhead. Thus, run-time models must be designed to support multiple abstraction levels and different analysis techniques to provide maximum flexibility at run-time.
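To illustrate the cheap end of this trade-off: for a closed workload with $N$ users, think time $Z$, total service demand $D = \sum_i D_i$ and maximum per-resource demand $D_{\max}$, the classical asymptotic bounds of operational analysis, e.g.,
\[
  X(N) \le \min\!\left(\frac{N}{D + Z},\; \frac{1}{D_{\max}}\right),
\]
can be evaluated in constant time, whereas a detailed simulation of the same model may run for minutes or hours to produce accurate response time distributions.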

\paragraph{Degrees-of-Freedom}

The degrees-of-freedom when considering multiple design alternatives at system design-time differ substantially from the degrees-of-freedom when considering dynamic system changes at run-time, such as changing workloads or resource allocations.

At design-time, one has virtually unlimited time to vary the system architecture and consider different designs and configurations. At run-time, the time available for optimization is normally limited, and the concrete scenarios considered are driven by the possible dynamic changes and available reconfiguration options. Whereas the system designer is free to design an architecture that suits their requirements, at run-time the boundaries within which the system can be reconfigured are much stricter. For example, the software architecture defines the extent to which the software components can be reconfigured, and the hardware environment may limit the deployment possibilities for virtual machines or services. Thus, in addition to the performance-influencing factors, run-time models should also capture the available system reconfiguration options and adaptation strategies.

\paragraph{Design for Use by Humans vs. Machines}

Design-time models are normally designed to be used by humans. They also serve as architecture documentation, i.e., they should be easy to understand and model instances should be valid and meaningful.

In contrast, run-time models are typically used for optimizing the system configuration and deployment as part of autonomic run-time resource management techniques. In this case, models are used by programs or agents as opposed to humans. Ideally, models should be composed automatically at run-time and tailored to the specific prediction scenario, taking into account timing constraints and accuracy requirements. Also, ideally, models will remain hidden behind the scenes so that no users or administrators ever have to deal with them. Although in many cases the initial sub-meta-models capturing the performance-relevant aspects of the various system layers have to be constructed manually, novel automated model inference techniques increasingly enable the extraction of sub-meta-models in an automatic or semi-automatic manner.

\section{The Descartes Modeling Language (DML)}

The fundamental goal of \gls{dml} is to provide a holistic model-based approach that can be used to describe the performance behavior and properties of the system as well as to model the system's dynamic aspects like its configuration space and adaptation processes.
The intention is that, using the online performance prediction techniques provided by \cite{Brosig2014-Dissertation}, \gls{dml} can support system analysis and problem detection as well as autonomic decision-making.
% iii) the specification of adaptation processes at the model level.
Furthermore, by providing means to specify adaptation processes at the model level, \gls{dml} can be used to find suitable system configurations without having to adapt the actual system.
% However, these requirements lead to two different concerns that must be addressed by \gls{dml}.
% First, \gls{dml} has to reflect the performance behavior of the managed system.
% Second, it must be suitable to describe the adaptation process of the system.
% Thus, the important question is how to separate these concerns (cf.~\cite{France2007}).
In the following section, we give an overview of the different sub-models of \gls{dml} before its features are explained in detail in \Cref{chap:AppLevelAndResLandscape} to \Cref{chap:SysReconfig}.


\subsection{Modeling Language Overview}
\label{sec:BriefDmmOverview}

The \acrfull{dml} is a novel architecture-level modeling language to describe \gls{qos} and resource management related aspects of modern dynamic IT systems, infrastructures and services.
\gls{dml} explicitly distinguishes different model types that describe the system and its adaptation processes from a \textit{technical} and a \textit{logical} viewpoint.
Together, these different model types form a \gls{dml} instance (cf.~\Cref{fig:e2e_modeling_overview}).
The idea of using separate models is to separate knowledge about the system architecture and its performance behavior (technical aspects) from knowledge about the system's adaptation processes (logical aspects). 

\begin{figure}[htb]
	\centering
	\includegraphics[width=\textwidth]{e2e-modeling}
	\caption[Relation of the different models of a \gls{dml} instance]{Relation of the different models of a \gls{dml} instance and the system.}
	\label{fig:e2e_modeling_overview}
\end{figure}

\Cref{fig:e2e_modeling_overview} depicts an overview of the relation of the different models that are part of a \gls{dml} instance, the managed system, and the system's adaptation process.
In the bottom right corner of \Cref{fig:e2e_modeling_overview}, we see the system that is managed by a given, usually system-specific, adaptation process, depicted in the top right corner of \Cref{fig:e2e_modeling_overview}.
In the bottom left corner, we see the models that reflect the technical aspects of the system relevant for model-based performance and resource management.
These aspects are the hardware resources and their distribution (resource landscape model), the software components and their performance-relevant behavior (application architecture model), the deployment of the software components on the hardware (deployment model), the usage behavior and workload of the users of the system (usage profile model), and the degrees of freedom of the system that can be employed for run-time system adaptation (adaptation points model).
On top of these models (top left corner of \Cref{fig:e2e_modeling_overview}), we see the adaptation process model that specifies an adaptation process describing how to adapt the managed system.
The adaptation process leverages online performance prediction techniques to reason about possible adaptation strategies, tactics, and actions.

In the following paragraphs we give brief overviews of the features of each meta-model. 

% This section should provide a high-level overview of the DML covering its organization, structure and main distinguishing features and benefits. It should focus on what is provided while trying to avoid to repeat the general motivation as presented in the previous section: The style of presentation should be: "DML provides x,y,z which allows to do x,y,z and provides the following unique benefits x,y,z not covered by existing modeling approaches" as opposed to "x,y,z is needed because of x,y,z". 
% You can first walk the reader through the meta-model and then at the end of the section summarize the features supported by DML stressing its novel and distinguishing aspects.}
\paragraph{Application Architecture Model} 
This model is focused on the application architecture of the managed system.
For performance analysis, this model must capture performance-relevant information about the software services that are executed on the system as well as external services used by the system.
In general, this model is focused on describing the performance behavior of the software services following the principles of component-based software systems \cite{becker2008a}. %, i.e., we want to describe the performance behavior of the software system's implementation.
A software component is defined as a unit of composition with explicitly defined provided and required interfaces \cite{szyperski2002a}.
The performance behavior of each software component can be described independently and at different levels of granularity.
The supported levels of granularity range from black-box abstractions (a probabilistic representation of the service response time behavior), over coarse-grained representations (capturing the service behavior as observed from the outside at the component boundaries, e.g., frequencies of external service calls and amount of consumed resources), to fine-grained representations (capturing the service's internal control flow and internal resource demands).
The advantage of the support for multiple abstraction levels is that the model is usable in different online performance prediction scenarios with different goals and constraints, ranging from quick performance bounds analysis to detailed system simulation. Moreover, one can select an appropriate abstraction level to match the granularity of information that can be obtained through monitoring tools at run-time, e.g., considering to what extent component-internal information can be obtained by the available tools.
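As a rough illustration of these granularity levels, the following sketch (all names invented; \gls{dml} defines these abstractions as meta-model elements, not as code) contrasts the information captured at each level:
\begin{verbatim}
# Sketch (invented names): information captured at the three
# granularity levels of a component's service behavior description.
from dataclasses import dataclass

@dataclass
class BlackBox:                  # probabilistic response time only
    response_time_ms: list       # e.g., empirical samples

@dataclass
class CoarseGrained:             # behavior at the component boundary
    external_call_freqs: dict    # called service -> calls per request
    cpu_demand_ms: float         # total resource demand per request

@dataclass
class FineGrained:               # internal control flow and demands
    control_flow: list           # branches, loops, internal actions
    demands_per_action: dict     # internal action -> resource demand
\end{verbatim}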

\paragraph{Resource Landscape Model}
The purpose of this model is to describe the structure and the properties of both physical and logical resources of modern distributed IT service infrastructures. 
To this end, the resource landscape model provides modeling abstractions to specify the available physical resources (\acrshort{cpu}, network, \acrshort{hdd}, memory) as well as their distribution within data centers (servers, racks, and so on).
To specify the logical resources, the resource landscape model also supports modeling different layers of resources and specifying the performance influences of the configuration of these layers.
In this context, resource layers denote the software stack on which software is executed, including virtualization, operating system, middleware, and runtime environments (e.g., \acrshort{jvm}).
In addition, as we consider systems distributed over multiple data centers, the model also captures the distribution of resources across data centers.
Modeling the structure and properties of data center resources at this level of detail is important for accurate performance predictions and to derive causal relationships of the performance impact during system adaptation.
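The following sketch (names and values invented) illustrates the kind of nesting the resource landscape model expresses, from a data center down to the resource layers of a single server:
\begin{verbatim}
# Sketch (invented names and values): nesting expressed by the
# resource landscape model, from data center to software stack.
landscape = {
    "DataCenter-A": {
        "Rack-1": {
            "Server-1": {
                "physical": {"cpu_cores": 16, "memory_gb": 64},
                "layers": ["Hypervisor", "VM/OS", "JVM",
                           "Application Server"],
            },
        },
    },
}
\end{verbatim}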

\paragraph{Deployment Model}
To analyze the performance of the modeled system, it is necessary to connect the modeled software components with the system resources described using the resource landscape model.
The deployment model provides this information by mapping the software components modeled in the application architecture model to physical or logical resources described in the resource landscape model.
With this mapping, resource demands of the modeled software components can be traced through the layers of the resource landscape model down to the physical resources.
% These dependencies are resolved during online performance prediction, when the architecture-level performance model is transformed to a predictive performance model \citep[cf.][]{Brosig2014-Dissertation}.
This makes it possible to analyze the mutual performance influences of components sharing resources.

\paragraph{Usage Profile Model}
An important aspect that influences the performance of a system is the way the system is used. 
For instance, if the number of user requests that have to be processed by the system increases, more resources are normally required to process the increased amount of work.
The usage profile model can be used to describe the types of requests that are processed by the system and the frequency with which new requests arrive.
In fact, the usage profile is a frequently changing property of the system environment to which we want to adapt the system proactively.

\paragraph{Adaptation Points Model}
This model provides modeling abstractions to describe the elements of the resource landscape and the application architecture that can be leveraged for adaptation (i.e., reconfiguration) at run-time.
Other model elements that may change at run-time but cannot be directly controlled (e.g., the usage profile) are not in the focus of this model.
Adaptation points on the model level correspond to operations that can be executed on the system at run-time to adapt the system (e.g., adding \glspl{vcpu} to \acrshortpl{vm}, migrating \acrshortpl{vm} or software components, or load-balancing requests).
Thus, the adaptation points model defines the configuration space of the adapted system.
The model provides constructs to specify the degrees of freedom along which the system's state can vary as well as to define boundaries for the valid system states.
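For example, an adaptation point might constrain the number of \glspl{vcpu} of a \acrshort{vm} to a valid range, as in the following sketch (names and bounds invented):
\begin{verbatim}
# Sketch (invented names and bounds): an adaptation point defines a
# degree of freedom together with the boundaries of valid states.
from dataclasses import dataclass

@dataclass
class AdaptationPoint:
    target: str       # model element whose property may be adapted
    prop: str         # adaptable property, e.g., the number of vCPUs
    min_value: int    # lower boundary of the valid state space
    max_value: int    # upper boundary of the valid state space

vcpus = AdaptationPoint("VM-AppServer-1", "numVCPUs", 1, 8)
\end{verbatim}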

\paragraph{Adaptation Process Model}

This model can be used to describe processes that keep the system in a state such that its operational goals are continuously fulfilled, i.e., it describes the way the system adapts to changes in its environment.
It is based on the previously introduced architecture-level performance model and adaptation points model which are used to describe adaptation processes at the model level.
With this model, we aim at abstracting from technical details such that we can describe adaptation processes from a logical perspective, independent of system-specific details.
It is designed to provide sufficient flexibility to model a large variety of adaptation processes from event-condition-action rules to complex algorithms and heuristics.
Essentially, it distinguishes high-level, goal-oriented objectives, adaptation strategies, and tactics from low-level, system-specific adaptation actions. 
The modeling language also provides concepts to describe the operational goals of the managed system such that the adaptation process can be driven towards these goals.
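The following minimal sketch (all names, formulas, and values invented) illustrates this separation: a strategy pursues an operational goal by evaluating tactics on the model via online prediction, while tactics are composed of low-level, system-specific actions:
\begin{verbatim}
# Minimal sketch (invented names, formulas, and values) of the
# separation of strategies, tactics, and actions.
def action_add_vcpu(config):            # low-level, system-specific
    config["vcpus"] += 1

def tactic_scale_up(config):            # tactic built from actions
    if config["vcpus"] < config["max_vcpus"]:
        action_add_vcpu(config)

def predicted_response_time(config):    # stand-in for online prediction
    return config["demand_ms"] / config["vcpus"]

def strategy_keep_sla(config, sla_ms):  # high-level, goal-oriented
    while predicted_response_time(config) > sla_ms:
        before = config["vcpus"]
        tactic_scale_up(config)
        if config["vcpus"] == before:   # no further adaptation possible
            break

config = {"vcpus": 2, "max_vcpus": 8, "demand_ms": 400.0}
strategy_keep_sla(config, sla_ms=100.0)
print(config["vcpus"])                  # -> 4
\end{verbatim}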


\subsection{Summary of Supported Features and Novel Aspects}
\label{sec:FeaturesAndNovelAspectsSummary}

%Possibly expand on the possible application areas:
%Platform selection: Determine which hardware and software platforms would provide the best scalability and cost/performance ratio?
%Platform validation: Validate a selected combination of platforms to ensure that taken together they provide adequate performance and scalability. 
%Evaluation of design alternatives: Evaluate the relative performance, scalability and costs of alternative system designs and architectures.
%Performance prediction: Predict the performance of the system for a given workload and configuration scenario.
%Performance tuning: Analyze the effect of various deployment settings and tuning parameters on the system performance and find their optimal values
%Performance optimization: Find the components with the largest effect on performance and study the performance gains from optimizing them.
%Scalability and bottleneck analysis: Study the performance of the system as the load increases and more hardware is added. Find which system components are most utilized and investigate if they are potential bottlenecks.
%Sizing and capacity planning: Determine how much hardware resources are required to guarantee certain performance levels.
%Run-time performance and power management: Determine how to vary resource allocations during operation in order to ensure that performance requirements are continuously satisfied while optimizing power consumption in the face of frequent variations in service workloads.

The \acrfull{dml} provides a new architecture-level modeling language for modeling quality-of-service and resource management related aspects of modern dynamic IT systems, infrastructures and services. \gls{dml} models can be used both in offline and online settings spanning the whole lifecycle of an IT system. In an offline setting the increased flexibility provided by \gls{dml} can be exploited for system sizing and capacity planning as well as for evaluating alternative system architectures or target deployment platforms. It can also be used to predict the effect of changes in the system architecture, deployment and configuration as services and applications evolve. In an online setting, \gls{dml} provides the basis for \emph{self-aware} resource management during operation ensuring that system quality-of-service requirements are continuously satisfied while infrastructure resources are utilized as efficiently as possible. 

From the scientific perspective, the key features of \gls{dml} are: 
i)~a domain-specific language designed for modeling the performance-relevant behavior of services in dynamic environments, 
ii)~a modeling approach to characterize parameter and context dependencies based on online monitoring statistics, 
iii)~a domain-specific language to model the distributed and dynamic resource landscape of modern data centers capturing the properties relevant for performance and resource management, 
iv)~an adaptation points meta-model for annotating system architecture QoS models to describe the valid configuration space of the modeled dynamic system, and 
v)~a modeling language to describe system adaptation strategies and heuristics independent of system-specific details.


\subsection{Application Scenarios}
\label{sec:applicationscenarios}
The developed performance modeling and prediction approach has been designed to be applicable in different scenarios. While the major application of \gls{dml} is to serve as a basis for engineering self-aware software systems (Sec.~\ref{Sec:Self-Awareness}), in this subsection we provide an overview of more fine-grained application areas.

\paragraph{Online Capacity Planning}
% adapted from Chris, e.g., the questions are almost copied
Enterprise software systems should be scalable and provide the flexibility to handle different workloads. Classical performance analysis would require costly and time-consuming load testing for evaluating the system performance in different deployments. \gls{dml} enables performance engineers and system administrators to evaluate the system performance in heterogeneous hardware environments and to compare different deployment sizes in terms of their performance and efficiency. Given that model parameters are characterized using representative monitoring data collected at run-time, the prediction results exhibit higher accuracy than predictions obtained through design-time modeling approaches. The developed techniques help to answer the following questions that arise frequently during operation:
\begin{itemize}
\item What would be the average utilization of system components and the average service response times for a given workload and deployment scenario?
\item How many servers are needed to ensure adequate performance under the expected workload?
\item How much would the system performance improve if a given server is upgraded?
\end{itemize}
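As a back-of-the-envelope illustration of the first question (numbers invented; actual \gls{dml}-based predictions use full architecture-level models rather than such single-queue formulas):
\begin{verbatim}
# Sketch (invented numbers): utilization law U = X * D and an M/M/1
# approximation R = D / (1 - U) for a single server.
arrival_rate = 40.0       # requests per second
service_demand = 0.020    # seconds of CPU time per request

utilization = arrival_rate * service_demand          # U = 0.8
response_time = service_demand / (1 - utilization)   # R = 0.1 s
print(f"U = {utilization:.0%}, R = {response_time * 1000:.0f} ms")
\end{verbatim}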

\paragraph{Impact Analysis of Workload Changes}
% adapted from Chris, e.g., the questions are almost copied
In general, the workload intensity of enterprise software systems varies over time. The workload intensity may follow certain trends or patterns, e.g., a weekly pattern with low intensity over the weekend. In addition, there can be situations where it is foreseeable that the workload will double within the next month.
Using the workload forecasting approaches developed in \cite{HuKoAm2013-CCPE-WorkloadClassificationAndForecasting,HeHuKoAm2013-ICPE-WorkloadClassificationAndForecasting}, it is possible to forecast future workload intensity trends. Based on such forecasts, our approach allows performance engineers and system administrators to anticipate performance problems. System behavior and performance can be easily evaluated for different workloads. In contrast to performance tests, the model-based approach allows evaluating the system without setting up a representative testbed. The predictions allow determining the maximum system throughput as well as detecting potential bottlenecks. The questions that arise in this scenario are:
\begin{itemize}
\item What maximum load level can the system sustain for a given resource allocation?
\item How does the system behave for the anticipated workload behavior?
\item Which component or resource is a potential bottleneck for a certain workload scenario?
\end{itemize}

\paragraph{Impact Analysis of Service Recompositions and Reconfigurations as well as System Adaptations}
Today's enterprise software systems running on modern application platforms allow comprehensive online reconfigurations and adaptations without service disruption. Applications can be customized, new services can be composed and deployed on-the-fly, and service configuration parameters can be changed. To provide an illustrative example, assume the default setting of the \emph{rowsPerPage} parameter of a frequently accessed list view is changed, e.g., doubled from 25 to 50. Such a reconfiguration may have a severe impact on the database server or application server utilization and/or a significant influence on the end-to-end service response times. With our approach to capturing probabilistic parameter dependencies, the impact of such a reconfiguration can be assessed in advance without conducting performance tests in a representative testbed. Questions that can be answered using \gls{dml} are:
\begin{itemize}
\item How does the system behave if a new service is deployed?	
\item What is the performance impact of changing a certain configuration parameter?
\item Does a service re-composition improve the perceived service response time?
\item What would be the performance impact of changing a third party external service provider?
\end{itemize}	
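To make the \emph{rowsPerPage} example above concrete, the following sketch (coefficients invented) shows the kind of parameter dependency such a model captures, here the database demand per list-view request as a function of the parameter value:
\begin{verbatim}
# Sketch (invented coefficients): database demand per list-view
# request as a function of the rowsPerPage configuration parameter.
def db_demand_ms(rows_per_page):
    fixed_cost_ms = 5.0    # invented: per-query overhead
    per_row_ms = 0.4       # invented: DB time per returned row
    return fixed_cost_ms + per_row_ms * rows_per_page

print(db_demand_ms(25), db_demand_ms(50))  # 15.0 -> 25.0 ms per request
\end{verbatim}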

\paragraph{Autonomic Resource Management at Run-time}
\gls{dml} provides a basis for developing model-based \emph{autonomic} performance and resource management techniques that proactively adapt the system to dynamic changes at run-time, with the goal of satisfying performance objectives while at the same time ensuring efficient resource utilization.

State-of-the-art industrial mechanisms for automated performance and resource management generally follow a trigger-based approach when it comes to enforcing application-level \glspl{sla} concerning availability or responsiveness. Custom triggers can be configured that fire in a reactive manner when an observed metric reaches a certain threshold (e.g., high server utilization or long service response times) and execute certain predefined reconfiguration actions until a given stopping criterion is fulfilled (e.g., response times drop)~\cite{VMware-DRS,AmazonEC2-AutoScaling}. 
However, application-level metrics, such as availability and responsiveness, normally depend on system load in a highly non-linear fashion and are typically influenced by the behavior of multiple servers across several application tiers. 
Hence, it is hard to determine general thresholds for when triggers should fire, given that the appropriate triggering points are typically highly dependent on the architecture of the hosted services and their workload profiles, which can change frequently during operation. The inability to anticipate and predict the effect of dynamic changes in the environment, as well as the effect of possible adaptation actions, renders conventional trigger-based approaches unable to reliably enforce \glspl{sla} in an efficient and proactive fashion.
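The following sketch (thresholds and actions invented) captures the essence of such a trigger-based rule; it fires only after a threshold is crossed and embodies no prediction of how its actions will affect end-to-end metrics:
\begin{verbatim}
# Sketch (invented thresholds): a reactive, trigger-based scaling rule
# of the kind described above; it has no model of the effect of its
# actions on end-to-end response times.
def trigger_rule(avg_cpu_utilization, num_servers, max_servers=10):
    if avg_cpu_utilization > 0.85 and num_servers < max_servers:
        return num_servers + 1    # predefined action: add a server
    if avg_cpu_utilization < 0.30 and num_servers > 1:
        return num_servers - 1    # predefined action: remove a server
    return num_servers
\end{verbatim}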

\begin{figure}[htbp]
  \centering
  \includegraphics[width=0.9\linewidth]{niko-control_loop}
  \caption{Model-Based System Adaptation Control Loop~\cite{Huber2014-Dissertation}}
  \label{fig:niko-control_loop}
\end{figure}

To overcome the mentioned shortcomings of current industrial approaches, \cite{Huber2014-Dissertation} developed a framework for autonomic performance-aware resource management. Figure~\ref{fig:niko-control_loop} shows the control loop that is central to that framework. It consists of four main phases: \emph{Monitor}, \emph{Analyze}, \emph{Plan} and \emph{Execute}. In addition, the figure depicts a \emph{Knowledge Base} that is used in all of these phases. The knowledge base is realized with \gls{dml}. 
%
\gls{dml} is used to conduct performance predictions on the model level to anticipate performance problems and to find suitable adaptation actions.
Given that \gls{dml} supports detailed impact analyses of, e.g., workload intensity and usage profile changes, service (re-)compositions, or deployment changes, the adaptation mechanisms can quickly converge to an efficient target system configuration~\cite{HuHoKoBrKo2013-SOCA-ModelingRuntimeAdaptation}. The tailored prediction process allows the adaptation mechanism to trigger predictions for multiple different configuration scenarios within a controllable period of time. The prediction results are sufficiently accurate since the models are kept up-to-date based on representative monitoring data obtained at run-time. 
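The following minimal sketch (all names, formulas, and values invented) illustrates how the four phases interact around the knowledge base, with the \gls{dml} instance standing behind the model queried in the \emph{Analyze} and \emph{Plan} phases:
\begin{verbatim}
# Minimal sketch (invented names, formulas, and values) of the
# Monitor-Analyze-Plan-Execute loop around a shared knowledge base.
class KnowledgeBase:                  # stand-in for the DML instance
    def __init__(self):
        self.arrival_rate, self.demand_s, self.servers = 50.0, 0.06, 2

def monitor(kb, observed_rate):       # Monitor: refresh model parameters
    kb.arrival_rate = observed_rate

def analyze(kb, sla_s=0.5):           # Analyze: predict on the model
    util = kb.arrival_rate * kb.demand_s / kb.servers
    return util < 1 and kb.demand_s / (1 - util) <= sla_s

def plan(kb):                         # Plan: choose an adaptation action
    return kb.servers + 1

def execute(kb, new_servers):         # Execute: apply it to the system
    kb.servers = new_servers

kb = KnowledgeBase()
monitor(kb, observed_rate=60.0)
while not analyze(kb):                # adapt until goals are met again
    execute(kb, plan(kb))
print(kb.servers)                     # -> 5
\end{verbatim}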

\cite{Huber2014-Dissertation} evaluates the framework end-to-end in two representative case studies, demonstrating that it can provide significant efficiency gains of up to 50\% without sacrificing performance guarantees, and that it is able to trade off the performance requirements of different customers in heterogeneous hardware environments. Furthermore, it is shown that the approach enables proactive system adaptation, reducing the number of SLA violations by 60\% compared to a trigger-based approach. The results of the case studies in \cite{Huber2014-Dissertation} show that it is possible to apply architecture-level performance models and online performance prediction to perform autonomic system adaptation on the model level such that the system's operational goals are maintained. Different adaptation possibilities can be assessed without having to change the actual system.


\section{Self-Aware Computing Systems}
\label{Sec:Self-Awareness}
As mentioned above, a major application of the \acrfull{dml} is to serve as a basis for \emph{self-aware} resource management during operation\shorten{ ensuring that system quality-of-service requirements are continuously satisfied while infrastructure resources are utilized as efficiently as possible}. Self-aware computing systems are best understood as a sub-class of autonomic computing systems. In this section, we explain in more detail what exactly is meant by \emph{self-awareness} in this context.

\gls{dml} is a major part of our broader long-term research effort\footnote{\url{http://www.descartes.tools}} aimed at developing novel methods, techniques and tools for the engineering of \emph{self-aware} computing systems~\cite{KoBrHuRe2010-SCC-Towards,Ko2011-SE-DescartesResearch}. The latter are designed with built-in online QoS prediction and self-adaptation capabilities used to enforce QoS requirements in a cost- and energy-efficient manner. Self-awareness in this context is defined by the combination of three properties that a system should possess:
\begin{enum}
  \item \mbox{\emph{Self-reflective}:} Aware of its software architecture, execution environment, and hardware infrastructure on which it is running as well as of its operational goals (e.g., QoS requirements, cost- and energy-efficiency targets),
  \item \mbox{\emph{Self-predictive}:} Able to predict the effect of dynamic changes (e.g., changing service workloads) as well as predict the effect of possible adaptation actions (e.g., changing system configuration, adding/removing resources),
  \item \mbox{\emph{Self-adaptive}:} Proactively adapting as the environment evolves in order to ensure that its operational goals are continuously met.
\end{enum}

%\subsubsection{Approach}

The \acrfull{dml} is designed with the goal of providing modeling abstractions to capture and express the system architecture aspects whose knowledge is required at run-time to realize the above three properties. A major goal of these abstractions is to strike a balance between model expressiveness, flexibility and compactness. Instances of the various parts of the meta-model are intended to serve as \emph{online models} integrated into the system components they represent, reflecting all aspects relevant to managing their QoS and resource efficiency during operation. \SHORTEN{In contrast to black-box models, \gls{dml} provides constructs to capture all relevant static and dynamic aspects of the underlying software architecture, execution environment, hardware infrastructure, and application workload profiles.}

In parallel to this, we are working on novel application platforms designed to automatically maintain online models during operation to reflect the evolving system environment. The online models are intended to serve as a ``mind'' to the running system controlling its behavior at run-time, i.e., deployment configurations, resource allocations and scheduling decisions. To facilitate the initial model construction and continuous maintenance during operation, we are working on techniques for automatic model extraction based on monitoring data collected at run-time~\cite{BrHuKo2011-ASE-AutomExtraction,KoBeBrHuOk2011-SIMUTools-DataFabrics,HuQuHaKo2011-CLOSER-ModelVirtOverhead}.
 
The online system models make it possible to answer QoS-related queries during operation, such as:\forget{ the ones discussed in the beginning of Sec.~\ref{Sec:Intro}, e.g.,} \shorten{ What QoS would a new application deployed on the virtualized infrastructure exhibit and how many resources should be allocated to it?\forget{How many resources need to be allocated to ensure that both the new service and existing services satisfy their performance requirements? How should the workloads of the new service and existing services be partitioned among the available resources so that performance requirements are satisfied and resources are utilized efficiently?} What would be the effect of migrating an application from one virtual machine~(VM) to another?} What would be the effect on the QoS of running applications and on the resource consumption of the infrastructure if a new service is deployed in the virtualized environment or an existing service is migrated from one server to another? How many resources need to be allocated to a newly deployed service to ensure that SLAs are satisfied while maximizing energy efficiency? What QoS would a service exhibit after a period of time if the workload continues to develop according to current trends? How should the system configuration be adapted to avoid QoS problems or inefficient resource usage arising from changing customer workloads?\shorten{ such as the ones mentioned in Sect.~\ref{Sec:Intro}.} What operating costs does a service hosted on the infrastructure incur, and how do the service workload and usage profile impact these costs? We refer to such queries as \emph{online QoS queries}.   

%\shorten{  
%\begin{figure}[!htb]
%  \centering
%  \includegraphics[width=0.60\linewidth]{OnlinePerfPrediction}
%  \caption{Online QoS Prediction Process}
%  \label{Fig:OnlinePerfPredProcess}
%\end{figure}

%Figure~\ref{Fig:OnlinePerfPredProcess} illustrates the process that will be followed in order to provide an answer to a query.\Shorten{ First, the QoS models of all involved system components will be retrieved and combined by means of model composition techniques into a single architecture-level QoS model encapsulating all information relevant to answering the QoS query. This model will then be transformed into a predictive QoS model by means of an automatic \emph{model-to-model transformation}~[D32]. The target predictive model type and level of abstraction as well as the solution technique will be determined on-the-fly based on the required accuracy and the time available for the analysis. Multiple model types and model solution techniques will be used in order to provide flexibility in trading-off between prediction accuracy and analysis overhead.} 

%\begin{figure}[!htb]
%  \centering
%  \includegraphics[width=0.50\linewidth]{ReconfigProcess}
%  \caption{Online Reconfiguration Process}
%  \label{Fig:ReconfigProcess}
%\end{figure}
%}

The ability to answer online QoS queries during operation provides the basis for implementing techniques for self-aware QoS and resource management. Such techniques are triggered automatically during operation in response to observed or forecast changes in the environment (e.g., varying application workloads). The goal is to \emph{proactively} adapt the system to such changes in order to avoid anticipated QoS problems and/or inefficient resource usage\forget{ and/or high system operating costs}. The adaptation is performed in an autonomic fashion by considering a set of possible system reconfiguration scenarios (e.g., changing virtual machine placement and/or resource allocations) and exploiting the online QoS query mechanism to predict the effect of such reconfigurations before making a decision~\cite{HuBrKo2011-SEAMS-ResAlloc}. Each time an online QoS query is executed, it is processed based on the online system architecture models (\gls{dml} instances) provided on demand by the respective system components during operation. Given the wide range of possible contexts in which the online models can be used, automatic model-to-model transformation techniques~(e.g.,~\cite{MeKoKo2011-MASCOTS-PCMtoQPN}) are used to generate tailored prediction models on-the-fly, depending on the required accuracy and the time available for the analysis. Multiple prediction model types and model solution techniques are employed in order to provide flexibility in trading off prediction accuracy against analysis overhead.

%Figure~\ref{Fig:OnlinePerfPredProcess} illustrates the process that will be followed in order to provide an answer to a query.\Shorten{ First, the QoS models of all involved system components will be retrieved and combined by means of model composition techniques into a single architecture-level QoS model encapsulating all information relevant to answering the QoS query. This model will then be transformed into a predictive QoS model by means of an automatic \emph{model-to-model transformation}. The target predictive model type and level of abstraction as well as the solution technique will be determined on-the-fly based on the required accuracy and the time available for the analysis. }

 %ORIG-LONG: First, the QoS models of all involved system components will be retrieved and combined by means of model composition techniques into a single architecture-level QoS model encapsulating all information relevant to answering the QoS query. This model will then be transformed into a predictive QoS model by means of an automatic \emph{model-to-model transformation}.\shorten{ Different types of predictive performance models and model solution techniques\shorten{ (both analytical and simulative)} will be supported to provide flexibility in terms of prediction accuracy and analysis overhead. Existing model-to-model transformations for static architecture-level performance models will be used as a basis, e.g.,~\cite{becker2008a,KoRe2008-PCM2LQN}\shorten{PeWo2007-CSM,GrMiSa2007-KLAPER,SmLlCoDiWi2005-XML_based_SPE_with_SPMIF,Be2008-CoupledModelTrans,LoMeCa2004-FromUMLtoSPN,MaBa2004-UML-PSI,Ko2008-ParameterDepend,Ha2008-ConcurrencyModeling,WuWo2004-CBML,BeKoRe2007-PCM}. The latter include transformations to layered queueing networks, stochastic process algebras, queueing Petri nets and specialized simulation models.} The target predictive model type and level of abstraction as well as the solution technique will be determined on-the-fly based on the required accuracy and the time available for the analysis. Multiple model types\shorten{ (layered queueing networks, stochastic process algebras, queueing Petri nets and general-purpose simulation models)} and model solution techniques\shorten{ (exact analytical techniques, numerical approximation techniques, simulation and bounding techniques)} will be used in order to provide flexibility in trading-off between prediction accuracy and analysis overhead.\shorten{ The proposed approach is in line with the increasing trend in the software performance engineering community to combine early-cycle model-based performance analysis approaches with late-cycle measurement-based approaches~\cite{WoFrPe2007-FutureOfSPE}.}%: i) the type of the query (e.g., metrics to be predicted, system size), ii) the required precision of the results, iii) the time constraints, iv) the amount of information available about the system components and services involved.

%Figure~\ref{Fig:ReconfigProcess} depicts the online reconfiguration process and the self-adaptation control loop.
% mention transformations

%As illustrated on the figure below, \emph{Self-Aware Systems Engineering}~[B9,A7,D33] is a newly emerging research area at the intersection of several computer science disciplines including Software and Systems Engineering, Computer Systems Modeling, Autonomic Computing, Distributed Systems, Cluster and Grid Computing, and more recently, Cloud Computing and Green IT. The realization of the vision of self-aware systems as described above calls for an interdisciplinary approach considering not only technical but also business and economical challenges. The resolution of these challenges promises to reduce the costs of ICT and their environmental footprint while keeping a high growth rate of IT services.

%\begin{figure}[!htb]
%  \centering
%  \includegraphics[width=0.55\linewidth]{SelfAwareSystems}
%%  \caption{}
%%  \label{Fig}
%\end{figure}

\section{Outline}
% Structure
The remainder of this technical report is organized as follows. In Chapter~\ref{chap:sota}, we provide an overview of related work concerning performance modeling on the one hand and run-time system reconfiguration and adaptation on the other. Chapter~\ref{chap:scenario} introduces a representative online prediction scenario that we use throughout the technical report to motivate and evaluate the novel modeling approaches.
The application architecture and resource landscape models, i.e., the system architecture QoS model, are described in Chapter~\ref{chap:AppLevelAndResLandscape}. Our approach to modeling system adaptation is presented in Chapter~\ref{chap:SysReconfig}.
The report concludes with a discussion of the differences between \gls{dml} and PCM and provides an outlook on future work in Chapter~\ref{chap:discussion}.