Unit VII: Analysis of Simulation Output - Simulation and Modeling - BCA Notes (Pokhara University)

## Introduction:

The goal in analysing output data from a simulation model is to make valid statistical inferences about the initial (transient) and long-term (steady-state) average behaviour of the system, based on the sample averages from N replicate simulation runs.

Output analysis is the analysis of data generated by a simulation run, used to predict system performance or to compare the performance of two or more system designs. In stochastic simulations, multiple runs are always necessary; the output of a single run can be viewed as a sample of size 1.

Output analysis is needed because output data from a simulation exhibit random variability when random number generators are used: two different random number streams will (almost certainly) produce two different sets of output. The statistical tool used most often is the confidence interval for the mean.
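This run-to-run variability is easy to demonstrate. The sketch below (a hypothetical example, not taken from the notes) averages the same number of exponential service times using two different random-number streams; the two streams yield different estimates of the same true mean.

```python
import random

def sample_mean_service_time(seed, n=1000, rate=1.0):
    """Average of n exponential service times drawn from one random-number stream."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(n)) / n

# Two different streams (seeds) simulating the same system:
m1 = sample_mean_service_time(seed=1)
m2 = sample_mean_service_time(seed=2)
# Both estimate the same true mean (1/rate = 1.0), yet they differ,
# which is why a single run is only a sample of size 1.
```

Because each run is just one sample, a confidence interval built from several such estimates is needed before any conclusion can be drawn.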

For most simulations, the output data are correlated and the processes are non-stationary. The statistical (output) analysis determines:
a. The estimate of the mean and variance of random variables.
b. The number of observations required to achieve a desired precision of these estimates.

## Nature of the Problem:

Once a stochastic variable has been introduced into a simulation model, almost all the variables describing the system behaviour also become stochastic. The values of most system variables fluctuate as the simulation proceeds, so no single measurement can be taken to represent the value of a variable.

Instead, many observations of the variable must be made in order to form a statistical estimate of its true value. Some statement must also be made about the probability of the true value falling within a given interval about the estimated value. Such a statement defines a confidence interval; without it, simulation results are of little value to the system analyst.

A large body of statistical methods has been developed over the years to analyse experimental results in science, engineering, and other fields. Because simulation is likewise an experimental measurement of a system, these statistical methods can be adapted to the analysis of simulation results.

The newly developing statistical methodology addresses the following concerns:

1. To ensure that the statistical estimates are consistent, meaning that as the sample size increases the estimate tends to the true value.

2. To control bias in the measurement of both mean values and variances. Bias causes the distribution of an estimate to differ systematically from the true population statistic, even though the estimate may be consistent.

3. To develop sequential testing methods that determine how long a simulation should be run in order to obtain the desired confidence in its results.
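The sequential idea in point 3 can be sketched as follows: keep adding independent replications until the confidence-interval half-width falls below a chosen precision target. This is a minimal illustration under assumed values (target half-width, z-quantile, replication length); it is not a prescribed procedure from the notes.

```python
import random
import statistics

def replicate(seed, n=500, rate=1.0):
    """One independent replication: sample mean of n exponential variates."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(n)) / n

def run_until_precise(half_width_target=0.01, z=1.96, max_reps=200):
    """Add replications until the CI half-width is small enough (or a cap is hit)."""
    means = [replicate(seed) for seed in range(2)]   # need at least 2 for a variance
    while True:
        s = statistics.stdev(means)                  # std. dev. of replicate means
        hw = z * s / len(means) ** 0.5               # approximate 95% half-width
        if hw <= half_width_target or len(means) >= max_reps:
            return statistics.mean(means), hw, len(means)
        means.append(replicate(seed=len(means)))     # one more independent stream

mean, hw, reps = run_until_precise()
```

The stopping rule trades run time for precision: a tighter target forces more replications before the procedure halts.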

## Estimation Method:

Statistical methods are commonly applied to random variables. Usually, a random variable is drawn from an infinite population with finite mean μ and finite variance σ². The random variables are assumed to be independently and identically distributed (IID).

Let xᵢ (i = 1, 2, …, n) be IID random variables. The sample mean is

x̄(n) = (1/n) Σᵢ xᵢ

and, according to the central limit theorem, the standardized variable

Z = (x̄(n) − μ) / (σ/√n)

is approximately a standard normal variable for large n. Then:

a. x̄(n) can be shown to be a consistent estimator for the mean of the population from which the sample is drawn.

b. Since the sample mean is a sum of random variables, it is itself a random variable, so a confidence interval about its computed value needs to be established.

c. The probability density function of the standard normal variable Z is shown in the figure below.

## Simulation Run Statistics:

In every simulation run, statistics are measured under certain assumptions; for example, in establishing a confidence interval it is assumed that the observations are mutually independent and that the distribution from which they are drawn is stationary. But many statistics of interest in simulation do not meet these conditions.

Let us illustrate the problems that arise in measuring statistic from a simulation run with the example of a single server system.

Consider a single-server queuing system in which:
a. The occurrence of arrivals follows a Poisson distribution, so the inter-arrival times are distributed exponentially.
b. The service time has an exponential distribution.
c. The queuing discipline is FIFO.
d. The system has a single server.

Then, in a simulation run, the simplest way to estimate the mean waiting time is to accumulate the waiting times of n successive entities and divide by n. This gives the sample mean, denoted x̄(n). The first problem is that successive waiting times are not independent: a long wait for one customer tends to produce long waits for the customers behind it, so the observations are autocorrelated. The second problem is that the distribution may not be stationary: a simulation run is started with the system in some initial state, frequently the idle state, in which no service is being given and no entities are waiting, so the early arrivals have a higher probability of obtaining service quickly. A sample mean that includes the early arrivals will therefore be biased. As the length of the simulation run is extended and the sample size increases, the effect of the bias diminishes. This is shown in the figure below.

## Replication of Runs:

One problem in measuring statistics within a single simulation run is that the successive results are dependent, whereas independent results are required. One way of obtaining independent results is to repeat the simulation.

Repeating the experiment with different random numbers for the same sample size n gives a set of independent determinations of the sample mean x̄(n).

Even though the distribution of the sample means depends upon the degree of autocorrelation within each run, these independent determinations of the sample mean can be used to estimate the variance of their distribution. Here, the value of x̄ (the average of the replicate sample means) is an estimate of the mean waiting time, and the value of s² can be used to establish confidence intervals.
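The replication procedure can be sketched directly. In this hypothetical example (assumed rates λ = 0.5, μ = 1.0, run length n = 2000, and R = 20 replications), each replication uses its own random-number stream, and the replicate means are treated as IID observations.

```python
import random
import statistics

def one_run_mean_wait(seed, n=2000, lam=0.5, mu=1.0):
    """Sample mean waiting time x̄(n) from one independent run of a
    single-server queue started idle (Lindley's recurrence)."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for _ in range(n):
        total += w
        w = max(0.0, w + rng.expovariate(mu) - rng.expovariate(lam))
    return total / n

R = 20                                    # number of independent replications
means = [one_run_mean_wait(seed) for seed in range(R)]
xbar = statistics.mean(means)             # estimate of the mean waiting time
s2 = statistics.variance(means)           # sample variance of the replicate means
half = 1.96 * (s2 / R) ** 0.5             # approximate 95% half-width
ci = (xbar - half, xbar + half)
```

The 1.96 quantile is the large-sample normal approximation; for a small number of replications, a t-quantile with R − 1 degrees of freedom would be slightly more exact.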

## Elimination of Initial Bias:

There are two general approaches that can be used to remove the initial bias:
1. The system can be started in a more representative state rather than in the empty state.
2. The first part of the simulation run can be ignored.
In the first approach, it is necessary to know the steady-state distribution for the system, and the initial state is then selected from that distribution. In the study of a simulation, particularly of an existing system, there may be information available on the expected conditions, which makes it feasible to select a better initial condition and thus eliminate the initial bias.

The second approach, ignoring the first part of the run, is the most common. In this method, the initial section of the run, in which the results are highly biased, is eliminated. First, the run is started from the idle state and stopped after a certain period of time (the time by which the bias has decayed to an acceptable level). The entities existing in the system at that point are left as they are, and this point serves as the restart point for the repeated simulation runs.

The run is then restarted, with statistics being gathered only from the point of the restart. This approach has the following difficulties:

1. No simple rules can be given for deciding how long an interval should be eliminated. For this, we have to use pilot runs starting from the idle state to judge how long the initial bias persists. This can be done by plotting the measured statistic against the run length.

2. Another disadvantage of eliminating the first part of the simulation run is that the estimate of the variance will be based on less information, affecting the establishment of confidence limits. This then increases the confidence interval size.
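The deletion approach, and its cost in discarded observations, can be sketched as follows. The warm-up length here (1,000 of 10,000 observations) is an assumed value of the kind a pilot run would suggest, not a rule from the notes.

```python
import random

def mm1_waits(n, lam=0.5, mu=1.0, seed=7):
    """Waiting times of n customers of a single-server queue started idle."""
    rng = random.Random(seed)
    w, out = 0.0, []
    for _ in range(n):
        out.append(w)
        w = max(0.0, w + rng.expovariate(mu) - rng.expovariate(lam))
    return out

waits = mm1_waits(10_000)
warmup = 1_000                               # deletion point chosen from a pilot run (assumed)
truncated = waits[warmup:]
estimate = sum(truncated) / len(truncated)   # mean wait without the biased start
# Only 9,000 of the 10,000 observations remain, so the variance estimate
# (and hence the confidence interval) rests on less information.
```

This makes both difficulties concrete: the choice of `warmup` is a judgment call informed by pilot runs, and every deleted observation widens the eventual confidence interval.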