Loss Data Analytics

4.1 Nonparametric Inference

In this section, you learn how to:

Estimate moments, quantiles, and distributions without reference to a parametric distribution
Summarize the data graphically without reference to a parametric distribution
Determine measures that summarize deviations of a parametric from a nonparametric fit
Use nonparametric estimators to approximate parameters that can be used to start a parametric estimation procedure

4.1.1 Nonparametric Estimation

In Section 2.2 for frequency and Section 3.1 for severity, we learned how to summarize a distribution by computing means, variances, quantiles/percentiles, and so on. To approximate these summary measures using a dataset, one strategy is to:

assume a parametric form for a distribution, such as a negative binomial for frequency or a gamma distribution for severity,
estimate the parameters of that distribution, and then
use the distribution with the estimated parameters to calculate the desired summary measure.

This is the parametric approach. Another strategy is to estimate the desired summary measure directly from the observations without reference to a parametric model. Not surprisingly, this is known as the nonparametric approach.
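To make the contrast concrete, the following minimal sketch estimates a median both ways. It rests on assumptions that are not in the text: the four loss amounts are hypothetical, and the parametric form is taken to be exponential (a gamma distribution with shape parameter 1), chosen only because its maximum likelihood estimate and median have simple closed forms.

```python
import math
import statistics

# Hypothetical loss amounts, for illustration only (not from the text).
losses = [2.0, 3.0, 5.0, 10.0]

# Parametric approach.
# Step 1: assume an exponential distribution with mean theta (an assumption).
# Step 2: estimate theta; the maximum likelihood estimate is the sample mean.
theta_hat = statistics.fmean(losses)
# Step 3: use the fitted distribution; the exponential median is theta * ln(2).
parametric_median = theta_hat * math.log(2)

# Nonparametric approach: estimate the median directly from the observations.
nonparametric_median = statistics.median(losses)

print(f"parametric median:    {parametric_median:.4f}")
print(f"nonparametric median: {nonparametric_median:.4f}")
```

The two estimates differ because the parametric one inherits the shape of the assumed distribution, while the nonparametric one uses only the ordered observations.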

Let us start by considering the most basic type of sampling scheme and assume that observations are realizations from a set of random variables $X_1, \ldots, X_n$ that are independent and identically distributed (iid) draws from an unknown population distribution $F(\cdot)$. An equivalent way of saying this is that $X_1, \ldots, X_n$ is a random sample (with replacement) from $F(\cdot)$. To see how this works, we now describe nonparametric estimators of many important measures that summarize a distribution.

4.1.1.1 Moment Estimators

We learned how to define moments in Section 2.2.2 for frequency and Section 3.1.1 for severity. In particular, the $k$-th moment, $\mathrm{E~}[X^k] = \mu^{\prime}_k$, summarizes many aspects of the distribution for different choices of $k$. Here, $\mu^{\prime}_k$ is sometimes called the $k$-th population moment to distinguish it from the $k$-th sample moment, \[ \frac{1}{n} \sum_{i=1}^n X_i^k , \] which is the corresponding nonparametric estimator. In typical applications, $k$ is a positive integer, although it need not be.
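As a minimal sketch, the $k$-th sample moment can be computed directly from its definition. The function name `sample_raw_moment` and the loss amounts are hypothetical, introduced here only for illustration.

```python
def sample_raw_moment(data, k):
    """Return the k-th sample (raw) moment, (1/n) * sum of x_i ** k."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one observation")
    return sum(x ** k for x in data) / n


# Hypothetical loss amounts, for illustration only (not from the text).
losses = [2.0, 3.0, 5.0, 10.0]

first_moment = sample_raw_moment(losses, 1)    # (2 + 3 + 5 + 10) / 4
second_moment = sample_raw_moment(losses, 2)   # (4 + 9 + 25 + 100) / 4
half_moment = sample_raw_moment(losses, 0.5)   # k need not be an integer

print(first_moment, second_moment, half_moment)
```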

An important special case is the first moment, where $k=1$. In this case, the prime symbol ($\prime$) and the $1$ subscript are usually dropped and one uses $\mu=\mu^{\prime}_1$ to denote the population mean, or simply the mean. The corresponding sample estimator for $\mu$ is called the sample mean, denoted with a bar on top of the random variable: \[ \bar{X} =\frac{1}{n} \sum_{i=1}^n X_i . \]

Another type of summary measure of interest is the $k$-th central moment, $\mathrm{E~} [(X-\mu)^k] = \mu_k$. (Sometimes, $\mu^{\prime}_k$ is called the $k$-th raw moment to distinguish it from the central moment $\mu_k$.) A nonparametric, or sample, estimator of $\mu_k$ is \[ \frac{1}{n} \sum_{i=1}^n \left(X_i - \bar{X}\right)^k . \]

The second central moment ($k=2$) is an important case for which we typically assign a new symbol, $\sigma^2 = \mathrm{E~} [(X-\mu)^2]$, known as the variance. Properties of the sample moment estimator of the variance, $n^{-1}\sum_{i=1}^n \left(X_i - \bar{X}\right)^2$, have been studied extensively, and so it is natural that many variations have been proposed. The most widely used variation is one where the effective sample size is reduced by one, and so we define \[ s^2 = \frac{1}{n-1} \sum_{i=1}^n \left(X_i - \bar{X}\right)^2. \] Here, the statistic $s^2$ is known as the sample variance. Dividing by $n-1$ instead of $n$ matters little when the sample size $n$ is in the thousands, as is common in insurance applications. Still, the resulting estimator is unbiased in the sense that $\mathrm{E~} s^2 = \sigma^2$, a desirable property, particularly when interpreting the results of an analysis.
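The difference between dividing by $n$ and by $n-1$ is easy to see on a small sample. The sketch below is illustrative only: the function names and the loss amounts are hypothetical, and the results are cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import statistics


def sample_central_moment(data, k):
    """Return (1/n) * sum of (x_i - x_bar) ** k, the k-th sample central moment."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n


def sample_variance(data):
    """Return s^2, the sum of squared deviations divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("s^2 requires at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


# Hypothetical loss amounts, for illustration only (not from the text).
losses = [2.0, 3.0, 5.0, 10.0]

m2 = sample_central_moment(losses, 2)  # divides by n:     38 / 4
s2 = sample_variance(losses)           # divides by n - 1: 38 / 3

print(m2, statistics.pvariance(losses))  # both use the divisor n
print(s2, statistics.variance(losses))   # both use the divisor n - 1
```

With only four observations the two divisors give noticeably different answers; the ratio of the two estimators is $(n-1)/n$, which approaches one as $n$ grows.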

4.1.1.2 Empirical Distribution Function

We have seen how to compute nonparametric estimators of the $k$-th moment $\mathrm{E~} X^k$. In the same way, for any known function $\mathrm{g}(\cdot)$, we can estimate $\mathrm{E~} \mathrm{g}(X)$ using $n^{-1}\sum_{i=1}^n \mathrm{g}(X_i)$. This is sometimes known as the analog principle.
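A short sketch shows how one averaging routine serves any choice of $\mathrm{g}(\cdot)$. The helper name `analog_estimate`, the loss amounts, and the cap of 4 are hypothetical choices made only for this illustration.

```python
def analog_estimate(data, g):
    """Estimate E g(X) by the sample average of g(x_i)."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one observation")
    return sum(g(x) for x in data) / n


# Hypothetical loss amounts, for illustration only (not from the text).
losses = [2.0, 3.0, 5.0, 10.0]

# g(x) = x^2 recovers the second sample moment.
second_moment = analog_estimate(losses, lambda x: x ** 2)

# g(x) = min(x, 4) averages the losses after capping each one at 4
# (the cap of 4 is an arbitrary illustrative value).
capped_mean = analog_estimate(losses, lambda x: min(x, 4.0))

print(second_moment, capped_mean)
```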

Now suppose that we fix a value of $x$ and consider the function $\mathrm{g}(X) = I(X \le x)$. Here, the notation $I(\cdot)$ is the indicator function; it returns 1 if the event $(\cdot)$ is true and 0 otherwise. For this choice of $\mathrm{g}(\cdot)$, the expected value is $\mathrm{E~} I(X \le x) = \Pr(X \le x) = F(x)$, the distribution function evaluated at a fixed point $x$. Using the analog principle, we define the nonparametric estimator of the distribution function \[ \begin{aligned} F_n(x) &= \frac{1}{n} \sum_{i=1}^n I\left(X_i \le x\right) \\ &= \frac{\text{number of observations less than or equal to }x}{n} . \end{aligned} \] Because $F_n(\cdot)$ is based only on the observations and does not assume a parametric family for the distribution, it is a nonparametric estimator. It is also known as the empirical distribution function.
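The definition of $F_n(x)$ translates almost word for word into code. The following is a minimal sketch, not a tuned implementation; the name `empirical_cdf` and the loss amounts are hypothetical.

```python
def empirical_cdf(data):
    """Return the function F_n, where F_n(x) is the share of observations <= x."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one observation")

    def F_n(x):
        # Sum of the indicators I(x_i <= x), divided by n.
        return sum(1 for x_i in data if x_i <= x) / n

    return F_n


# Hypothetical loss amounts, for illustration only (not from the text).
losses = [2.0, 3.0, 5.0, 10.0]
F_n = empirical_cdf(losses)

for x in (1.0, 3.0, 7.5, 10.0):
    print(f"F_n({x}) = {F_n(x)}")
```

Each evaluation scans all $n$ observations, which is fine for small samples. For repeated evaluation on large data sets, sorting once and using a binary search would be a natural refinement.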

Example 4.1.1. Toy Data Set. To illustrate, consider a fictitious, or “toy,” data set of $n=10$ observations. Determine the empirical distribution function.

\[ {\small \begin{array}{c|cccccccccc} \hline i &1&2&3&4&5&6&7&8&9&10 \\ X_i& 10 &15 &15 &15 &20 &23 &23 &23 &23 &30\\ \hline \end{array} } \]
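One way to work through this example is to tabulate the distinct values and accumulate their counts, since $F_n$ is a step function that jumps only at observed values. The sketch below uses the ten toy observations given in the example; the variable names are hypothetical, and this is one possible calculation rather than the text's own solution.

```python
from collections import Counter

# The n = 10 toy observations from Example 4.1.1.
toy_data = [10, 15, 15, 15, 20, 23, 23, 23, 23, 30]
n = len(toy_data)

# Count how often each distinct value occurs, then accumulate the counts
# in increasing order of value to obtain the height of each step.
counts = Counter(toy_data)
cumulative = 0
ecdf_steps = []  # pairs (value, F_n(value))
for value in sorted(counts):
    cumulative += counts[value]
    ecdf_steps.append((value, cumulative / n))

for value, height in ecdf_steps:
    print(f"F_n(x) = {height:.1f} from x = {value} up to the next distinct value")
```

The resulting steps are $F_n(x) = 0$ for $x < 10$, then 0.1 on $[10, 15)$, 0.4 on $[15, 20)$, 0.5 on $[20, 23)$, 0.9 on $[23, 30)$, and 1 for $x \ge 30$.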

Limitation Type	Limited Variable	Recording Information
right censoring	\(X_U^{\ast}= \min(X, C_U)\)	\(\delta_U= I(X \leq C_U)\)
left censoring	\(X_L^{\ast}= \max(X, C_L)\)	\(\delta_L= I(X \geq C_L)\)
interval censoring
right truncation	\(X\)	observe \(X\) if \(X \leq C_U\)
left truncation	\(X\)	observe \(X\) if \(X \geq C_L\)

Variable	\(i\)	1	2	3	4	5	Sum
Loss	\(y_i\)	5	5	5	4	6	25
Premium	\(P(\mathbf{x}_i)\)	4	2	6	5	8	25
Relativity	\(R(\mathbf{x}_i)\)	5	4	3	2	1
