Simulation
The bootstrap was introduced by Efron (1979).
http://statweb.stanford.edu/~ckirby/brad/
Bootstrap methods are a class of nonparametric Monte Carlo methods that estimate the distribution of a population by resampling.
Treat the sample as if it were the population
What it is good for:
Read the article available at this URL: https://garstats.wordpress.com/2016/05/27/the-percentile-bootstrap/
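A minimal sketch of the resampling idea described above, assuming a hypothetical numeric sample x and the sample mean as the statistic of interest, with 1000 resamples (none of these choices are taken from the text):

set.seed(1)
x <- rnorm(30)                      # hypothetical observed sample (assumption)
B <- 1000                           # number of bootstrap resamples (assumption)
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))  # resample the sample, recompute the mean
mean(boot_means)                    # centre of the bootstrap distribution
sd(boot_means)                      # bootstrap estimate of se(x-bar)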
Comparing the distribution of \(\bar{X}\):
[1] -0.08281164 1.93468099 -2.05128979 0.27773897
[1] -0.01692824
[1] -0.005497356
[1] 0.1878562
[1] 0.1814172
The theoretical mean of \(\bar{X}\) is 0 and the theoretical standard deviation of \(\bar{X}\) is \(1/\sqrt{30}\approx 0.1825742\).
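A minimal sketch of how this comparison can be simulated, assuming samples of size n = 30 from a standard normal distribution and 1000 replications (the replication count is an assumption, not stated in the text):

set.seed(123)
n <- 30                               # sample size, matching 1/sqrt(30) above
R <- 1000                             # number of simulated samples (assumption)
xbars <- replicate(R, mean(rnorm(n))) # simulated sample means
mean(xbars)                           # compare with the theoretical mean 0
sd(xbars)                             # compare with 1/sqrt(30) = 0.1825742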
Looking at the mpg variable of the mtcars data set, we want to calculate the proportion of cars that have fuel efficiency between 14 and 21 mpg.
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
[1] 32 11
We create the infix operator %entre% to check whether a value x lies between y[1] and y[2].
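A minimal sketch of such an operator, following the description above (the exact body used in the original notes is not shown, so this is one possible implementation; inclusive endpoints are assumed):

`%entre%` <- function(x, y) {
  # TRUE when x lies between y[1] and y[2] (inclusive endpoints assumed)
  x >= y[1] & x <= y[2]
}

c(13, 15, 21.4, 20) %entre% c(14, 21)
# [1] FALSE  TRUE FALSE  TRUE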
What is the proportion of those cars that have fuel efficiency between 14 and 21 mpg?
Using the observed sample.
Using the bootstrap.
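A minimal sketch of both computations, assuming the %entre% operator sketched above and 1000 bootstrap resamples (the replicate count is an assumption):

# Proportion in the observed sample
mean(mtcars$mpg %entre% c(14, 21))

# Bootstrap: resample mpg with replacement and recompute the proportion
set.seed(123)
B <- 1000
p_boot <- replicate(B, mean(sample(mtcars$mpg, replace = TRUE) %entre% c(14, 21)))
mean(p_boot)                       # bootstrap estimate of the proportion
sd(p_boot)                         # bootstrap standard error
quantile(p_boot, c(0.025, 0.975))  # percentile bootstrap 95% interval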
The jackknife technique was developed by Maurice Quenouille (1924-1973).
The jackknife is like a “leave-one-out” type of cross-validation.
Let \(x=(x_1, \ldots, x_n)\) be an observed random sample, and define the \(i^{th}\) jackknife sample \(x_{(i)}\) to be the subset of \(x\) that leaves out the \(i^{th}\) observation \(x_i\). That is,
\[x_{(i)}=(x_1, \ldots, x_{i-1},x_{i+1}, \ldots, x_n)\]
If \(\hat{\theta}=T_n(x)\), define the \(i^{th}\) jackknife replicate \(\hat{\theta}_{(i)}=T_n(x_{(i)})\), \(i=1,2, \ldots, n\).
If \(\hat{\theta}\) is a smooth statistic, then \(\hat{\theta}_{(i)}=t(F_{n-1}(x_{(i)}))\) and the jackknife estimate of bias is
\[\widehat{bias}_{jack}=(n-1)(\overline{\hat{\theta}_{(\cdot)}} - \hat{\theta}),\]
where \(\overline{\hat{\theta}_{(\cdot)}}=\frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}\) is the mean of the estimates from the leave-one-out samples, and \(\hat{\theta}\) is the estimate computed from the original observed sample.
A jackknife estimate of standard error is
\[\widehat{se}_{jack}=\sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \overline{\hat{\theta}_{(\cdot)}} \right)^2 }\] for a smooth statistic \(\hat{\theta}\).
If the parameter to be estimated is the population mean of \(X\) by using the observed random sample \(x=(x_1, \ldots, x_n)\), we compute the mean \(\bar{x}_{(i)}\) without the \(i\)-th data point:
\[ \bar{x}_{(i)}=\frac{1}{n-1}\sum_{j=1, j\neq i}^{n}x_{j},\quad \quad i=1,\dots ,n.\]
These \(n\) estimates approximate the distribution that the sample statistic would have if it were computed over a large number of samples. In particular, the mean of this sampling distribution is estimated by the average of these \(n\) leave-one-out estimates:
\[ \bar{x}_{(\cdot)}=\frac{1}{n} \sum_{i=1}^{n} \bar{x}_{(i)} \]
Using the data from example 1.
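A minimal sketch of the jackknife bias and standard-error estimates for the sample mean, assuming "example 1" refers to a numeric sample stored in x (here the mtcars mpg values are used purely as a stand-in):

x <- mtcars$mpg                    # stand-in for the data of example 1 (assumption)
n <- length(x)
theta_hat <- mean(x)               # estimate from the full sample

# Leave-one-out replicates theta_hat_(i)
theta_jack <- sapply(1:n, function(i) mean(x[-i]))

bias_jack <- (n - 1) * (mean(theta_jack) - theta_hat)
se_jack   <- sqrt((n - 1) / n * sum((theta_jack - mean(theta_jack))^2))
bias_jack                          # jackknife bias estimate (0 for the mean)
se_jack                            # jackknife standard error estimate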

