Bootstrap Estimates of Confidence Intervals

Bootstrapping is a statistical procedure that utilizes resampling (with replacement) of a sample to infer properties of a wider population.

More often than not, we want to understand the properties of a population but we only have access to a small sample of that population. Sometimes, we are unable to gather more data because it is too expensive, too time consuming, or just not possible. When this is the situation, we must use the sample we already have in a clever way to learn about the characteristics of the population we are interested in. In comes the bootstrap method! The whole idea of bootstrapping is to randomly resample (with replacement) our existing sample so we in effect have more "samples" to work with. These resamples can be used to estimate confidence intervals (which will be the focus of this blog post), reduce biases, perform hypothesis tests, and more. With bootstrapping, we are quite literally pulling our data up by its bootstraps. Let's take a look at how it works.

How Do We Use the Bootstrap Method to Estimate a Confidence Interval?

  1. Take B repeated random samples, with replacement, from the given dataset. These are called "resamples" and should each be the same size as the original sample.
  2. Calculate the statistic of interest (e.g., mean, median, standard deviation) for each resample.
  3. Now that you have a distribution of B different estimates of the statistic of interest, you can calculate a confidence interval on that statistic to quantify its variability.
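The steps above can be sketched in a few lines of Python. Here the statistic of interest is the sample mean, and the small dataset is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
sample = np.array([2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9])

n_resamples = 10_000
# Steps 1 & 2: resample with replacement, same size as the original
# sample, and compute the statistic for each resample
boot_means = [
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(n_resamples)
]

# Step 3: the 2.5th and 97.5th percentiles give a 95% interval
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```

The same pattern works for any statistic: swap `.mean()` for whatever estimator you care about.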

You might be wondering why exactly this works and how this method allows us to understand the properties of the population. In short, bootstrapping treats the distribution of statistics computed from the resamples as a reasonable approximation of the statistic's true sampling distribution: the variability we observe across resamples approximates the variability we would see across independent samples drawn from the population.

The basic idea is that inferences made from the resampled data are a good proxy for inferences about the population itself. Check out Bradley Efron's paper if you are interested in diving deeper into this reasoning.

Worked Example with Python

In our Python example we will use data from the Hubble Space Telescope. The data contains distances and velocities of 24 galaxies containing Cepheid stars, collected by the Hubble Space Telescope Key Project to measure the Hubble Constant.

The data contains three columns:

  • Galaxy: A factor label identifying the galaxy
  • y: The galaxy’s relative velocity measured in kilometers/second (km/s)
  • x: The galaxy’s distance from Earth measured in Megaparsecs (Mpc) (Note: 1 Mpc = 3.09e19 km)

We can use this data to estimate the Hubble Constant, \(\beta\), and the age of the universe, \(\beta^{-1}\), with the following:

\[\begin{align} y = \beta x \end{align}\]

Here I’ll give some quick scientific context as to what the Hubble Constant is and how it can be used to estimate the age of the universe.

According to the standard Big Bang model, the universe is expanding uniformly according to Hubble’s Law:

\[\begin{align} v = H_0 d \end{align}\]

where \(v\) is apparent velocity of the galaxy and \(d\) is the distance to the galaxy. \(v\) and \(d\) are related linearly by \(H_0\), which we call the Hubble Constant. These variables, \(v, d, H_0\), are the standard astrophysical notations for velocity, distance, and the Hubble Constant. In terms of the variables given in our dataset, Hubble’s Law is:

\[\begin{align} y = \beta x \end{align}\]

where \(y\) is the relative velocity of the galaxy, \(\beta\) is the Hubble Constant, and \(x\) is the distance to the galaxy. From now on, I’ll use \(y, x,\) and \(\beta\) to denote galactic velocity, distance, and the Hubble Constant.

Now, how does the Hubble Constant help us determine the age of the universe? The inverse of the Hubble Constant, \(\beta ^{-1}\), gives the approximate age of the universe. This might not be immediately clear, so let’s take a look at the units of our quantities to see why it’s \(\beta ^{-1}\). \(y\) is measured in the units km/s, or distance over time. \(x\) is measured in units of Mpc, or distance. Writing this out we see:

\[\begin{align} \frac{distance}{time} = [\beta] distance \end{align}\]

Here, I’m using square brackets around \(\beta\) as a placeholder for the dimensions of \(\beta\).

Now we can determine the units of \(\beta\) by comparing the units on the left-hand side of the equation to those on the right-hand side, and figuring out what units \(\beta\) must take to make the dimensions of the two sides equal. From this, we can see that the units of the Hubble Constant, \(\beta\), must be \(\frac{1}{time}\):

\[\begin{align} \frac{distance}{time} = \frac{1}{time} distance \end{align}\]

\[\begin{align} \frac{distance}{time} = \frac{distance}{time} \end{align}\]

We now see the units of the Hubble Constant are \(\frac{1}{time}\), so taking the inverse of \(\beta\) will give units of time, and thus an estimate of the age of the universe (because the unit of age is time).
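As a quick numerical check on this dimensional argument, plugging in a hypothetical value of \(\beta\) = 70 km/s/Mpc (a commonly quoted ballpark figure, not a result from our data) and converting units gives an age of roughly 14 billion years:

```python
beta = 70.0           # hypothetical Hubble Constant in km/s/Mpc
km_per_mpc = 3.09e19  # 1 Mpc in km
sec_per_year = 365 * 24 * 60**2

# Convert beta from km/s/Mpc to units of 1/s, then invert to get time
beta_per_sec = beta / km_per_mpc
age_years = (1 / beta_per_sec) / sec_per_year
print(f"Age of the universe for beta = {beta}: {age_years:.2e} years")
```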

The next question that comes to mind is: how do we determine the Hubble Constant from our velocity and distance data? Because these quantities are related linearly by Hubble’s Law, the answer is simply linear regression. We could perform a linear regression on the 24 velocity-distance pairs in our dataset and read off the linear coefficient to get a value for \(\beta\), but a single estimate can’t tell us much about how well the true value of \(\beta\) is constrained.

We could submit a proposal to the Hubble Telescope team to apply for more telescope time to measure the velocities of and distances to more galaxies to get more samples to analyze, but that would be expensive and time consuming and there’s no guarantee that our proposal would be accepted. So we turn to bootstrapping!

Now, let’s work through estimating a 95% confidence interval on the value of the Hubble Constant with Python.

First, let’s import the required libraries for our analysis:

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import bootstrap
import matplotlib.pyplot as plt
```

Next, let’s import the data and look at the first 5 rows using the read_csv() and head() functions from pandas. Click here to download the data if you would like to follow along.

```python
data = pd.read_csv("hubble.csv")
data.head()
```
     Galaxy     y      x
0   NGC0300   133   2.00
1   NGC0925   664   9.16
2  NGC1326A  1794  16.14
3   NGC1365  1594  17.95
4   NGC1425  1473  21.88

We can see the three columns, Galaxy, y, and x. Again, Galaxy is the galaxy identifier/name, y is the relative velocity of the galaxy, and x is the distance to the galaxy. Let’s plot our data to investigate the relationship between distance and velocity.

```python
# Extract velocity, y, and distance, x, from our imported data
y = data["y"]
x = data["x"]

# Plot x vs y
plt.scatter(x, y)
plt.title("Galactic Distance vs Relative Velocity")
plt.xlabel("Distance (Mpc)")
plt.ylabel("Relative Velocity (km/s)")
plt.show()
```

[Figure: scatter plot of galactic distance vs. relative velocity]

We can see that the relationship between relative velocity and distance is roughly linear.

Now our goal is to bootstrap this data to estimate a 95% confidence interval on the Hubble Constant. Recall bootstrapping requires that we resample the data many times so that we get a distribution of a particular statistic, in this case the Hubble Constant, that we can use to estimate a confidence interval on that statistic. Since we will resample the data many many times, let’s define a function that creates a resample of the data that we can call later in a loop:

```python
def resample(data, seed):
    '''
    Creates a resample of the provided data that is
    the same length as the provided data
    '''
    import random
    random.seed(seed)
    res = random.choices(data, k=len(data))
    return res
```

Our resample() function takes in data that it will resample from. It also takes in seed, which sets a pseudorandom seed; this is purely for reproducibility of this example.
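To see what resample() produces, here is a quick check on a toy list (the function definition is repeated so the snippet runs on its own):

```python
import random

def resample(data, seed):
    '''Same resample() as above, repeated so this snippet is standalone.'''
    random.seed(seed)
    return random.choices(data, k=len(data))

toy = [1, 2, 3, 4, 5]
# Same length as toy; elements may repeat because we sample with replacement
print(resample(toy, seed=0))
```

Because the seed is fixed, calling the function twice with the same seed returns the identical resample.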

Now let’s set up our data so that we can feed it into our resample() function. Our data contains velocity-distance pairs: specific velocities correspond to specific distances. So we want to randomly resample pairs of velocity and distance; we don’t want to randomly sample velocity and then randomly sample distance separately. Let’s use Python’s zip() function to “zip” our corresponding velocities and distances together, then resample the “zipped” pairs. This ensures that we maintain the correct velocity-distance pairs throughout our bootstrap analysis.

```python
# Extract the distance, x, and velocity, y, values from our pandas dataframe
distances = data["x"].values
velocities = data["y"].values

# Zip our distances and velocities together and store the zipped pairs as a list
dist_vel_pairs = list(zip(distances, velocities))

# Print out the first 5 zipped distance-velocity pairs
print(dist_vel_pairs[:5])
```
[(2.0, 133), (9.16, 664), (16.14, 1794), (17.95, 1594), (21.88, 1473)]

In the above output, we can see the first 5 “zipped” distance-velocity pairs. Each pair is a tuple containing the distance in index 0, and the corresponding velocity in index 1. Now let’s generate 10,000 resamples of distance-velocity pairs using our resample() function in a list comprehension. After generating the 10,000 resamples, let’s use a for loop to perform a linear regression on each of them to get a distribution of 10,000 Hubble Constant, \(\beta\), estimates. Let’s use the LinearRegression() function from the sklearn.linear_model module to perform our linear regressions. In the argument of LinearRegression() we set fit_intercept=False so the regression does not fit an intercept coefficient. This is because there is no intercept in Hubble’s Law.

```python
# Generate 10,000 resamples with a list comprehension
boot_resamples = [resample(dist_vel_pairs, val) for val in range(10000)]

# Calculate beta from a linear regression for each of the 10,000 resamples
# and store them in a list called "betas"
betas = []
for res in boot_resamples:
    # "Unzip" the resampled pairs to separate x and y so we can
    # use them in LinearRegression()
    dist_unzipped, vel_unzipped = zip(*res)
    dist_unzipped = np.array(dist_unzipped).reshape((-1, 1))

    # Find the linear coefficient beta for this resample and append it to the list
    betas.append(LinearRegression(fit_intercept=False).fit(dist_unzipped, vel_unzipped).coef_[0])

# Print out the first 5 beta values
print(betas[:5])
```
[70.49289924780366, 86.37984957925575, 75.39193217270235, 78.0888441398601, 75.35740068419938]

This may take a minute to run because we are performing 10,000 linear regressions. At the end I printed out the first 5 estimates of the Hubble Constant just so we can see what some of them look like. We do see some variability in the values! Let’s now take a look at the distribution of Hubble Constants we found with a histogram.

```python
# Distribution of betas (Hubble Constants)
plt.clf()
plt.hist(betas, bins=50)
plt.title("Distribution of the Hubble Constant")
plt.show()
```

[Figure: histogram of the 10,000 bootstrap estimates of the Hubble Constant]

Now that we have many possible values for the Hubble Constant, \(\beta\), we can calculate the 95% confidence interval on this distribution. This will serve as an approximate confidence interval on the true value of the Hubble Constant. Let’s use the numpy.percentile() function to calculate our confidence interval.

```python
# Calculate the values of the 2.5th and 97.5th percentiles of our distribution of betas
conf_interval = np.percentile(betas, [2.5, 97.5])
print(conf_interval)
```
[66.86548795 86.30720865]

We find the boundaries of our 95% confidence interval are about 66.9 and about 86.3. This quantifies the uncertainty in our estimate: the process of sampling data and calculating a 95% confidence interval captures the true value we’re trying to estimate about 95% of the time. In this case, we’re 95% confident the true value of the Hubble Constant lies between about 66.9 and 86.3 km/s/Mpc.
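To make this coverage interpretation concrete, here is a small simulation, using a made-up normal population purely for illustration: we repeatedly draw a sample, bootstrap a 95% confidence interval for the mean, and count how often the interval captures the true mean.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 5.0
n_trials = 200  # kept small so the demo runs quickly
hits = 0

for _ in range(n_trials):
    # Draw a fresh sample from the (normally unknown) population
    sample = rng.normal(loc=true_mean, scale=2.0, size=30)

    # Bootstrap a 95% percentile CI for the mean of this sample
    boot_means = [
        rng.choice(sample, size=len(sample), replace=True).mean()
        for _ in range(500)
    ]
    lo, hi = np.percentile(boot_means, [2.5, 97.5])

    # Count whether the interval captured the true mean
    if lo <= true_mean <= hi:
        hits += 1

print(f"Empirical coverage: {hits / n_trials:.2f}")
```

The empirical coverage typically comes out close to the nominal 95%, slightly under for small samples like this one.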

Let’s replot our histogram with the confidence interval marked by vertical lines:

```python
plt.clf()
plt.hist(betas, bins=50)
plt.title("Distribution of the Hubble Constant with 95% CI Indicated")
plt.axvline(conf_interval[0], color="red")
plt.axvline(conf_interval[1], color="red")
plt.show()
```

[Figure: histogram of the bootstrap distribution with the 95% confidence interval marked in red]

Now let’s take a look at the age of the universe from our estimates. The universe is currently estimated to be about 13.8 billion years old; let’s see how well our estimates match up to this value. When doing our calculations, we must keep our units consistent, so let’s convert megaparsecs (Mpc) to kilometers (km) and seconds (s) to years (yr).

The conversion from Mpc to km:

\[\begin{align} 1 \ \text{Mpc} = 3.09\times 10^{19} \ \text{km} \end{align}\]

The conversion from s to yr:

\[\begin{align} 1 \ \text{yr} = 1 \ \text{yr} * \frac{365 \ \text{days}}{1 \ \text{yr}} * \frac{24 \ \text{hours}}{1 \ \text{day}} * \frac{60 \ \text{minutes}}{1 \ \text{hour}} * \frac{60 \ \text{seconds}}{1 \ \text{minute}} \end{align}\]

\[\begin{align} 1 \ \text{yr} = 365 * 24 * 60^2 \ \text{s} \end{align}\]

Let’s use these conversion factors to convert our Hubble Constant confidence interval to a confidence interval on the age of the universe:

```python
# Calculation of the 95% confidence interval for the age of the universe, in years
conf_interval_age = 3.09e19 / (conf_interval * (365*24*60**2))
conf_interval_age
```
array([1.46537863e+10, 1.13528474e+10])

From this calculation, we’re 95% confident the age of the universe is between about 11.4 billion and 14.7 billion years old. That interval comfortably contains the current estimate of the age of the universe, 13.8 billion years! Very cool.

Summary of What We Did

We used the bootstrap method to randomly resample (with replacement) our 24 galactic relative velocity and distance datapoints 10,000 times, performed a linear regression on each resample to get a distribution of 10,000 estimates of the Hubble Constant, and calculated a 95% confidence interval on that distribution. We then converted this confidence interval into a confidence interval on the age of the universe. These confidence intervals serve as good proxies for what we would obtain if we could compute these values using relative velocity and distance data from the entire population of galaxies in the universe.

References

Efron, B. (1979). "Bootstrap Methods: Another Look at the Jackknife." The Annals of Statistics, 7(1), 1-26.

Samantha Lomuscio
StatLab Associate
University of Virginia Library
March 28, 2023

