Maximum Likelihood Estimator of the 85th Percentile

Larry George, Problem Solving Tools, http://www.fieldreliability.com

Abstract

Traffic surveys are supposed to observe unobstructed vehicles under good conditions. How often is traffic unobstructed on busy roads? Traffic actually flows in clusters at the speed of the slowest cars. What is the real 85th percentile of unobstructed vehicle speeds? Suppose vehicles originate from a stop in pairs. After reaching their eventual speeds, either the faster vehicle is blocked or both travel at their unobstructed speeds. We obtain the maximum likelihood estimator of the 85th percentile of vehicle speeds for this simple model. The model is surprisingly general unless the origin is distant.

Simulation and Problem Formulation

Suppose vehicles originate from a stop in pairs (cohorts of size two). After reaching their eventual speeds, either the faster vehicle is blocked and travels at the slower vehicle's speed, or both travel at their unobstructed speeds. Then, depending on the equipment, the observations could consist of:

·        All speeds, blocked or not

·        Only the speeds of unblocked pairs, excluding both speeds of every blocked pair

·        All speeds except that of the blocked (faster) vehicle in each blocked pair

Table 1 shows simulated speeds and blocking under each of the alternative methods of data collection. The simulation inputs were a mean speed of 60 mph and a standard deviation of 5 mph; simulated speeds were normally distributed, and the simulation included 100 pairs (200 vehicles).

 

Table 1. Simulated speeds, alternative measurements, and estimated 85th percentiles

Alternative | All | No blocked pairs | No blocked vehicles
Number observed | 200 | 110 | 155
85th percentile (mph) | 62.59 | 64.23 | 63.96
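
The simulation and the three observation alternatives above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the author's spreadsheet: the blocking probability P_BLOCK = 0.45 is hypothetical (chosen because Table 1 implies 45 blocked pairs out of 100, since 200 - 155 = 45), blocking is assumed to be independent of the speeds, and the naïve 85th percentile is assumed to be the sample percentile from statistics.quantiles. The random draws differ from the workbook's, so the output will not reproduce Table 1 exactly.

```python
import random
import statistics

MEAN_MPH = 60.0   # simulation input from the text
SD_MPH = 5.0      # simulation input from the text
N_PAIRS = 100     # simulation input from the text
P_BLOCK = 0.45    # HYPOTHETICAL: probability that a pair ends up blocked


def simulate_pairs(n_pairs, p_block, rng):
    """Return (s1, s2, blocked) per pair; a blocked pair travels at its slower speed."""
    pairs = []
    for _ in range(n_pairs):
        a = rng.gauss(MEAN_MPH, SD_MPH)
        b = rng.gauss(MEAN_MPH, SD_MPH)
        if rng.random() < p_block:
            slow = min(a, b)
            pairs.append((slow, slow, True))   # faster vehicle held to the slower speed
        else:
            pairs.append((a, b, False))        # both travel at unobstructed speeds
    return pairs


def observations(pairs):
    """Speeds recorded under each of the three data-collection alternatives."""
    all_speeds = [s for s1, s2, _ in pairs for s in (s1, s2)]
    no_blocked_pairs = [s for s1, s2, blocked in pairs if not blocked for s in (s1, s2)]
    # Keep only the slower (unobstructed) speed of each blocked pair.
    no_blocked_vehicles = [s for s1, s2, blocked in pairs
                           for s in ((s1,) if blocked else (s1, s2))]
    return {"All": all_speeds,
            "No blocked pairs": no_blocked_pairs,
            "No blocked vehicles": no_blocked_vehicles}


def p85(speeds):
    """Naive 85th percentile: the 85th of the 99 percentile cut points."""
    return statistics.quantiles(speeds, n=100)[84]


if __name__ == "__main__":
    rng = random.Random(1)
    obs = observations(simulate_pairs(N_PAIRS, P_BLOCK, rng))
    for name, speeds in obs.items():
        print(f"{name:20s} n={len(speeds):3d}  85th percentile={p85(speeds):.2f}")
```

Because every blocked pair contributes its slower speed twice under "All" and none of its speeds under "No blocked pairs", the counts always satisfy 200 = 110 + 2 × 45 and 155 = 200 - 45, as in Table 1.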

 

Assume unobstructed speeds are normally distributed, pairs are observed close enough to their origin that they are independent, vehicles reach their eventual speeds unless blocked, and all speeds are observed. Under those assumptions, with mean 60 mph and standard deviation 5 mph, the true 85th percentile of unobstructed speeds is 60 + 1.0364 × 5 ≈ 65.18 mph. All three naïve estimates in Table 1 fall below this value.
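
The 65.18 mph value is the normal quantile μ + z₀.₈₅σ with z₀.₈₅ ≈ 1.0364. A quick check with the Python standard library's statistics.NormalDist follows; the helper name normal_percentile is hypothetical.

```python
from statistics import NormalDist


def normal_percentile(mean, sd, p):
    """p-th quantile of a normal distribution, i.e. mean + z_p * sd (hypothetical helper)."""
    return NormalDist(mean, sd).inv_cdf(p)


z85 = NormalDist().inv_cdf(0.85)  # standard normal 85th percentile
print(round(z85, 4), round(normal_percentile(60, 5, 0.85), 2))  # prints 1.0364 65.18
```

Applied to the estimates in Table 2 (mean 60.52, standard deviation 4.55), the same function gives about 65.24 mph, matching that table's 85th percentile.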

This pair formulation seems surprisingly general as long as measurements are made fairly close to the origin. If the origin is distant yet passing is not possible, the formulation on page 3 of http://www.fieldreliability.com/Nwslt5.doc [Newell] is more appropriate.

Maximum Likelihood Estimator

The log likelihood function is a sum over pairs, each pair contributing one of two terms:

ln[f(s1)] + ln[f(s2)]   or   ln[f(s)] + ln[1-F(s)]

where f and F are the normal probability density and cumulative distribution functions of unobstructed vehicle speed, and s1 and s2 are the two observed speeds in a pair. Which term a pair contributes depends on whether s1 equals s2. If s1 ≠ s2, the pair contributes ln[f(s1)] + ln[f(s2)], the usual log likelihood of two independent observations. If s1 = s2 = s, the pair contributes ln[f(s)] + ln[1-F(s)], the log likelihood of a pair driving at the same speed s, presumably because the faster driver is obstructed by the slower: the slower vehicle's speed s is observed exactly, while the faster vehicle's unobstructed speed is known only to exceed s.
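
Written out in full, the log likelihood being maximized is sketched below, where U denotes the set of unblocked pairs and B the set of blocked pairs (notation introduced here for illustration). Constant factors, such as a factor of 2 for not knowing which vehicle of a blocked pair was the slower and a blocking probability assumed not to depend on the speeds, do not involve μ or σ and so do not change the maximizer. By the invariance property of maximum likelihood estimators, the estimate of the 85th percentile is then μ̂ + z₀.₈₅σ̂.

```latex
\begin{aligned}
\ln L(\mu,\sigma) &= \sum_{(s_1,s_2)\in U}\bigl[\ln f(s_1) + \ln f(s_2)\bigr]
                   + \sum_{s\in B}\bigl[\ln f(s) + \ln\bigl(1 - F(s)\bigr)\bigr],\\
f(s) &= \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(s-\mu)^2}{2\sigma^2}\right],
\qquad F(s) = \Phi\!\left(\frac{s-\mu}{\sigma}\right),\\
\hat{x}_{0.85} &= \hat{\mu} + z_{0.85}\,\hat{\sigma},
\qquad z_{0.85} = \Phi^{-1}(0.85) \approx 1.0364.
\end{aligned}
```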

Solving explicitly for the maximum likelihood estimators of the mean and variance appears to be difficult. However, maximizing the log likelihood numerically with Excel Solver is easy; see http://www.fieldreliability.com/Speed.xls. Spreadsheet “2Cohort” of workbook Speed.xls simulates vehicle speeds, counts observations for the alternatives in the Simulation and Problem Formulation section, and makes naïve estimates of the 85th percentile like those in Table 1. Spreadsheet “ObsPair” copies one simulation and computes the maximum likelihood estimates of mean, standard deviation, and 85th percentile. You can enter your own data and use Excel Solver to estimate the 85th percentile.
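
Outside Excel, the same maximization can be sketched in Python with only the standard library. This is an illustrative substitute for Solver, not the workbook's method, and all function names are hypothetical. It assumes a pair is blocked exactly when its two recorded speeds are equal (real data may need a tolerance at the measuring equipment's resolution), optimizes over μ and ln σ so that σ stays positive, and uses a simple compass (coordinate pattern) search in place of Solver's algorithm.

```python
import math
from statistics import NormalDist, fmean, pstdev

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


def log_pdf(s, mu, sigma):
    """ln f(s) for a normal density with mean mu and standard deviation sigma."""
    z = (s - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - LOG_SQRT_2PI


def log_sf(s, mu, sigma):
    """ln[1 - F(s)], computed with erfc to keep precision in the upper tail."""
    tail = 0.5 * math.erfc((s - mu) / (sigma * math.sqrt(2)))
    return math.log(tail) if tail > 0 else -math.inf


def log_likelihood(pairs, mu, sigma):
    """Sum of per-pair terms; ASSUMES s1 == s2 marks a blocked pair."""
    total = 0.0
    for s1, s2 in pairs:
        if s1 == s2:  # blocked: slower speed observed, faster speed only known to exceed it
            total += log_pdf(s1, mu, sigma) + log_sf(s1, mu, sigma)
        else:         # two independent unobstructed speeds
            total += log_pdf(s1, mu, sigma) + log_pdf(s2, mu, sigma)
    return total


def compass_search(f, x0, step=1.0, tol=1e-7):
    """Maximize f by single-coordinate moves, halving the step when no move improves f."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = f(y)
                if fy > fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2
    return x


def fit_pairs(pairs, p=0.85):
    """Return (mu_hat, sigma_hat, estimated p-th percentile) for (s1, s2) speed pairs."""
    speeds = [s for pair in pairs for s in pair]
    start = [fmean(speeds), math.log(pstdev(speeds))]  # naive moments as a starting point
    mu, log_sigma = compass_search(
        lambda v: log_likelihood(pairs, v[0], math.exp(v[1])), start)
    sigma = math.exp(log_sigma)
    return mu, sigma, mu + NormalDist().inv_cdf(p) * sigma
```

Calling fit_pairs on a list of (s1, s2) tuples returns the estimated mean, standard deviation, and 85th percentile; a blocked pair is entered as the same speed twice. With no blocked pairs the result reduces to the ordinary normal estimates (sample mean and divisor-n standard deviation). If SciPy is available, scipy.optimize.minimize on the negative log likelihood would be a natural replacement for the hand-written search.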

The exact standard error of the maximum likelihood estimate of the 85th percentile under this model is unknown. Table 2 shows an approximation based on the standard error of a normal percentile estimated from a complete random sample [Ferguson].
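
Two textbook large-sample approximations for a complete random sample of n normal speeds are sketched below: the standard error of the parametric estimate μ̂ + zσ̂, which is σ√((1 + z²/2)/n), and the standard error of the sample p-quantile, √(p(1 - p)/n) / f(x_p), the kind of result treated in [Ferguson]. These formulas are assumptions for illustration; they are not necessarily the approximation used in Speed.xls, and because blocked pairs carry less information than complete observations, both are likely optimistic for blocked data. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def se_normal_percentile(sigma, n, p=0.85):
    """Approximate SE of mu_hat + z_p * sigma_hat from n complete normal observations."""
    z = NormalDist().inv_cdf(p)
    return sigma * math.sqrt((1 + z * z / 2) / n)


def se_sample_quantile(sigma, n, p=0.85):
    """Approximate SE of the sample p-quantile, sqrt(p(1-p)/n) / f(x_p), for normal data."""
    z = NormalDist().inv_cdf(p)
    density = NormalDist().pdf(z) / sigma  # normal density at the p-th quantile
    return math.sqrt(p * (1 - p) / n) / density


print(round(se_normal_percentile(4.55, 200), 3), round(se_sample_quantile(4.55, 200), 3))
```

With n = 200 and σ̂ = 4.55 these give roughly 0.40 and 0.49 mph respectively; the parametric estimate is the more efficient of the two when speeds really are normal.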

Results

Table 2 shows the maximum likelihood estimates of the parameters for one simulation in which all 200 vehicle speeds were observed, blocked or unobstructed. The results in Tables 1 and 2 come from different simulations, but the results do not vary much from one simulation to the next. (I’ll try to provide results from the same simulation in a later version of this report.)

 

Table 2. Maximum likelihood estimates (all values in mph)

Parameter | Mean | Standard deviation | 85th percentile | Standard error
Real | 60 | 5 | 65.18 | 0
Estimated | 60.52 | 4.55 | 65.24 | 0.0065

 

Recommendations

These apparently frivolous results should encourage traffic engineers to use appropriate measurements and statistics to the fullest extent possible. For example, a local city at the intersection of two major freeways has a problem with cut-through traffic during commuter hours. The city uses its own cars to follow vehicles that enter the city to learn their destinations or where they exit. Yet it is possible to estimate origin-destination matrices, travel time distributions, and traffic intensities from all entries to all exits using the entry and exit traffic counts that the city already collects.

These apparently frivolous results should also encourage analysts of censored data to investigate the nature of the censoring and to use appropriate statistics. For example, counts of ships (sales, production, installations, and so on) and returns (complaints, failures, repairs, spares sales, and so on) are statistically sufficient to make nonparametric estimates of age-specific reliability and failure rate functions. If you have interestingly censored data, send it along with the problem statement to pstlarry@yahoo.com, and I’ll try to help analyze it, free of charge.

References

Ferguson, Thomas S., “Asymptotic Joint Distribution of Sample Mean and a Sample Quantile,” http://www.math.ucla.edu/~tom/papers/unpublished/meanmed.pdf

Newell, Gordon, “A Theory of Platoon Formation in Tunnel Traffic,” Operations Research, pp. 589-598, 1959