Alfred A. Brooks 100 Wiltshire Drive Oak Ridge, TN 37830-4505 Phone (& fax): 423 482-1559 E-mail: brooks50@comcast.net Risk Distribution Functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Alfred A. Brooks - 8/8/94 NOTA BENE: The implementation section has been added to establish feasibility of the method, not to assess the risk on EFPC. The program determines the RGO for a group of mercury species in the soil ingestion pathway and a remediation simulation based on EFPC. The data used in the examples was appropriate to "proof of principle" and does not reflect the true EFPC conditions. Use of "central values" for EFPC resulted in an always safe condition which precluded testing the proposed method. While this method has been reviewed by SAIC and ATSDR, it must be considered as a tentative approach until wider discussion has taken place. Introduction ~~~~~~~~~~~~ The use of unbiased, central-values of risk and risk distribution functions has been proposed by the author to estimate the effects of contamination and benefits of proposed remediation. Fundamental to this approach is that in the real world of constrained resources and optimal strategies, any departure from the unbiased parameters will preclude the attainment of any optimal remediation. Therefore the traditional safety factors will not be applied to the input parameters but the unbiased risk distribution function will be used to calculate an acceptable percentile of the population subject to risk after any required remediation. This paper further defines and describes these concepts. The hazard index or hazard quotient has been use as an example of the method, not because it is the method of choice but that it has been used on EFPC as a risk criteria and parameters are available. Hazard quotients for dissimilar substances which produce NOAELs and RfDs for different effects are not necessarily comparable, e.g., a skin rash is not as serious as renal failure. A common unit of risk such as a reduction of life expectancy would be more meaningful. A hazard variable, e.g., an index or quotient, is computed from several independent variables each of which has a related distribution function expressing either an observational variation (uncertainty) or variability due to the real differences in the members of the population subject to the hazard. (In this treatment, a "point value" is simply a variable whose probability is unity at its "point value" and zero elsewhere.) As a result of these variations in the independent variables, there will be a distribution function for the hazard variable which expresses the fraction of the population subject to a risk for any set of dependent variables. In considering a statistical approach to understanding risk, it must be remembered that the method sheds light on the process only in that it considers the random nature of either population variation or observational error; it does NOT shed any light on a non-random bias of a central value. When an uncertainty contains not only modest random observational errors (sampling as well as measurement) but also a large bias, a statistical treatment may yield a deceptively reassuring result. When substituting means for a distributed variable the following should be considered. The distribution functions of the independent variables may be symmetrical or skewed. If symmetrical (and uni-modal), the mean and modal value (at the maximum probability) are identical and constitute a "central" value. For non-symmetrical distributions, these values differ and for some common distributions, e.g., the exponential distribution, differ greatly. Therefore the term "central" value should be taken as the "mean" value although the modal value with a correction factor may be useful. The modal value of the exponential distribution is far from the mean value. Multi-modal distributions are a problem as the mean may be inappropriate. Further, even if for all the independent variables the mean value is also the maximal value, there is no certainty that this holds for the hazard distribution, i.e., a mean reciprocal. Derivation of Equations ~~~~~~~~~~~~~~~~~~~~ By convention, the demarcation value of a hazard variable is defined as one, i.e., that value at which a "typical" or mean individual would be borderline safe and below which they would be increasingly safe in terms of an LOAEL concept where all parameters are unbiased, i.e., determined without safety factors such that 50 percent of the exposed population would experience the observed "minimal" effect if fed the "reference" dosage. To properly protect an exposed population, the stochastic model including site-specific pathways would be used to determine the site- specific contaminant level at which some high percentile, e.g., 95-th or 99-th of the population would be protected from the minimal adverse effect. We define the hazard distribution function, f[x](c,x)dh which is the fraction of the population exposed to a hazard in the range of h to h+dh where h is the hazard variable, c is the contaminant variable, and x is a vector of all other independent variables. Both c and xi may have associated distribution functions and the xis are typically pathway or model variables. The distribution function, f[x](c,x)dh, will be derived from contributing distribution functions by integration over those distributions, i.e., INT f(c,x,y) dydh where y is a vector of independent variables all y associated with the contributing distribution functions. The integration over c has not been carried out and the equation is applicable to a "point value" of c. Thus, if any of the contributing independent variables are involved in a distribution function, the implied integrations have been carried out and the variable vector contains only the limits of the integration or, most probably, these variables except for a possible fixed factor will have disappeared from the function. We define the fraction of the exposed population NOT subject to hazard as h=1 F(,x) = INT f[h](c,x) dh and its inverse, c(F,x). Eq. 1 h=0 as the point value of the contaminant variable, c, at which the fraction, F of the exposed population is NOT subject to hazard. A Remediation Goal Option may be obtained by setting Equation 1 equal to a desired percentile and solving for the soil concentration. It assumes the protected population is sampling from a uniformly contaminated soil at a concentration of c. We may also calculate the mean hazard index experienced by an exposed population as the integral from h=0 to h=infinity, H(c,x) = INT h(c,x)f[h](c,x) dh where h(c,x) is the hazard index Eq. 2 h=0 The equations have been derived for a single contaminant but are immediately applicable to scenarios in which several contaminants are in fixed proportion to each other. The equation for F, is easily expanded to include multiple contaminants. The inverse form is also possible but contains the concentration of all other contaminants as independent variables. For the case of a distribution of soil concentrations, we may define the hazard distribution function as: c=cu f(h](cu,x)dh = INT f[h](c,x) dc dh Eq.3 c=0 where cu is the upper limit of c. We calculate the effect of the entire soil contaminant distribution by setting cu equal to infinity and calculate the effect of proposed remediation by setting cu equal to the proposed remediation level in which case the distribution function is a function of cu. The same risk integrations can be carried out as with Eq. 1 and 2. The RGO defined in this manner assumes that the protected population is sampling at random from the entire contaminant distribution remaining after remediation is effected. It is not as protective of individuals repeatedly exposed to "hot spots" as is the RGO based on Eq. 1. Implementation Considerations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The proposed approach has been developed with the anticipation that a computer will be used, especially to represent any empirical distributions and to evaluate any required integrals. The author anticipates a cry that the distribution functions may not be well known and responds that this poses a lesser problem than the unknown effect of assumed safety factors. For a safety factor to be deemed appropriate some knowledge of the tail of the distribution function must be assumed. An implementation of the method would entail: the derivation of the functional dependence of the hazard distribution function from its component distribution functions and point variables. the representation of the distributions of the independent variables by explicit functions or numerical arrays. the replacement of all "narrow" distributions, i.e., variance/mean << 1, by the value of the mean. In practice only the most significant of the distributions will substantively affect the distribution function of the hazard variable. the propagation of the distribution functions of the independent variables into the distribution function of the hazard variable by analytical or more probably by Monte Carlo methods using appropriate random number generators. the numerical evaluation of an RGO from Eq. 1, i.e., F(c,x) and c(F,x), at a desired value of F, possibly by iterative techniques or tabular interpolation, one table for each . the numerical integrations as indicated by equation 3 as desired. some of the "high percentile" integrations may be best performed in the reverse direction and the distribution functions need to be well- behaved, i.e., vanish to zero in a reasonable manner at the extremes. The method, in spite of its seeming complexity and detail, lends itself to a generalized computer implementation easily tailored to site specific scenarios and using site-specific data preferably from an accepted library. The interpretation of the Monte Carlo method either as an event simulation or as a method of numerical integration will permit a concise yet rigorous statement of a site-specific application. Example: Single Path, Single Contaminant ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The hazard index for this case has the form: Eq. 4 HQ = K x C x EF x IR x ED x BAF / (RfD x AT x BAF) where HQ is the hazard quotient, a PDF K is a constant, 10E-6/365 C is the soil concentration, a PDF EF is the exposure factor, a PDF IR is ingestion rate, a PDF ED is the exposure duration BAF is the effective bioavailability factor, , SF(i) is the fraction of the i-th species Rfd is the unbiased reference dose AT is the averaging time BW is the body weight, a PDF and PDF implies a probability distribution function. The EPA/RAG p. 23,52 defines an age-adjusted, weight-adjusted of the form: IRA = SUM (Ed[i] x IR[i] / BW[i] summed over all age groups. When used IRA replaces the cluster: IR x Ed x BW Noting that the variables of the integration can be separated and factored out and retaining the same nomenclature for the "point values", i.e., ED, K, RfD and AT, and as well as the definite integrals over the complete PDFs, i.e., EF, and IR. and BW (using the mean of 1/BW would be preferable), we obtain the simple form: Eq. 5) f[h] dh = ((K x EF x IR x ED x RAF)/(RfD x AT x BAF)) dh = constant x cdh from which one can easily carry out integrations for point values of c or over its PDF, f[c] dc as indicated in equations 1, 2 and 3. The integration required can be carried out explicitly, numerically or by Monte Carlo simulation. In any case the computational workload is small and the method may be used for estimates. The data necessary to carry out this example is available in the EFPC study. The method is not so much any new technology but a better use of existing technology. The EFPC RIA Tier III analysis required the same basic methods that are discussed here. Many of the PDFs and integrals have been evaluated before. It would appear desirable for DOE to invest the very small time necessary to illustrate such a potentially resource- conserving tool. Implementation Observations ~~~~~~~~~~~~~~~~~~~~~~~~~~~ NOTA BENE: The purpose of this exercise is to establish the method NOT to simulate EFPC. To verify the feasibility of the proposed method, a computer program written in ANSI/C using the MS C5.1-compiler has been written to calculate three point value cases, a stochastic RGO for three arbitrary percentiles (90, 95, 99) of the population protected (Eq. 1) and a stochastic remediation simulation (Eq. 3 substituted in Eq. 1) at the same percentiles. The point value examples include estimates for a child, an adult and the EPA age-adjusted, body weight-adjusted ingestion rate (see EPA/RAG p. 23, 52). The stochastic computations included sampling PDFs for ingestion rate, body weight and exposure frequency as well as the soil concentration for the remediation simulation. The PDF values are taken from the DOE/EFPC RIR p. 5-98-108 and Krieged upper layer soil data supplied by SAIC. Soil remediation is simulated by setting the soil concentration to zero for those PDF samples which equal or exceed the remediation level. No attempt was made to calculate mean hazard indexes from Eq. 2 but this is a trivial task. The data used does NOT necessarily represent the best unbiased "central values" the method presumes. The use of the approximate central values (computation 2), as expected, produced estimates so safe as to preclude verification of the method being tested. For example, to produce a meaningful problem(computation 1) the mercury concentration of the Krieged upper layer was exaggerated by a factor of ten. Also the conservative BAF of 0.3 was used rather than the less extreme literature values. When some of the questions which arose during implementation have been resolved, additional EFPC computations using "central values" should be made. Comments: ~~~~~~~~ * The method proved feasible and, for the input parameters used, produced reasonable results for both the RGO and remediation limit at the 90 and 95 percentile levels using a sample size of 1000. To obtain accurate solutions at the 99 percentile level may require a larger sample size or program refinement. The problem is a "number cruncher" and the testing was somewhat limited by the lack of a floating point processor. * Due to the stochastic nature of the computations, the simple iteration methods of linear interpolation and extrapolation failed as anticipated. This was replaced by computing a range of points from the function which spanned the target percentiles, i.e., up to 0.99, and applying a linear least squares fit, (a+bx) to the computed points above 0.8 percentile. The method needs some refinement as some upward concavity remains but a+bx+c*x**2 did not help much. The curve is probably exponential in character. The program should be modified to exploit this method with an improved least squares fit or perhaps a segmented linear fit. Also the method does not perform well if the initial central value percentile is too close to 0.99 due to the stochastic nature of the function and an alternative point allocation rubric needs to be applied to this range. * The log-normal random number generator (RNG) used for IR was found to produce some "impossible" extreme values (as did SAICs). The RNG was truncated at (1,300) after SAIC. This reduced the mean IR from about 100 to 65. Of more concern, is the disparity between the mean and sigma reported by SAIC and those found by the author. This may be due to the semantics used or to a difference in RNGs used. This difference needs resolution as a factor of 1.5 may exist in the means of the variables. * The testing was done on a histogram of the Krieged, upper-layer EFPC soil data which, though adequate for testing, may not represent the complete EFPC situation. It should be replaced by a more representative histogram. * Some of the point values computed for comparison did not agree with the RIR. This may be due to failure of the author to determine the exact details of the RIR computations or to unintentional differences in the test cases. Resolution should be sought for the sake of consistency. * The RNGs used are new and while tested for their 1-st to 4-th moments and sigma about the mean, they may not be perfect. The U(0,1) used was based on the MS supplied rand() which is not of high quality, i.e., I(0,32767). However there should be no significant effects. Computation 1 - Stochastic Risk Assessment Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Comments: Test Case with Exaggerated Data In order to demonstrate convergence for the remediation model, the data for EFPC has been exaggerated for this computation. The BAF was set to 0.3 which gives rise to a "point soil level protective of 99 percent of the population" equal to 107.09 ppm. Using a soil profile having 10 times the actual mercury levels of EFPC, gives a "remediation level" protective of 99 percent of the population exposed randomly to the site" equal to 230.69 ppm. Running the TEST CASE parameters !!!! Rfd: 0.00030 Soil contaminant species(name,fraction,absorbtion): HgCl2 0.0250 0.0700 HgS 0.4875 0.0018 Hg-el 0.4875 0.0010 BAF: 0.30000 Checking distribution means (n=1000)..... (see mdl[3,4].parms) IR_Tlog-normal: 0.00 3.91 1.15 Mean: 70.05 EF_Triangular: 120.00 270.00 350.00 Mean: 248.15 BW_Log-normal: 0.00 3.22 0.34 Mean: 26.54 SC_Histogram: 48 0.00 5300.00 Mean: 659.89 Point Value Model: EFPC Soil Child - age 1 - 6 ED: 6 AT: 6 IR: 200 EF: 350 BW: 15 RGO: 78 Point Value Model: EFPC Soil Adult - age 7 - 30 ED: 24 AT: 24 IR: 100 EF: 350 BW: 70 RGO: 730 Point Value Model: EFPC Soil Age adj ir/bw 1 - 30 ED: 1 AT: 30 IR: 114 EF: 350 BW: 1 RGO: 274 Stochastic CV Model: EFPC Stochastic Model - Soil - RGO ED: 30 AT: 30 Means: IR: 70 EF: 248 BW: 27 cvRGO: 557 Initial risk_param 557 - cvRGO Crunching PDFs into points .......................... Percentiles vs Stochastic RGO Levels: Fit parms: npts= 25 a= 1.0753 b= -0.000796 SoR**2 (of pct): 0.000543 Calulated: 90 220.13 / 95 157.33 / 99 107.09 Sigma: 0.0048 Note: .99 percentile may have been extrapolated Stochastic CV Model: EFPC Stochastic Model - Remediation ED: 30 AT: 30 Means: IR: 70 EF: 248 BW: 27 cvRGO: 557 Initial risk_param 5300 - Upper soil limit Crunching PDFs into points ............................ Percentiles vs Stochastic Remediation Levels: Fit parms: npts= 23 a= 1.0526 b= -0.000272 SoR**2 (of pct): 0.001213 Calulated: 90 562.11 / 95 377.99 / 99 230.69 Sigma: 0.0074 Note: .99 percentile may have been extrapolated Computation 2 - Stochastic Risk Assessment Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Comments: EFPC "Central" Values = No remediation needed This computation used the EPA RfD of 0.0003 and a BAF based on the Hg species found on EFPC. When considered as "unbiased, central values", they are undoubtly still conservative. For these values the "point soil level" which will protect 99 percent of the population is 10709 ppm. This level is so high that in the "remediation model", there are no unsafe soil regions using the Kriged EFPC "upper layer data and hence no computations were carried out for remediation. Please see Computation 1 using exaggerated data. Rfd: 0.00030 Soil contaminant species(name,fraction,absorbtion): HgCl2 0.0250 0.0700 HgS 0.4875 0.0018 Hg-el 0.4875 0.0010 BAF: 0.00309 Checking distribution means (n=1000)..... (see mdl[3,4].parms) IR_Tlog-normal: 0.00 3.91 1.15 Mean: 65.59 EF_Triangular: 120.00 270.00 350.00 Mean: 248.36 BW_Log-normal: 0.00 3.22 0.34 Mean: 26.16 SC_Histogram: 48 0.00 530.00 Mean: 70.87 Point Value Model: EFPC Soil Child - age 1 - 6 ED: 6 AT: 6 IR: 200 EF: 350 BW: 15 RGO: 7592 Point Value Model: EFPC Soil Adult - age 7 - 30 ED: 24 AT: 24 IR: 100 EF: 350 BW: 70 RGO: 70859 Point Value Model: EFPC Soil Age adj ir/bw 1 - 30 ED: 1 AT: 30 IR: 114 EF: 350 BW: 1 RGO: 26572 Stochastic CV Model: EFPC Stochastic Model - Soil - RGO ED: 30 AT: 30 Means: IR: 66 EF: 248 BW: 26 cvRGO: 56910 Initial risk_param 56910 - cvRGO Crunching PDFs into points ..................... Percentiles vs Stochastic RGO Levels: Fit parms: npts= 20 a= 1.0886 b= -0.000009 SoR**2 (of pct): 0.000948 Calulated: 90 20486.78 / 95 15054.93 / 99 10709.45 Sigma: 0.0071 Note: .99 percentile may have been extrapolated Stochastic CV Model: EFPC Stochastic Model - Remediation ED: 30 AT: 30 Means: IR: 66 EF: 248 BW: 26 cvRGO: 56910 Initial risk_param 530 - Upper soil limit NOTE: cvRGO exceeded highest soil concentration Crunching PDFs into points First(lowest) pct = 1.0000 > 0.9 - Extrapolations may be unreliable First(lowest) pct = 1.0000 > 1.0 - No unsafe soil concentrations Model Terminated