[B]Modeling Survival Data with the Statistics Toolbox[/B]
[B]by[/B] Dan Doherty and Tom Lane

[URL="http://www.mathworks.com/programs/digest_offer/nov04/modeling.html"][IMG]http://www.mathworks.com/company/newsletters/digest/images/download_scripts.gif[/IMG][/URL]

A growing number of engineers and scientists use statistics to manage and study the vast quantities of data they generate. Probability distributions are a fundamental aspect of statistical data analysis, as they enable variability analysis, sensitivity studies, hypothesis testing, and parameter estimation and prediction. [URL="http://www.mathworks.com/products/matlab/"]MATLAB[/URL] and the [URL="http://www.mathworks.com/products/statistics/"]Statistics Toolbox[/URL] provide assorted command-line and graphical tools for generating and evaluating probability distributions. In this article, we use the new Distribution Fitting Tool in the Statistics Toolbox to analyze reliability test data from a motion controller.

[B]Motion Controller Reliability Test Data[/B]

Our test data consists of the survival time for 100 motion controllers, with an upper limit of 1,600 hours for each test. When the test ends, the survival time for any controller that does not fail is known only to be longer than the upper time limit. Figure 1 shows the survival time for each of the 100 controllers.

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/survival100_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/survival100_w.gif[/IMG][/URL]
[I]Figure 1. Click on image to see enlarged view.[/I]

Our results show that 38 of the controllers survived longer than the 1,600-hour test limit. Their survival times cannot be precisely observed beyond this point and are known only to exceed 1,600 hours. Data like this is said to be “censored” because the end of the test prevents us from observing the exact failure times.
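To make the idea of censoring concrete, here is a minimal MATLAB sketch that uses simulated data rather than the article's measurements. The Weibull parameters (scale 1700, shape 2.5), the random seed, and the variable names trueTimes, observed, and isCensored are illustrative assumptions; only the 100 units and the 1,600-hour limit come from the test described above.

```matlab
% Illustrative only: simulated failure times, NOT the article's data.
rng(0);                                   % fix the seed for repeatability
limit     = 1600;                         % test cutoff in hours (from the article)
trueTimes = wblrnd(1700, 2.5, 100, 1);    % hypothetical true failure times

% A unit still running at the cutoff is right-censored: all the test
% records is that its failure time exceeds the limit.
isCensored = trueTimes > limit;
observed   = min(trueTimes, limit);       % what the test actually records

fprintf('%d of %d units censored at %d hours\n', ...
        sum(isCensored), numel(observed), limit);
```

The count printed here depends on the assumed parameters and will not necessarily match the 38 censored controllers in the real test.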
The controllers that failed before the 1,600-hour limit appear to follow a relatively smooth distribution, except for a clump of about 7 early failures. Early failures like these are common with reliability data and represent a phenomenon called “infant mortality.” The controllers that survived past this early stage show a smoother distribution of failure times.

[B]Importing and Preprocessing Data with the Distribution Fitting Tool[/B]

Because many of the elements of this data set are censored and have a value equal to 1,600, we cannot evaluate distributions without first distinguishing between censored and uncensored data entries. The survival time data is contained in a 100-element vector called lifetime. We will create a logical vector that is true for censored elements of lifetime and false otherwise:

censored = (lifetime == 1600);

We can now analyze our data set in the Distribution Fitting Tool. We type dfittool at the command line to launch the GUI and import our data by clicking [B]Data[/B] and selecting the lifetime and censored vectors as the Data and Censoring vectors (Figure 2). We can also use the [B]Data[/B] menu to adjust the bin sizes from the default values.

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/datamenu_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/datamenu_w.gif[/IMG][/URL]
[I]Figure 2. Click on image to see enlarged view.[/I]

Figure 3 shows how the Distribution Fitting Tool displays the probability density function (PDF) of our censored lifetime data. The height of the PDF at a given time is roughly proportional to the probability of failure in a short interval around that time.

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/pdfn_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/pdfn_w.gif[/IMG][/URL]
[I]Figure 3. Click on image to see enlarged view.[/I]

The censored points are not displayed because the values are unknown.
These censored values are, however, incorporated in the parameter calculations for any distributions we fit to this data. Because our controllers are routinely replaced well before 1,600 hours under actual operating conditions, we do not need to predict survival beyond that time.

In the Distribution Fitting Tool we can easily recognize the early failures. Because our goal is to model the failure times for controllers that survive a 50-hour burn-in period, we will ignore the set of early failures in our distribution fits, as they fall beneath the 50-hour threshold. We can do this in either of two ways: by excluding the early failures within the Distribution Fitting Tool, or by importing only the values that survive to the threshold. With the [B]Exclude[/B] menu, we could create an exclusion rule that lets us ignore all controllers that survive less than 50 hours. However, we will create a workspace vector called lifetimeburnin that includes only the controllers that survive the burn-in period, and import that vector into the Distribution Fitting Tool.

[B]Fitting a Distribution[/B]

The Weibull distribution is a versatile distribution often used for modeling survival data. It is one of 16 predefined distributions available in the Distribution Fitting Tool (you can also use your own custom distribution). To begin fitting this distribution to our data, we click [B]New Fit[/B] and select the Weibull distribution. When we apply the fit, the estimated scale and shape parameters and fit statistics for the distribution are displayed within the [B]Edit Fit[/B] menu (Figure 4) and the distribution is overlaid on the PDF within the main GUI, as shown in Figure 5. The Weibull distribution appears to provide a good fit up to 1,600 hours. We can change the display type to obtain the cumulative distribution function (CDF), which displays the probability of failure before a specified time (also shown in Figure 5).
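A roughly equivalent fit can be sketched at the command line with wblfit, which accepts a censoring vector. This is not the code the tool generates: the data below are simulated stand-ins for lifetime, and censoredburnin is a name introduced here for the censoring flags that line up with lifetimeburnin.

```matlab
% Illustrative only: simulated stand-in for the article's lifetime vector.
rng(0);
lifetime = min(wblrnd(1700, 2.5, 100, 1), 1600);   % hypothetical data, capped at 1600 h
censored = (lifetime == 1600);                      % true where the test ended first

% Keep only the controllers that survive the 50-hour burn-in period,
% and subset the censoring flags the same way so the two stay aligned.
keep           = lifetime >= 50;
lifetimeburnin = lifetime(keep);
censoredburnin = censored(keep);                    % hypothetical name

% Maximum-likelihood Weibull fit that accounts for right censoring.
% parmhat(1) is the scale, parmhat(2) the shape; parmci holds 95% bounds.
[parmhat, parmci] = wblfit(lifetimeburnin, 0.05, censoredburnin);
scale = parmhat(1);
shape = parmhat(2);

% Fitted CDF: estimated probability of failure before 1000 hours.
pFail1000 = wblcdf(1000, scale, shape);
fprintf('scale = %.1f, shape = %.2f, P(fail < 1000 h) = %.3f\n', ...
        scale, shape, pFail1000);
```

Passing the censoring vector matters: treating the 1,600-hour entries as observed failures would bias the fitted distribution toward shorter lifetimes.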
The Distribution Fitting Tool also supports other common displays, including probability plots, inverse CDF, and survival plots.

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/editfit_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/editfit_w.gif[/IMG][/URL]
[I]Figure 4. Click on image to see enlarged view.[/I]

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/weibull1_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/weibull1_w.gif[/IMG][/URL] [URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/weibull2_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/weibull2_w.gif[/IMG][/URL]
[I]Figure 5. Click on each image to see its enlarged view.[/I]

We could also use the Distribution Fitting Tool to fit and compare other distributions useful for modeling survival data, such as Gamma and Birnbaum-Saunders. We can also plot non-parametric or “smooth” distributions and compare them to the data and to the other fits.

[B]Predicting Controller Survival from the Weibull Distribution[/B]

We will use the Weibull fit to predict the survival times reached by 25 percent, 50 percent, 75 percent, and 95 percent of our controllers. We click [B]Evaluate[/B] and evaluate the inverse CDF, also known as the quantile function, at probabilities of 0.75, 0.5, 0.25, and 0.05. The inverse CDF is defined in terms of the probability of failure, so the time that 25 percent of controllers survive corresponds to a failure probability of 0.75, and so on. We can also evaluate 95% confidence intervals for each quantile value.
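The same kind of quantile evaluation can be sketched at the command line with wblinv, using the parameter covariance returned by wbllike to obtain confidence bounds. This is a sketch under assumptions, not the tool's own computation: the data are simulated stand-ins for the article's vectors, the burn-in step is skipped for brevity, and the bounds rely on a large-sample normal approximation.

```matlab
% Illustrative only: simulated stand-in for the article's data.
rng(0);
lifetime = min(wblrnd(1700, 2.5, 100, 1), 1600);   % hypothetical data, capped at 1600 h
censored = (lifetime == 1600);

% Maximum-likelihood Weibull fit with right censoring.
parmhat = wblfit(lifetime, 0.05, censored);

% Asymptotic covariance of the estimates, evaluated at the fitted values.
[~, pcov] = wbllike(parmhat, lifetime, censored);

% The inverse CDF takes the probability of FAILURE, so the time that a
% fraction pSurvive of units reaches is the quantile at 1 - pSurvive.
pSurvive = [0.25 0.50 0.75 0.95];
pFail    = 1 - pSurvive;
[t, tLo, tUp] = wblinv(pFail, parmhat(1), parmhat(2), pcov, 0.05);

% Columns: percent surviving, survival time, lower bound, upper bound.
disp([100*pSurvive(:), t(:), tLo(:), tUp(:)]);
```

Because the input here is simulated, the displayed values will differ from the figures obtained from the real controller data.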
The results from the Distribution Fitting Tool are summarized as follows:

% controllers expected to survive | Survival time (hours) | 95% confidence interval, lower bound (hours) | 95% confidence interval, upper bound (hours)
25 | 1870 | 1684 | 2076
50 | 1482 | 1356 | 1619
75 | 1103 | 983 | 1237
95 | 618 | 490 | 779

Our Weibull distribution predicts that 95% of our controllers will last at least 618 hours, and that 25% will survive at least 1,870 hours, which is beyond the 1,600-hour test limit. The lower and upper confidence bounds give, for each percentage, a range of durations within which we can be reasonably sure the true survival time lies.

[B]Reusing Work from the Distribution Fitting Tool[/B]

After evaluating our data in the Distribution Fitting Tool, we have two options for saving our work. We can save the session into a file if we want to continue creating and evaluating distributions within the GUI. Doing so will retain all the information and data from our fitting sessions when we load it back in.

Alternatively, we can automatically generate an M-file from the session. This enables us to incorporate our work in other M-files or analyze the same distributions on another data set. Figure 6 shows the M-file created from the Weibull distribution for our controller survival data. This M-file fits our Weibull distribution to a selected input vector and then plots the data and the fits in a MATLAB figure window without reopening the Distribution Fitting Tool.

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/images/mfile_wl.jpg"][IMG]http://www.mathworks.com/company/newsletters/digest/nov04/images/mfile_w.gif[/IMG][/URL]
[I]Figure 6. 
Click on image to see enlarged view.[/I]

Before generating the M-file, we could also have used the Distribution Fitting Tool to import additional data sets from the workspace, fit additional probability distributions, exclude unwanted portions of our data, and plot other functions, such as the survivor function, to assess our distributions. All of this analysis would be incorporated in the generated M-file.

[B]Summary[/B]

We used the Distribution Fitting Tool to evaluate probability distributions for censored survival data from a motion controller, helping us predict future observations with greater confidence. We found that a Weibull distribution effectively fits our data, and we were able to predict the likelihood of our controllers failing at certain times. After we finished evaluating our distribution, we generated an M-file so that we could replicate our GUI session using other data, or incorporate it into other M-files.

[URL="http://www.mathworks.com/programs/digest_offer/nov04/modeling.html"][IMG]http://www.mathworks.com/company/newsletters/digest/images/download_scripts.gif[/IMG][/URL]
[I]Download the code described in this article.[/I]

[URL="http://www.mathworks.com/company/newsletters/digest/nov04/modeling.html"]More...[/URL]