Labfans是一个针对大学生、工程师和科研工作者的技术社区。 | 论坛首页 | 联系我们(Contact Us) |
![]() |
![]() |
#1 |
高级会员
注册日期: 2019-11-21
帖子: 3,006
声望力: 66 ![]() |
![]()
The graphics in my post about R^2 were produced by an updated version of a sixty-year old program involving the U.S. census. Originally, the program was based on census data from 1900 to 1960 and sought to predict the population in 1970. The software back then was written in Fortran, the predominate technical programming language a half century ago. I have updated the MATLAB version of the program so that it now uses census data from 1900 to 2020.
Contents censusapp2024 The latest version of the census application is now available at censusapp2024. Here are the data and the opening screenshot. [t,p]=UScensus;fprintf('%12d%12.3f\n',[t,p]') 1900 75.995 1910 91.972 1920 105.711 1930 123.203 1940 131.669 1950 150.697 1960 179.323 1970 203.212 1980 226.505 1990 249.633 2000 281.422 2010 308.746 2020 331.449 ![]() Risky Business Today, MATLAB makes it easier to vary parameters and visualize results, but the underlying mathematical principles are unchanged:
Here are the least-squares fits with linear, cubic, and degree seven polynomials and the interpolating polynomial. As the polynomial degree increases, so does R^2, until R^2 reaches one with the exact fit. Do any of these fits look like they could be used to predict future population growth? ![]() Splines In addition to polynomials, you can choose interpolation by three different piecewise Hermite cubics.
![]() Exponentials It is also possible to do nonlinear least squares fits by an exponential, a logistic sigmoid, and an exponential of an exponetial known as the Gompertz model.
![]() Predictions Which fits are suitable for predicting future population size? Despite their large R^2 values, polynomials of any degree are not suitable because outside of the time interval they behave like polynomials and do not provide realistic predictions. Splines were never intended for extrapolation. That leaves the exponentials. The simple exponential model grows exponentially and is not suitable. The Gompertz fit does approach a finite asymptotic limit, but the value is an astronimical a = 2101, corresponding to 2.1 $\times 10^9$ inhabitants. Hopefully, that is out of the question. The logistic fit has an asymptotic limit of a = 655.7. We recently passed the value of t where p(t) reaches a/2, namely c = 2018. So, the logistic model predicts that the long-term size of the U.S. population will be about twice its current value. Is that realistic? Probably not. ![]() Conclusion The British statistician George Box once said, "all models are wrong, some are useful." This is true of the models of the U. S. Census that I have discussed over the past sixty years. Here is censusapp2024 after all its buttons have been pushed. The extrapolation date is set to 2040. White noise has been added to the data. The model is a fourth-degree polynomial with an R^2 = 0.99. The R^2 value and the error estimates produced by errs account for errors in the data, but not in the model. This particular model does a lousy job of predicting even twenty years in the future. Some of the other models are better, many are worse. Hopefully, their study is worthwhile. ![]() Blogs I have made blog posts about the census before, in 2020 and in 2017. FMM Predicting population growth is featured in Computer Methods for Mathematical Computations, by George Forsythe, Mike Malcolm and myself, published by Prentice-Hall in 1977. That textbook is now available from an interesting smorgasbord of sources, including Google Scholar, Amazon, dizhasneatstuff, Abe Books, Internet Archive, PDAS, WorldCat (Chinese). Software censusapp2024 is available at censusapp2024. |
![]() |
![]() |