Wednesday, December 21, 2011

Dangers of Linear Regression

Linear regression is adequate for short-term forecasts, but dangerous over long time periods. Any given regression provides a snapshot of the conditions under which its data were collected. As conditions change over time, the predictive capacity of the regression declines.
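To make the point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the process (`true_value`), its change of slope after year 10, and the forecast years are invented, not drawn from real data. A line fitted to the first eleven years forecasts the near future tolerably and the far future badly, because the conditions it captured no longer hold.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def true_value(year):
    """Hypothetical process: grows 5 per year through year 10, then 1 per year."""
    if year <= 10:
        return 100 + 5 * year
    return 150 + (year - 10)


# Fit on the 'snapshot' period only (years 0 through 10).
train_years = list(range(0, 11))
train_values = [true_value(year) for year in train_years]
slope, intercept = fit_line(train_years, train_values)


def forecast_error(year):
    """Absolute gap between the fitted line and the hypothetical truth."""
    return abs((intercept + slope * year) - true_value(year))


short_term_error = forecast_error(12)  # two years past the fitting window
long_term_error = forecast_error(30)   # twenty years past the fitting window

print(f"error at year 12: {short_term_error:.1f}")
print(f"error at year 30: {long_term_error:.1f}")
```

The fit itself is perfect over the training window. The error comes entirely from extrapolating past the point where conditions changed, and it grows with every year of additional horizon.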

Maintaining predictive capacity requires repeating the process of data collection, cleaning, and regression. This runs up against the limitations of data: statistical methods require samples large enough to be statistically significant. A long-term data series requires that data collection methods, geography, and metrics remain constant over time, whereas 'ideal' data sets are collected at a single point in time. These limitations on the availability of data limit what can be regressed.

More dangerously, there is an increasing reliance on automatically calculated statistical analytics as measures of formal statistical validity, without the recognition that these measures are not 'absolute'. They are innovative methods developed to detect known errors in the application of other statistical methods. Successfully applying and interpreting their results requires a separate body of knowledge to identify and explain anomalies.

To improve model quality, there is a strong desire to reduce the number of variables present in a regression analysis. When faced with two highly correlated variables, only one may be included. This becomes extremely problematic if the two variables diverge over time: it becomes an open question which variable actually possessed predictive capacity, or whether either did, and the validity of the model actually resulted from the two variables' linkage to a third, unregressed variable. Regression models are only capable of showing correlation between variables, not causal relationships. Without an explicit causal linkage, it becomes possible to draw conclusions that are statistically valid but of limited utility.
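The divergence problem can be sketched in a few lines of Python. All of the data here are invented for illustration: `var_a` and `var_b` are two hypothetical predictors that move together during the historical period, and the outcome is assumed to depend on `var_b` alone. A model built on `var_a` looks valid historically, then fails once the two variables part ways.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Hypothetical historical period: var_b tracks var_a with small wobbles.
wobble = [0.3, -0.2, 0.1, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1, 0.2]
var_a = [float(t) for t in range(10)]
var_b = [2 * a + w for a, w in zip(var_a, wobble)]
outcome = [3 * b for b in var_b]  # assumption: only var_b drives the outcome

correlation = pearson(var_a, var_b)

# Either variable alone yields a model that fits the historical data well.
slope_a, intercept_a = fit_line(var_a, outcome)
slope_b, intercept_b = fit_line(var_b, outcome)

# Hypothetical future: var_a keeps rising while var_b stalls at its last value.
future_a = 20.0
future_b = var_b[-1]
future_outcome = 3 * future_b

error_a = abs(intercept_a + slope_a * future_a - future_outcome)
error_b = abs(intercept_b + slope_b * future_b - future_outcome)

print(f"historical correlation of var_a and var_b: {correlation:.4f}")
print(f"future error using var_a: {error_a:.1f}")
print(f"future error using var_b: {error_b:.1f}")
```

Nothing in the historical data distinguishes the two models. Only the assumption written into the code, that `var_b` is the real driver, determines which one survives the divergence, and that knowledge has to come from outside the regression.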