Let the People Define the Regions

Michael Cain

This off-the-cuff piece is inspired by, in order, people here saying "When you write about X, it sounds like a foreign country," for values of X like the South, the Midwest, or the Northeast; James Hanley's definition of the Midwest and the interesting set of "But wait!" comments it got; and Mike Schilling's (entirely proper) insistence that mathematics is a part of being well-rounded. I have occasionally floated the hypothesis that multi-state regions might be defined by the pattern of migration between states; that is, that people generally prefer to stay within a region where they're comfortable. That thought was originally spurred by playing with this nifty visualization tool. The Census Bureau provides summary annual spreadsheets for state-to-state migration flows here. Caveats if you're going to try to use them: the 2009 file uses a different layout from the 2011 and 2012 files, and the 2010 file is corrupted in some fashion and won't open. You might also enjoy fooling with Forbes' county-level interactive application based on IRS data.

With an appropriate distance measure defined based on the number of people who move between states [1], it's possible to partition the states into clusters such that states within a cluster are all "close" to one another and not so close to states in other clusters. I used a hierarchical partitioning scheme that builds clusters from the bottom up, using the average distance between cluster members to decide which clusters to merge. The seven-region partition shown here even makes some sense: there's a Northeast, a West, a Midwest split in two parts, a Southeast, a Mid-Atlantic, and a "Greater Texas" group. On previous debates about the Midwest: migration patterns say Kentucky should be included with Illinois, Indiana, and Ohio; Kansas and Missouri are Texas-centric, not Midwestern. As always, border states are an issue. For example, the Forbes application suggests the difficulty with Missouri: most migration involves the bigger metro areas. In Missouri, St. Louis is coupled with areas to the east and Kansas City with areas to the west and south, but Kansas City dominates.
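
For anyone who wants to try the bottom-up, average-distance merging described above, here is a minimal sketch in Python. It is not the code behind the partition in this post. The function name average_linkage, the placeholder labels A through E, the mover counts, and the reciprocal distance formula are all assumptions made up for illustration; the actual distance measure is not spelled out here.

```python
from itertools import combinations

# Made-up gross mover counts between five placeholder "states".
# These are NOT Census figures; they only exercise the algorithm.
movers = {
    ("A", "B"): 1600, ("A", "C"): 90, ("A", "D"): 60, ("A", "E"): 40,
    ("B", "C"): 120, ("B", "D"): 70, ("B", "E"): 30,
    ("C", "D"): 1100, ("C", "E"): 800,
    ("D", "E"): 950,
}

# Assumed distance: more movers means "closer". The real measure may differ.
dist = {frozenset(pair): 1.0 / (1 + n) for pair, n in movers.items()}
labels = ["A", "B", "C", "D", "E"]


def average_linkage(dist, labels, k):
    """Merge clusters bottom-up until only k clusters remain."""
    clusters = [frozenset([x]) for x in labels]

    def avg(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the two clusters with the smallest average distance...
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        # ...and replace them with their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


regions = average_linkage(dist, labels, 2)
for region in regions:
    print(sorted(region))
```

With the toy numbers, A and B trade a lot of people with each other and very few with anyone else, so they end up as one region and C, D, and E as the other. Stopping at a different k gives a coarser or finer partition from the same merge sequence.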

For what it might be worth, the New York/New England region appears to be the one least connected to any of the other regions, with the ten-state West second.


[1] Sanity check: which pairs of states are closest using this particular measure? California/Nevada, Massachusetts/New Hampshire, Texas/Oklahoma, and Kansas/Missouri are in the top ten "closest" pairings. This generally coincides with our — or at least my — intuition about things.
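
A sanity check of this kind is easy to reproduce once the flows are in a table. The sketch below assumes directed counts keyed by (origin, destination), adds the two directions together, and ranks pairs from closest to farthest. The placeholder labels, the counts, and the reciprocal distance are all invented for illustration and are not the measure or the data used for this post.

```python
from collections import defaultdict

# Made-up directed flows: (origin, destination) -> people who moved.
flows = {
    ("A", "B"): 900, ("B", "A"): 700,
    ("A", "C"): 50, ("C", "A"): 40,
    ("B", "C"): 400, ("C", "B"): 350,
}

# Combine both directions into one gross flow per unordered pair.
gross = defaultdict(int)
for (origin, dest), count in flows.items():
    gross[tuple(sorted((origin, dest)))] += count

# Assumed distance: the more people move, the "closer" the pair.
distance = {pair: 1.0 / (1 + total) for pair, total in gross.items()}

# Closest pairs first.
ranked = sorted(distance, key=distance.get)
for pair in ranked:
    print(pair, gross[pair])
```

If the pairs at the top of such a list look nothing like neighbors you would expect, that is a hint the distance measure needs rethinking, for example by adjusting for state population.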