
Social Determinants of Health Geospatial Research Methods

by Radion (Rodi) Svynarenko, Ph.D.

March 2025

The Social Determinants of Health (SDOH) database, created by the Agency for Healthcare Research and Quality (AHRQ), is an excellent source of public health data. It includes information from over 40 different federal surveys and databases, and it gives researchers an opportunity to rethink the methodologies they commonly apply to study SDOH.

SDOH are “the conditions in the environments where people are born, live, learn, work, play, worship, and age that affect a wide range of health, functioning, and quality-of-life outcomes and risks” (Healthy People 2030). This definition is widely used in public health research, and research in this area grows every year: articles with “social determinants of health” as a keyword jumped from 45 in 2007 to 1,411 in 2018 (Mishori, 2019). Many of these studies aim to correlate socio-demographic and healthcare measures, often leading to conclusions about social inequalities that should be addressed through social policies and political interventions. In these studies, the SDOH concept emphasizes the word “social,” ultimately leading to the question of “Why?” and a search for social, policy, and political causes. Although this direction of research may have some merit (Mishori, 2019), it can also be misleading. It is often overlooked that the definition of SDOH rests on “the conditions in the environments where…,” which raises the question of “Where?” This question should be addressed with geospatial research models, because classical statistical models that ignore location can lead to biased results. Here are three causes of such bias:

Spatial autocorrelation: A key assumption of classical statistical models is that observations are independent, meaning that there must be no or minimal relationships between observations (i.e., cases and samples). However, in SDOH analysis, nearby counties tend to have more similar values than distant ones, so this assumption is often violated. For example, in our preliminary data analysis, we used a machine learning model, a Random Forest Classifier, to estimate the effects of a set of predictors on the dependent variable. Model validation, conducted using an Ordinary Least Squares (OLS) regression model with the same set of predictors, revealed a coefficient of determination (R²) of 19.26%. Analysis conducted using Geographically Weighted Regression (GWR), which fits a separate locally weighted regression at each location and thus takes spatial structure into account, resulted in an R² of 41.67%, more than double that of OLS.
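
The two ideas in this paragraph can be made concrete with a short NumPy sketch. `morans_i` computes global Moran's I, a standard measure of spatial autocorrelation: positive values mean neighboring counties are similar, negative values mean they alternate, and values close to -1/(n - 1) are what independent observations would give. `gwr_coefficients` fits a basic geographically weighted regression by solving a separate weighted least-squares problem at each location, with weights from a Gaussian kernel on distance. Both function names, the kernel, and the fixed bandwidth are illustrative assumptions, not the software or settings used in the analysis above; dedicated tools such as PySAL's `mgwr` package also select the bandwidth and report diagnostics.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for one variable observed at n locations.

    weights: n x n spatial weights matrix with w[i, j] > 0 when locations
    i and j are neighbors and a zero diagonal.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    s0 = w.sum()                     # total weight
    num = z @ w @ z                  # sum_ij w_ij * z_i * z_j
    den = z @ z                      # sum_i z_i^2
    return (x.size / s0) * num / den


def gwr_coefficients(coords, X, y, bandwidth):
    """Local regression coefficients at every location (basic GWR sketch).

    coords: (n, 2) point coordinates, e.g. county centroids.
    X: (n,) or (n, k) predictors; an intercept column is added here.
    bandwidth: fixed Gaussian kernel bandwidth, in the units of coords.
    Returns an (n, k + 1) array: row i holds [intercept, slopes...] at i.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(X, dtype=float)])
    betas = np.empty((y.size, X.shape[1]))
    for i in range(y.size):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian weights
        XtW = X.T * w                                    # X^T W_i
        # Weighted least squares: beta_i = (X^T W_i X)^-1 X^T W_i y
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas


if __name__ == "__main__":
    # Four counties in a row, each adjacent to the next.
    w = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
    print(morans_i([1, 2, 3, 4], w))  # about 0.333: neighbors are similar
    print(morans_i([1, 4, 1, 4], w))  # about -1.0: neighbors alternate
```

Given made-up counties whose true slope is 1 in the west and 5 in the east, `gwr_coefficients` with a small bandwidth returns local slopes near 1 at the western end and near 5 at the eastern end, a pattern a single OLS fit cannot represent.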

Issues with sampling: It is common in machine learning to split samples into training and testing subsamples to check for model overfitting. The model is fitted on the training data, and its performance is then evaluated on the testing subsample. However, because of spatial autocorrelation, a random split places neighboring, and therefore similar, counties in both subsamples, which can make test performance look better than it really is. Cases should instead be assigned to training and testing with a non-random, spatial sampling strategy. For example, models could be trained on a stratified sample of counties from one region and tested on counties from another region, or trained on data from previous years and tested on more recent data.
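
Both splitting strategies are short to express in code. The sketch below assumes each county row carries a region label (for example, a Census region) and a data year. `leave_one_region_out` holds out one whole region at a time, so neighboring counties inside a region are never divided between training and testing, and `temporal_split` trains on earlier years and tests on a later one. The function names are hypothetical, and the sketch does not perform the stratified sampling mentioned above; scikit-learn's `LeaveOneGroupOut` and `GroupKFold` splitters implement the same grouped-split idea if that library is already in use.

```python
import numpy as np


def leave_one_region_out(regions):
    """Yield (region, train_idx, test_idx), holding out each region once.

    regions: one region label per county. Counties in the held-out region
    never appear in the training indices, so the model is always scored
    on an area it was not fitted on.
    """
    labels = np.asarray(regions)
    for region in np.unique(labels):          # sorted unique labels
        test_mask = labels == region
        yield region, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def temporal_split(years, test_year):
    """Train on rows from years before test_year; test on test_year rows."""
    years = np.asarray(years)
    return np.flatnonzero(years < test_year), np.flatnonzero(years == test_year)


if __name__ == "__main__":
    regions = ["South", "South", "West", "Midwest", "West"]
    for region, train, test in leave_one_region_out(regions):
        print(region, train.tolist(), test.tolist())
    # Midwest [0, 1, 2, 4] [3]
    # South [2, 3, 4] [0, 1]
    # West [0, 1, 3] [2, 4]
```

The model would be refitted once per held-out region, and the spread of scores across regions indicates how well it transfers to places it has not seen.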

Missing data: The SDOH database has a large number of variables with missing data. A common strategy for handling missing data is to impute values with standard algorithms such as nearest-neighbor imputation or multiple imputation. However, this may create an ecological fallacy, in which an inference about a single geographic location is drawn from the group to which that location belongs. For example, rural areas in many states have experienced hospital closures, and the problem is more severe in states that did not expand Medicaid under the Affordable Care Act (Lindrooth et al., 2018). Imputing the number of hospitals in a community with standard techniques ignores this policy context, so it can introduce serious bias into the analysis.
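
A toy example shows how much the imputed value depends on the group a location is assigned to. The sketch below uses mean imputation, a deliberately simple stand-in for the nearest-neighbor and multiple-imputation methods named above, and compares two donor pools: all counties, or only counties in states with the same Medicaid expansion status. The five counties and their hospital counts are made up for illustration, and `impute_mean` is a hypothetical helper.

```python
from statistics import mean


def impute_mean(rows, key, group_key=None):
    """Return copies of rows with missing (None) values of `key` filled.

    With group_key=None the fill value is the mean over all rows (pooled).
    Otherwise it is the mean over rows sharing the row's group_key value,
    falling back to the pooled mean if that group has no observed values.
    """
    def observed(subset):
        return [r[key] for r in subset if r[key] is not None]

    pooled = mean(observed(rows))
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        if row[key] is None:
            fill = pooled
            if group_key is not None:
                peers = observed(r for r in rows if r[group_key] == row[group_key])
                fill = mean(peers) if peers else pooled
            row[key] = fill
        filled.append(row)
    return filled


if __name__ == "__main__":
    # Made-up counties: hospital counts and state Medicaid expansion status.
    counties = [
        {"county": "A", "expansion": True, "hospitals": 4},
        {"county": "B", "expansion": True, "hospitals": 6},
        {"county": "C", "expansion": False, "hospitals": 1},
        {"county": "D", "expansion": False, "hospitals": 1},
        {"county": "E", "expansion": False, "hospitals": None},
    ]
    print(impute_mean(counties, "hospitals")[-1]["hospitals"])               # pooled: 3
    print(impute_mean(counties, "hospitals", "expansion")[-1]["hospitals"])  # grouped: 1
```

County E receives 3 hospitals from the pooled mean but 1 from the non-expansion-state mean. Neither number is an observation of county E itself; each is inherited from the group it was assigned to, and the gap between them shows how strongly the choice of donor pool can shift downstream estimates.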

In summary, the SDOH database provides unique data that can be effectively used for finding social and environmental determinants of health. However, to draw unbiased conclusions, it may be beneficial to incorporate methods of geospatial analysis.

References

Healthy People 2030. (n.d.). Social determinants of health. U.S. Department of Health and Human Services, Office of Disease Prevention and Health Promotion. Retrieved February 25, 2025, from https://odphp.health.gov/healthypeople/objectives-and-data/social-determinants-health

Mishori, R. (2019). The social determinants of health? Time to focus on the political determinants of health! Medical Care, 57(7), 491-493. https://doi.org/10.1097/MLR.0000000000001131

Lindrooth, R. C., Perraillon, M. C., Hardy, R. Y., & Tung, G. J. (2018). Understanding the relationship between Medicaid expansions and hospital closures. Health Affairs, 37(1), 111-120.