Why do we need to know the data distribution to do our analyses? We need to know how the data are distributed to determine the most appropriate statistical analyses to use.
- When the outcome variable is continuous (interval/ratio), linear regression is a common method to use.
- When the outcome variable is dichotomous (yes/no or 0/1), logistic regression is a likely choice, particularly if the outcome is not highly prevalent in the sample. If the outcome is highly prevalent, however, log-binomial regression may be more appropriate if the final intention of the analysis is to estimate the likelihood (risk) of the outcome.
- When the outcome variable has ordered categories (0/1/2/3), ordinal logistic regression (proportional odds regression) is potentially useful. In this situation, we also assume that the effect of each predictor is the same across the steps between adjacent categories (the proportional odds assumption).
- When the outcome variable has unordered categories, or ordered categories for which the proportional odds assumption does not hold, we may use multinomial regression models.
- When the outcome is a count of the number of events occurring from some large observed population, Poisson regression may be appropriate.
- When the outcome is the time until an event occurs, and data have been collected over an extended period of time, survival analyses are used. Cox proportional hazards regression models are frequently used, but other statistical models are also available and may be more appropriate for your data.
The data distribution determines which analytic conclusions you can draw, so choose your model carefully!
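
The guidance above can be summarized as a small lookup. The following is only a minimal sketch: the function name `suggest_model`, its parameters, and the outcome-type labels are hypothetical and chosen for illustration. It returns a starting point to consider, not a final answer, and it does not encode the caveat about highly prevalent dichotomous outcomes.

```python
def suggest_model(outcome_type, ordered=False, proportional_odds=True):
    """Return a starting-point model family for a given outcome type.

    All names and labels here are illustrative assumptions, not a standard API.
    outcome_type: "continuous", "binary", "categorical", "count",
                  or "time_to_event".
    ordered: whether the categories have a natural order (categorical only).
    proportional_odds: whether the proportional odds assumption is
                       considered reasonable (ordered categories only).
    """
    if outcome_type == "continuous":
        return "linear regression"
    if outcome_type == "binary":
        # The prevalence caveat for dichotomous outcomes is not handled here.
        return "logistic regression"
    if outcome_type == "categorical":
        if ordered and proportional_odds:
            return "ordinal logistic (proportional odds) regression"
        # Unordered categories, or ordered ones where the assumption fails.
        return "multinomial regression"
    if outcome_type == "count":
        return "Poisson regression"
    if outcome_type == "time_to_event":
        return "Cox proportional hazards regression"
    raise ValueError(f"Unknown outcome type: {outcome_type!r}")


# Example: an ordered 0/1/2/3 outcome where proportional odds is doubtful.
print(suggest_model("categorical", ordered=True, proportional_odds=False))
```

A lookup like this is only a first pass. As noted above, other models may fit your data better, so the suggestion should always be checked against the data themselves.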
About the Author
Dr. Vicki Lawrence is an academic researcher who studies the epidemiologic nature of social conditions in relation to cardiovascular and other disease outcomes. More specifically, her work focuses on studies of poor health among African Americans and health disparities that may occur by age, race, and gender in cardiovascular and mental health outcomes. Drawing on her background in epidemiology and biostatistics, she has provided statistical support on multiple studies with various investigators, commonly focused on physical and mental health data. In addition, she has worked with clinicians and research investigators, and has tutored multiple graduate students in public health, epidemiology, social work, medicine, education, and nursing, to tackle statistics-related issues.