I came to write this article through a predictable yet still unexpected set of events. I recently finished a course on statistical testing and reporting, and I set out to write a series of articles explaining the details of the most useful statistical tests I learned. I wished to do this both to cement my own knowledge and to help other data scientists learn a topic I found immensely helpful.
The first of these articles was going to be on the t-test, a common statistical test used to determine whether two means (averages) from different sets of data differ by a statistically significant amount. I began to write this article, but I realized I needed to first explain that there are two different kinds of t-tests. Then, I realized that to explain that, I needed to explain a separate but related underlying concept. The cycle continued as I planned out the article.
Furthermore, I realized that I would need to do this with each new article I wrote, as every statistical test required the same underlying knowledge base. Rather than repeat this information in each article, it would be much better to reference one standing source of information.
And thus, this article was born. In the words that follow, I will attempt to give a concise but effective primer on the basic concepts you should be familiar with in order to conduct and report statistical tests. For your convenience, I have broken down the concepts in the order you would encounter them running a study from start to finish. So without further ado, let’s get into it.
Quantitative Study Design
When designing a study, there are several important details one needs to consider. This article is not about study design, and I won’t be going into the details of best practices and the reasoning behind them. That said, the design of a study strongly influences which statistical test is eventually needed, and so it is essential to have a basic understanding of the following concepts:
- Factors and measures
- Levels and treatments