I hadn’t taken a math class since high school, so I was pretty nervous about this graduate-level statistics course. I ended up doing well, and the professor asked if my undergrad degree and job were “quantitative”... he was probably surprised to find out my first degree was in art!
The statistics course started with a brief overview of data analysis and an introduction to the Minitab statistics software. Data analysis covers different ways to measure and interpret the data you are given, such as visualizing it in bar charts, pie charts, histograms, and scatter plots. It also includes describing the data numerically: finding the mean (average), median (middle value), range, shape, variance and standard deviation (which both measure how far the data “scatters” around the average value), and more. We learned about confidence intervals in this chapter. A confidence interval is a range of values around a sample estimate, paired with a confidence level, such as being 95% confident that the true value lies between 1 and 5. One interesting rule of thumb from this section was that the sample size should be at least 30 whenever possible, such as a survey of 30 people. A sample of 30 or more is described as “large” in many of the complicated rules in later chapters.
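To make those summary numbers a bit more concrete, here is a minimal Python sketch of the same calculations. We used Minitab in class, not Python, so this is only my own illustration. The `responses` list is made-up data, the `confidence_interval_95` helper is a hypothetical name, and the 1.96 multiplier is the usual large-sample approximation for a 95% interval.

```python
import math
import statistics

# Made-up survey responses on a 1-5 scale (30 values, matching the
# "at least 30" rule of thumb). Not data from the course.
responses = [2, 3, 3, 4, 5, 1, 3, 4, 2, 3] * 3

mean = statistics.mean(responses)               # average
median = statistics.median(responses)           # middle value
data_range = max(responses) - min(responses)    # largest minus smallest
variance = statistics.variance(responses)       # sample variance
std_dev = statistics.stdev(responses)           # square root of the variance


def confidence_interval_95(values):
    """Approximate 95% confidence interval for the mean.

    Assumes a large sample, so it uses the normal value 1.96
    instead of looking up a t value.
    """
    n = len(values)
    center = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return center - margin, center + margin


low, high = confidence_interval_95(responses)
print(f"mean={mean}, median={median}, range={data_range}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

With a smaller sample, a t value would normally replace 1.96, which is part of why the 30-person rule of thumb keeps coming up.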
The main topics of the class were:
- simple linear regression
- multiple regression
- logistic regression
Simple linear regression is a statistical model that uses a single numerical variable (x, called the independent variable) to predict another number (y, called the dependent variable because its value depends on the value of x). For the fictional data set I was given for my assignment, the x variable was hours worked in one week, which was used to try to predict salary (y). For this data set, I found that work hours for that one week could explain only 5% of the variation in salary, so it was not a very useful model.
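For anyone curious what that looks like in practice, here is a rough sketch of a least-squares line and the "percent of variation explained" number (usually called R-squared). The `hours` and `salary` lists are invented for illustration and are not my assignment data, and `simple_linear_regression` is just a helper name I made up.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared), where r_squared is the
    share of the variation in y that the fitted line explains.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Slope is the co-variation of x and y divided by the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Compare leftover (residual) variation to total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Invented example data: hours worked in one week and annual salary.
hours = [40, 35, 50, 38, 45, 40, 42, 55]
salary = [95000, 42000, 120000, 50000, 88000, 38000, 70000, 90000]

slope, intercept, r_squared = simple_linear_regression(hours, salary)
print(f"salary = {intercept:.0f} + {slope:.0f} * hours")
print(f"R-squared: {r_squared:.2f}")
```

An R-squared near 0 means x tells you very little about y, and a value near 1 means the line fits the data closely.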
Multiple linear regression is similar to simple linear regression, except it involves multiple x variables. Sometimes those x variables can be categorical, such as yes/no or male/female. For my assignment, the x variables were hours worked in one week, whether the person is a budgetary decision maker (yes or no), the age of the employee, and the number of years the employee has worked at the company. These x variables were used together in one equation to predict y (salary). Altogether, those variables could explain 80% of the variation in salary, which is a lot more than the 5% from using work hours per week alone. You can see how multiple regression can build a better model. There are various ways to determine the best model/equation to use, tests to determine which potential x variables should or shouldn’t be used, and tests to account for possible interaction between the x variables.
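Here is a small sketch of how several x variables, including a yes/no one, can go into a single equation. The yes/no variable is coded as 1 or 0. The employee rows and salaries are invented, the function names are my own, and solving the "normal equations" by Gaussian elimination is just one simple way to do the fit without extra libraries. It is not how Minitab does it internally, as far as I know.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution, starting from the last row.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares fit of y on several x variables plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)

    predictions = [sum(c * v for c, v in zip(coefs, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Invented rows: (hours in one week, decision maker 1=yes/0=no, age, years at company)
employees = [
    (40, 1, 45, 15), (35, 0, 28, 3), (50, 1, 52, 20), (38, 0, 33, 5),
    (45, 1, 39, 10), (40, 0, 25, 1), (42, 0, 47, 12), (55, 1, 36, 8),
]
salaries = [95000, 42000, 120000, 50000, 88000, 38000, 70000, 90000]

coefficients, r_squared_multi = fit_multiple_regression(employees, salaries)
print("intercept and coefficients:", [round(c, 1) for c in coefficients])
print(f"R-squared: {r_squared_multi:.2f}")
```

Adding x variables can never lower R-squared on the same data, which is one reason the course also covered tests for deciding which variables actually belong in the model.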
Logistic regression differs from regular simple and multiple regression because the Y variable is categorical: yes or no, male or female, etc. Because there can only be two possible answers for Y, 1 or 0 (yes or no, female or not female, etc.), more complicated math must be used to predict the Y response, using the odds ratio and logarithms. It is much too complicated to go into detail here! An example of a logistic regression model would be a credit card company trying to answer the question, “Which existing credit card holders should we target to see if they want to upgrade their card?” The Y variable would be Yes (1), they would upgrade (so they should be targeted), or No (0), they wouldn’t upgrade (so they should not be targeted). The x variables in the equation, used to predict upgrading (y), would be how much the cardholder spent last year and whether the cardholder ordered additional cards last year.
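The core idea is less scary in code than it sounds: the equation predicts the logarithm of the odds, and that gets converted back into a probability between 0 and 1. In this sketch the coefficients `B0`, `B1`, and `B2` are numbers I invented for illustration, not fitted values from the course or the textbook, and the function names are hypothetical.

```python
import math

# Invented coefficients, for illustration only (not fitted to any data).
B0 = -6.0   # intercept
B1 = 0.14   # effect of each $1,000 spent last year
B2 = 2.8    # effect of having ordered additional cards (1 = yes, 0 = no)


def log_odds_of_upgrade(spend_thousands, ordered_extra_cards):
    """Linear part of the model: ln(odds) = B0 + B1*x1 + B2*x2."""
    return B0 + B1 * spend_thousands + B2 * ordered_extra_cards


def probability_of_upgrade(spend_thousands, ordered_extra_cards):
    """Convert log-odds into a probability with the logistic function."""
    z = log_odds_of_upgrade(spend_thousands, ordered_extra_cards)
    return 1 / (1 + math.exp(-z))


def odds(probability):
    """Odds = chance of yes divided by chance of no."""
    return probability / (1 - probability)


p_without = probability_of_upgrade(30, 0)   # spent $30,000, no extra cards
p_with = probability_of_upgrade(30, 1)      # spent $30,000, ordered extra cards

# The odds ratio for the yes/no variable equals e raised to its coefficient.
odds_ratio = odds(p_with) / odds(p_without)

print(f"P(upgrade) without extra cards: {p_without:.2f}")
print(f"P(upgrade) with extra cards:    {p_with:.2f}")
print(f"odds ratio for extra cards:     {odds_ratio:.1f}")
```

A company could then target cardholders whose predicted probability is above some cutoff, such as 0.5.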
Hopefully this crash course in statistics will help me analyze data, such as Google Analytics reports, in the future. Numbers are something I usually avoid, which is one reason I decided to go to business school: to redevelop the left side of my brain to complement my right side (creative skills), and to round out my knowledge in preparation for a new career in marketing, PR, or a similar field.
The book used for class was Basic Business Statistics, 13th edition, by Berenson (our professor), Levine, and Szabat. You can rent it on Amazon Textbook Rental for around $20 for several months. Hooray for Amazon Textbook Rental!
I started the online MBA program at Montclair State University in January 2017.