Correlation is a common statistic to measure a general linear relationship between two variables. Explain why correlation does not equal causation. Describe the data characteristics necessary to calculate a Pearson correlation coefficient. Design a study that would apply the Pearson correlation coefficient as an appropriate statistic.
Please respond with at least 300 words, and cite any and all references used.
This is an example of an answer I found; please use it only as a reference:
Correlation does not equal causation: two correlated variables may be causally related, but they do not have to be. According to Corty (2016), a correlation between two variables means only that there is an observed association between them; it does not mean that one variable causes the other. The association could arise because a third variable influences both, or because the direction of influence runs opposite to the one assumed.

As for measures of association, the Pearson correlation coefficient is a statistic that measures the strength and direction of the linear relationship between two interval and/or ratio variables (Corty, 2016). Each case must therefore have a score on both variables. To calculate a Pearson correlation coefficient (r), the following quantities are needed: X = a case’s score on variable X, Mx = mean score on variable X, Y = a case’s score on variable Y, My = mean score on variable Y, SSx = sum of the squared deviation scores for variable X, and SSy = sum of the squared deviation scores for variable Y. These combine as r = Σ[(X − Mx)(Y − My)] / √(SSx × SSy).

These data characteristics and the definition of the Pearson correlation coefficient are best explained through an example. Suppose a public health professional who specializes in biometrics wanted to see if body mass index was truly an accurate health indicator. The goal would be to see if there was a relationship between height and weight, specifically in adults over the age of 18 and under 65. To collect data, a health survey would be mailed to a random sample of 1,000 adults in the state of Colorado, with options to respond by mail, online, or by phone. The null hypothesis would state that there is no correlation between height and weight, while the alternative hypothesis would state that there is a correlation between height and weight. The significance level would be set at α = .05 (a 95% confidence level), and the critical value would be identified from the degrees of freedom (df = N − 2, where N is the number of cases). Finally, r would be calculated, compared against the critical value, and interpreted.
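To make the calculation concrete, here is a minimal Python sketch of r computed from deviation scores, plus the t statistic that is commonly used to test r against zero with N − 2 degrees of freedom. The function names (pearson_r, t_statistic) and the height/weight numbers are made up for illustration only; they are not from Corty (2016) or from any real survey.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviation scores: sum((X-Mx)(Y-My)) / sqrt(SSx * SSy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need paired scores for at least 2 cases")
    n = len(xs)
    mx = sum(xs) / n                      # Mx: mean of variable X
    my = sum(ys) / n                      # My: mean of variable Y
    dev_x = [x - mx for x in xs]          # deviation scores for X
    dev_y = [y - my for y in ys]          # deviation scores for Y
    ss_x = sum(d * d for d in dev_x)      # SSx
    ss_y = sum(d * d for d in dev_y)      # SSy
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variance")
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return sum_products / math.sqrt(ss_x * ss_y)


def t_statistic(r, n):
    """Return (t, df) for testing r against zero, with df = n - 2."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 cases to test r")
    if abs(r) >= 1:
        # Perfect correlation: t is unbounded.
        return math.copysign(math.inf, r), df
    return r * math.sqrt(df / (1 - r ** 2)), df


if __name__ == "__main__":
    # Hypothetical data, invented for this example (not survey results).
    heights_cm = [160, 165, 170, 175, 180, 185]
    weights_kg = [58, 66, 63, 75, 79, 84]
    r = pearson_r(heights_cm, weights_kg)
    t, df = t_statistic(r, len(heights_cm))
    print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```

The t value would then be compared with the critical value for the chosen significance level and df. In practice a library routine such as scipy.stats.pearsonr could be used instead of hand-rolled code; the version above just mirrors the quantities listed in the answer.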