# MATH 225N Discussion Correlation and Regression

The process involved a couple of unexpected steps for me, so I will share as much as I can here to hopefully help Folks who might want to place something like this in a Post during Weeks X-X . 😉

XXXXX gave some great instructions on doing this kind of thing too, but I have a variation on a few of the steps that she performed. 😉

First, before copying and pasting the Excel output to an open blank Word document, I had to go through the very irritating step of giving all the cells to be copied and pasted a sort of light background – that orangy peachy color that you see.

Otherwise when I created my jpg / jpeg most of the cells were dark black and not readable so to speak.

Next in Excel I highlighted the cells to be copied and then pressed Ctrl+C

Then in a blank Word document I used the mouse and clicked but had to be careful about which Paste option that I used – I had to use Paste as a picture, which was one of the last options available to me from left to right.

Then I clicked on the image in the Word document and did Save as Picture

Then I saved it to my desktop and saved it as a jpg / jpeg

Then in the course shell I clicked on Files and then clicked on Week X Files

Then I uploaded my newly created image from my desktop into the Week X Files area

Then in my Post I clicked on Embed Image and then clicked CANVAS and then clicked Course Files and then clicked Week X Files

Then I located my desired image and clicked on it

Then I scrolled down a bit and clicked Update or whatever it is…

Then the image magically appeared in this Post ! 🙂

So the two steps that surprised me the most were having to give all the cells a light background and secondly instead of just blindly pasting into Word I had to be thoughtful and intentional about “which” “paste method / type” to use.

Thanks XXXXX and Best Wishes and please enjoy the upcoming Week X and work hard and learn a lot !

I really appreciate that very much !!

I know you are busy in Week 8 so attached are some optional data sets that you can use for your Week 8 graded Posting assignment in case you don’t feel like you want to look around all over the internet for a data set.

Thanks Friends and try hard not to repeat a Post about a data set that another class member has already Posted on.

And note some of the data sets are large ( more than 50 ordered pairs ) so be sure you capture and analyze all the data if you choose one of the larger data sets !!

**You have to please be careful though** because the Week 8 Excel spreadsheet I bet can only accommodate a fairly small data set (a fairly small number of ordered pairs). So if you pick one of the data sets from this attached spreadsheet here, **PLEASE be sure that it does not have more ordered pairs than what the Excel spreadsheet that you will use for the analysis can handle** !!! 😉

Thanks Friends and Best Wishes !!

You are encouraged to find your own data sets that interest you but if you are really pressed for time you can choose and use a data set from this attachment.

Some of these data sets have more to do with nursing / healthcare / nutrition / medicine than others so if you pick a data set from the attached try to pick one that ties in with your interests but do feel free to select any of them, including data sets like the house sizes / housing prices or the basketball bouncing data sets…

Thanks for your hard work and Best Wishes Friends !!

I used my normal calculator and came up with the following values for your first question, about predicted y values: 164.25, 170.5, 176.75, 183, 189.25, 195.5, 201.75, 208, 214.25, 220.5 and 226.75. These numbers are quite different from the y values in your Excel sheet.

Why?

Is it also safe to assume that if 59% of the variation in total cholesterol was explained by the variation in BMI then 41% of the variation in total cholesterol was not explained by the variation in BMI?
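One way to see why the two percentages must add up to 100% is to split the total variation in y into the part the least-squares line explains and the part left over in the residuals. The sketch below is only an illustration: the BMI and total cholesterol pairs are made up (they are not the Week 8 data), and the helper names `fit_line` and `variation_split` are hypothetical.

```python
# Hypothetical (BMI, total cholesterol) pairs for illustration only;
# these are NOT the values from the course spreadsheet.
bmi = [19.0, 22.5, 24.0, 26.5, 28.0, 31.0, 33.5, 36.0]
chol = [160.0, 172.0, 185.0, 181.0, 200.0, 205.0, 223.0, 219.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def variation_split(xs, ys):
    """Return (explained fraction, unexplained fraction) of the variation in ys."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)                       # total variation
    explained = sum((p - mean_y) ** 2 for p in predicted)            # captured by the line
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # left in the residuals
    return explained / total, unexplained / total


explained_frac, unexplained_frac = variation_split(bmi, chol)
print(f"explained: {explained_frac:.3f}, unexplained: {unexplained_frac:.3f}")
```

For a least-squares line with an intercept, the two fractions always sum to 1, so a coefficient of determination of 0.59 leaves 0.41 of the variation unexplained by the line.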

Regression analysis is how we measure cause-and-effect relationships and determine whether they are statistically sound. Correlation alone is not causation, and that is why patterns and influence must be studied (Holmes, Illowsky, and Dean, 2017). If a regression analysis were done on BMI, there are many probable independent variables. The easiest and most common one to think of would be the patient’s diet. We could break this down and become more specific, such as total cholesterol intake or total fat intake. Other variables to consider would be exercise or illnesses such as lipedema or lymphedema. Also, conditions such as COPD and CHF are important to consider. As mentioned in our lesson this week, correlation is not causation, and further experiments and statistics are needed to determine whether the results are based on influence or coincidence.

A study published in Environmental Health Perspectives examined the correlation between air pollution and blood pressure, heart rate, and cardiac biomarkers. The dependent variables were blood pressure, heart rate, and the biomarkers, and the independent variable was exposure to air pollution. The study took place between 1995 and 2013. The results state, “We observed some evidence suggesting distributional effects of traffic-related pollutants on systolic blood pressure, heart rate variability, corrected QT interval, low density lipoprotein (LDL) cholesterol, triglyceride, and intercellular adhesion molecule-1 (ICAM-1)”. Their conclusion also uses subjective words such as “may effect” (Bind, Peters, Koutrakis, Coull, Vokonas, and Schwartz, 2016). With this in mind, and given the lack of knowledge of other factors related to the participants’ health, I would say it is difficult to exclude the possibility of coincidence in this specific study.

### References:

Bind, M., Peters, A., Koutrakis, P., Coull, B., Vokonas, P., & Schwartz, J. (2016). Quantile Regression Analysis of the Distributional Effects of Air Pollution on Blood Pressure, Heart Rate Variability, Blood Lipids, and Biomarkers of Inflammation in Elderly American Men: The Normative Aging Study. *Environmental Health Perspectives*. https://ehp.niehs.nih.gov/doi/10.1289/ehp.1510044

Holmes, A., Illowsky, B., & Dean, S. (2017). Introductory Business Statistics. OpenStax.

You did a very nice job of emphasizing that there is a difference between correlation and causation. That is extremely important because it is important for the “consumer of the research” to know what the results of scientific studies and undertakings mean and what the results do not mean.

How many of you have ever watched a television newscast where the reporter / anchor says something like:

“A new study **shows** that…”

or

“A new research report **proves** that…”

These types of statements and assertions could not be more incorrect and could not be more absurd / ridiculous. 😉

Quantitative research does not **PROVE** anything ( there *are* some people alive on this planet who probably disagree with me on this very strong statement here ).

So Brennaa as you said there is a big difference between coincidence and a true causal link between two quantitative variables.

If anyone cares to Google “correlation versus causation” you will likely find some funny stories and some humorous examples that highlight the big difference between coincidence and where there is perhaps a true, distinguishable, relevant cause and effect link…

Thanks Brennaa and be well ! Wonderful work and results in the course !!

Both correlation and regression analysis are used to determine the relationship that exists between variables. In both cases, the variables need to be normally distributed. However, there is a difference between the two statistical approaches (Kasuya, 2019). While correlation is only used to measure the association between two continuous variables, regression analysis is used to determine the relationship between one dependent variable and one or more independent variables. Additionally, regression analysis is how we measure cause-and-effect relationships and determine whether they are statistically sound. Correlation alone is not causation, and that is why patterns and influence must be studied (Kasuya, 2019). Performing regression analysis on Body Mass Index (BMI) requires the consideration of different independent variables. Apart from diet and the level of physical activity, another possible independent variable would be the height of the study participants. Height is always considered in the computation of BMI; therefore, it is one of the independent variables for BMI. Also, the values of height are always continuous. However, data analysts need to ensure that there is a normal distribution.

Physical activity is known to reduce body mass index. In other words, continuous physical activity aids in the breakdown of the excess body fat that contributes to the increase in BMI. Also, overeating and overconsumption of junk or fatty foods have been established as major contributors to an increase in BMI. Before undertaking correlation and regression analysis, there is always a need to undertake normality tests to ensure that both the dependent and independent variables meet the requirements for parametric tests or inferential statistical analysis.

### Reference

Kasuya, E. (2019). *On the use of r and r squared in correlation and regression* (Vol. 34, No. 1, pp. 235-236). Hoboken, USA: John Wiley & Sons, Inc. Retrieved from: https://esj-journals.onlinelibrary.wiley.com/doi/abs/10.1111/1440-1703.1011

Congrats to all for surviving this course! Wasn’t this like trying to learn a new language in 8 weeks?

I found an excellent article in our library that compared different regression models for the best approach to predicting BMI: “Factors associated with overweight: are the conclusions influenced by choice of the regression method?” (Juvanhol et al., 2016). The bottom line was the authors recommend using a combination of different approaches, as these furnish complementary information to the multifactorial predictors of obesity. The article was a little over my head as it discussed gamma regression, which I couldn’t find in our textbook, and quantiles, which also is not in our text but seems a lot like quartiles. But thanks to this course, I was able to understand more of this article than I would have before this course.

In this article, BMI distribution percentiles are on the x-axis of the article's charts. Along the y-axis are the values of the estimated coefficients for age, physical inactivity, years of night-shift work, BMI at age 20, domestic overload (cleaning/cooking/laundry factored by number of residents at home) and self-rated health. According to Juvanhol et al. (2016), these were the explanatory variables. This is still a little confusing to me, as Holmes et al. (2018) stated that a multivariate model or system is where more than one independent variable is used to predict an outcome, and there can only be one dependent variable, but unlimited independent variables. So why did the authors refer to age, etc., as explanatory variables, which would make them independent variables, but not put them on the x-axis?

Anyway, the independent variables are along the y-axis, and are shown in units of the values of the coefficients estimated. Coefficients provide an estimate of the impact of a unit change in the independent variable on the dependent variable (Holmes et al., 2018). The coefficient we use in a linear regression is the slope, or the rise over the run. However, this week we learned about another kind of coefficient, the coefficient of determination which is the explained variation over the total variation (Chamberlain University, 2021). I am not sure which coefficient the authors are referring to in the article.
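To make the difference between the two kinds of coefficient concrete, here is a minimal sketch that computes both the slope and the coefficient of determination for the same line. The height and weight numbers are invented for illustration, and `slope_and_r_squared` is a hypothetical helper name, not something from the article or the course materials.

```python
def slope_and_r_squared(xs, ys):
    """Return (slope, r_squared) for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                  # rise over run, in units of y per unit of x
    r_squared = sxy ** 2 / (sxx * syy)  # explained variation over total variation
    return slope, r_squared


# Hypothetical data: heights in inches and weights in pounds.
height_in = [60, 62, 64, 66, 68, 70, 72]
weight_lb = [115, 120, 136, 140, 155, 160, 178]

# The same heights expressed in centimeters.
height_cm = [h * 2.54 for h in height_in]

slope_in, r2_in = slope_and_r_squared(height_in, weight_lb)
slope_cm, r2_cm = slope_and_r_squared(height_cm, weight_lb)

print(f"slope per inch: {slope_in:.3f}, r squared: {r2_in:.3f}")
print(f"slope per cm:   {slope_cm:.3f}, r squared: {r2_cm:.3f}")
```

Changing the units of x changes the slope, because the slope carries units, but leaves the coefficient of determination untouched. That is one way to tell the two apart: a coefficient reported "per unit change" in an explanatory variable behaves like a slope.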

The grey shaded areas around each line show the 95% confidence interval for the quantile estimates. It is interesting to note the narrowness of the spread of the confidence interval around the line in the “Age” graph and the “BMI at age 20” graphs in comparison to the other four graphs even though they are all at the 95% confidence level. We all know now that a narrow confidence interval is preferred over a wide one (Holmes et al., 2018).
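As a rough illustration of what makes one 95% interval narrower than another, the sketch below uses the simplest case from earlier in the course, a z-based interval for a mean, where the margin of error is z times s divided by the square root of n. This is an assumption made for illustration only: the article's intervals are for quantile regression coefficients and are computed differently, and the sample values and the `margin_of_error` helper are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_sd, n, confidence=0.95):
    """Half-width of a z-based confidence interval for a mean."""
    # Critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sample_sd / math.sqrt(n)


# Same spread and same confidence level, different sample sizes (made-up numbers).
small_sample = margin_of_error(sample_sd=4.0, n=25)
large_sample = margin_of_error(sample_sd=4.0, n=400)

print(f"n = 25:  +/- {small_sample:.3f}")
print(f"n = 400: +/- {large_sample:.3f}")
```

At the same confidence level, less spread in the data or more observations gives a narrower interval, which is why two graphs can both show 95% bands of very different widths.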

To answer the final question, which statistic would show the value of that regression line in understanding BMI, I’d give more weight (pardon the pun) to the statistics of “Age” and “BMI at age 20” due to the narrowness of the confidence intervals, but also interesting is the way the “Years worked at night” regression line jumps at about the 80th quantile, showing a suddenly stronger association in the upper quantiles. That would be an interesting area to investigate.

Elaine

Chamberlain University. (2021). MATH225. *Week 8 Slide Deck* [Online lesson]. Downers Grove, IL: Adtalem.

Holmes, A., Illowsky, B., & Dean, S. (2018). *Introductory business statistics*. OpenStax.

Juvanhol, L.L., Lana, R.M., Cabrelli, R., Bastos, L.S., Nobre, A.A., Rotenberg, L., Griep, R.H. (2016). Factors associated with overweight: are the conclusions influenced by choice of the regression method? *BMC Public Health 16*, 642. http://doi.org/10.1186/s12889-016-3340-2

You mentioned one thing in your Post that is certainly worth reiterating and emphasizing. For many class members, the reason for taking this course and the reason for working hard in this course and to be motivated to learn is so that when you go off in the future and read journal articles and research reports and books for other courses that you take, then hopefully you will understand much more of what you read than what you would have otherwise if you had never taken this course. That sentiment perfectly encapsulates why many ( sometimes most ) of the class members are taking this course here. 😉

Secondly Elaine I want to pick up on something else that you wrote in your Post about “statistics.” A loose general definition of a “statistic” is “any number calculated from sample data.”

So in our course as a whole, some of our most prominent statistics have been the sample mean x̄, the sample proportion p̂, the sample standard deviation *s*, and the sample median. I can think of at least four statistics right away in the context of Week 8 and in the context of linear correlation and simple linear regression. Can you all, the class members, collectively try to list those 4 statistics that I have in my mind here ?? 🙂

Thanks Elaine and Enjoy Week 8 and Learn a lot !

Elaine and Everyone please remember that this course ends on Sat Feb 27 and that March April 2021 courses begin on Sun Feb 28.

Thanks Everyone and please complete your Jan Feb 2021 Course and Instructor Evaluations right away ! I really appreciate that !!

No password is needed for the Final Exam, and it is really important to complete the first attempt at the Final Exam as early in Week 8 as possible in case you have any problems with electricity or internet access later in Week 8, or if, say, CANVAS or Knewton were to have unexpected downtime later in Week 8…

Be well Friends and Terrific work and results in the course ! Please keep it up too !

Yes, this was learning a new language, and I would like to thank you again for assisting with posting graphs to our discussion piece. Your posts are always so informative and I always enjoy reading them. BMI is such a huge factor in our lives. I have noticed that obesity is starting in childhood and continues through adult life, and that these obese children come from parents who are obese themselves. I know Covid and our State shutdown have not been kind to me, as I have gained 15 pounds since the start of all of this due to inactivity. Our state has been shut down for a year now. No going out to eat, limited shopping, parks being closed, beaches, etc. Little to nothing to do. I will be glad when life can get back to somewhat normal. Take care and good luck on the final.

This is insightful, Elaine. Regression models are necessary in the prediction of Body Mass Index (BMI) given the independent variables. From the article, the independent variables applied in the determination of BMI were appropriate given that they were normally distributed and were continuous variables. In the determination of regression models, it is always necessary to consider different factors. From the article chosen, the bottom line was that the authors recommend using a combination of different approaches, as these furnish complementary information to the multifactorial predictors of obesity (Juvanhol et al., 2016). The article chosen was detailed enough, as it consisted of numerous approaches to regression analysis. Gamma regression is essential in understanding the relationship or the correlation that ought to be undertaken to enhance the acquisition of the outcomes. While undertaking regression analysis, there is always a need to ensure that both the dependent and independent variables achieve the normality required to ensure effective outcomes. Even though correlation is another statistical measure that is essential in the determination of the relationship between two variables, it was not deeply discussed in the above article. In other words, the authors concentrated more on the regression analysis approaches. Some of the approaches applied in the article are simple and easy to understand.

The graphical analysis applied in the article makes it easier for readers to understand the independent and dependent variables. The independent variables are along the y-axis and are shown in units of the values of the coefficients estimated. Coefficients provide an estimate of the impact of a unit change in the independent variable on the dependent variable. The graphical analysis provides theoretical perspectives that can be applied in real data analysis to enhance the accuracy of the regression analysis process.

### Reference

Juvanhol, L.L., Lana, R.M., Cabrelli, R., Bastos, L.S., Nobre, A.A., Rotenberg, L., Griep, R.H. (2016). Factors associated with overweight: are the conclusions influenced by choice of the regression method? BMC Public Health 16, 642. http://doi.org/10.1186/s12889-016-3340-