PHD Discussions

Ask, Learn and Accelerate in your PhD Research


2 years ago in Data Science, Statistics By Abeden

How can coefficients in an equation be defined to maximize correlation with a set of numbers?

I know techniques like least-squares regression minimize error, which often indirectly improves correlation. However, if my primary evaluation metric is the Pearson correlation coefficient (R) itself, should I be using a different optimization framework? I'm concerned about a potential divergence between minimizing error and directly maximizing R.

All Answers (1 Answer In All)

By Meghna R Answered 9 months ago

This is a nuanced and important distinction I've encountered in applied work. While least squares is the workhorse for minimizing error, directly maximizing Pearson's R is a different optimization problem. Technically, R measures linear dependence on a standardized scale: it is insensitive to the scale and offset of the predictions. In practice, I would recommend starting with ordinary least squares (OLS); it is robust, and for a linear model its fitted values already achieve the highest attainable R, since R² equals the squared correlation between fitted and observed values. If you must maximize R directly, define it as your objective function and hand it to a numerical optimizer (e.g., gradient descent on the negative correlation). However, be cautious: because R ignores scale and offset, a model tuned only to R can have arbitrarily large errors unless the slope and intercept are constrained, making it a potentially misleading exercise.
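To make the scale/offset point concrete, here is a minimal NumPy sketch on synthetic data (the data, seed, and coefficients are illustrative assumptions, not from the question). It fits OLS, then applies an arbitrary positive rescaling and shift to the predictions: the Pearson R is unchanged, even though the squared error explodes.

```python
import numpy as np

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

# OLS fit: y_hat = a*x + b
a, b = np.polyfit(x, y, 1)
y_hat = a * x + b

# Pearson R between OLS predictions and targets.
r_ols = np.corrcoef(y_hat, y)[0, 1]

# R is invariant to any positive rescaling and shifting of the
# predictions, so this badly miscalibrated model attains the
# exact same R while its squared error is enormous.
y_bad = 50.0 * y_hat + 1000.0
r_bad = np.corrcoef(y_bad, y)[0, 1]

mse_ols = np.mean((y_hat - y) ** 2)
mse_bad = np.mean((y_bad - y) ** 2)

print(np.isclose(r_ols, r_bad))  # True: identical correlation
print(mse_bad > 1000 * mse_ols)  # True: wildly different error
```

This is exactly why an objective of "maximize R alone" is underdetermined: any positive affine transform of a solution is an equally good solution, so you get correlation without calibration unless you also constrain slope and intercept.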
