[WIP] Optimal intercept initialization for simple objectives #10298
ref #9899
This PR modifies the intercept initialization for simple objectives (logistic, poisson, gamma, tweedie) to use their closed-form optimal solutions (that is, the value that minimizes the objective function) instead of a non-optimal one-step Newton estimate.
For these objectives, the optimal intercept is simply the link function applied to the mean of the response variable. Since `base_score` already undergoes this transformation, this PR just changes the calculation to the mean of the response variable in those cases.

For multi-target versions of these objectives, it sets the intercepts to zero instead, since applying a common intercept might not make much sense for the given problem.
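To illustrate the difference for the logistic case, here is a minimal standalone sketch (plain Python, not XGBoost code). It compares an intercept obtained from a single Newton step with the closed-form `logit(mean(y))`. The starting point of zero for the Newton step and all function names are assumptions made for this illustration only.

```python
import math

def logloss(b, y):
    """Mean logistic loss when every row receives the same raw score b."""
    p = 1.0 / (1.0 + math.exp(-b))
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi in y) / len(y)

def newton_step_from_zero(y):
    """One Newton step on the intercept, starting from raw score 0 (p = 0.5).

    Starting from zero is an assumption of this sketch.
    """
    p = 0.5
    grad = sum(p - yi for yi in y)       # first derivative of the summed log loss
    hess = len(y) * p * (1.0 - p)        # second derivative of the summed log loss
    return -grad / hess

def closed_form(y):
    """logit(mean(y)): the exact minimizer of the intercept-only log loss."""
    m = sum(y) / len(y)
    return math.log(m / (1.0 - m))

y = [1] * 10 + [0] * 90                  # toy data with 10% positives
b_newton = newton_step_from_zero(y)
b_opt = closed_form(y)

print("one-step Newton:", b_newton, "loss:", logloss(b_newton, y))
print("closed form:    ", b_opt, "loss:", logloss(b_opt, y))
```

On imbalanced toy data like this, the single Newton step stops short of the minimizer, while the closed-form value sets the gradient of the intercept-only loss to zero. The same reasoning gives `log(mean(y))` for objectives with a log link.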
Note that there's still room for improvements:
Note1: I wasn't sure how to calculate a weighted sample mean here (I'm not familiar with GPU computing and the 'devices' logic). It would be helpful to have a `WeightedMean` function under `stats`, if possible, to use when there are sample weights.

Note2: The compiler checks here don't like turning a `linalg::Tensor<T, 2>` into a `linalg::Tensor<T, 1>` by `reinterpret_cast`. I'm also not sure what the right way to do it without a data copy would be.

Note3: I wasn't sure where to add tests for the changes here. For example, it would be ideal to test that `binary:logistic` and `binary:logitraw` produce the same raw scores, but I'm not sure where the right place to add such a test is.
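Regarding the weighted case from Note1, the quantity needed is just a weighted sample mean passed through the link function. The following is a rough CPU-only sketch in plain Python of what such a helper would compute. `weighted_mean` and `poisson_intercept` are hypothetical names for illustration, not existing XGBoost functions, and the sketch says nothing about how this should be done on device.

```python
import math

def weighted_mean(y, w=None):
    """Weighted sample mean; falls back to the plain mean when w is None."""
    if w is None:
        return sum(y) / len(y)
    total = sum(w)
    if total <= 0:
        raise ValueError("sum of sample weights must be positive")
    return sum(wi * yi for wi, yi in zip(w, y)) / total

def poisson_intercept(y, w=None):
    """Log link applied to the (weighted) mean of the response."""
    return math.log(weighted_mean(y, w))

y = [0.0, 2.0, 4.0]
w = [1.0, 1.0, 2.0]
b = poisson_intercept(y, w)

# The weighted gradient of the Poisson loss w.r.t. the intercept vanishes at b.
grad = sum(wi * (math.exp(b) - yi) for wi, yi in zip(w, y))
print("weighted mean:", weighted_mean(y, w), "intercept:", b, "gradient:", grad)
```

The gradient check at the end is the property that makes the weighted mean the right input: with sample weights, the intercept-only optimum of these objectives satisfies a weighted version of the same first-order condition.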