Modified Linear Regression to work on OLS, fixes #8847 #11311

Open: wants to merge 5 commits into master

smruthi-sumanth

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

algorithms-keeper bot added the awaiting reviews label (This PR is ready to be reviewed) on Feb 27, 2024
imSanko (Contributor) left a comment

Nice!

smruthi-sumanth changed the title from "added bipolar step function" to "Modified Linear Regression to work on OLS, fixes #8847" on Mar 27, 2024
tianyizheng02 (Contributor) left a comment

Good stuff! Apart from the changes that I've requested in my other comments, could you also rewrite the explanation at the top of the file to reflect your new implementation? In particular, make sure your new explanation does the following:

  1. Briefly explain what linear regression is
  2. Explain the OLS regression formula, $(\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y}$
  3. Note possible issues with your implementation (e.g., inefficiency, numerical instability)
  4. Cite a source that readers can refer to for further info (Wikipedia or similar is fine)
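For reference, the closed form in item 2 comes from minimizing the sum of squared residuals. A short derivation, assuming $\mathbf{X}$ has full column rank so that $\mathbf{X}^{\top}\mathbf{X}$ is invertible:

```math
\begin{aligned}
S(\boldsymbol{\theta}) &= \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\theta} \rVert_2^2
  = (\mathbf{y} - \mathbf{X}\boldsymbol{\theta})^{\top}(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) \\
\nabla_{\boldsymbol{\theta}} S &= -2\,\mathbf{X}^{\top}(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) = \mathbf{0} \\
\mathbf{X}^{\top}\mathbf{X}\,\boldsymbol{\theta} &= \mathbf{X}^{\top}\mathbf{y} \\
\hat{\boldsymbol{\theta}} &= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
\end{aligned}
```

Since $\kappa_2(\mathbf{X}^{\top}\mathbf{X}) = \kappa_2(\mathbf{X})^2$, explicitly inverting $\mathbf{X}^{\top}\mathbf{X}$ is one source of the numerical instability mentioned in item 3, especially when features are nearly collinear.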

I wrote a similar explanation for weighted regression in machine_learning/local_weighted_learning/local_weighted_learning.py a while back. If you'd like, you're welcome to use that explanation as a reference (though your explanation doesn't need to be nearly as long).

"""Implement Linear regression over the dataset
:param data_x : contains our dataset
:param data_y : contains the output (result vector)
def run_linear_regression_ols(data_x, data_y):

Suggested change:
- def run_linear_regression_ols(data_x, data_y):
+ def ols_linear_regression(data_x: np.ndarray, data_y: np.ndarray) -> np.ndarray:
  1. Shortened the function name a bit (make sure you change the name elsewhere as well)
  2. Added type hints

:param data_x : contains our dataset
:param data_y : contains the output (result vector)
def run_linear_regression_ols(data_x, data_y):
"""Implement Linear regression using OLS over the dataset

Suggested change:
- """Implement Linear regression using OLS over the dataset
+ """Implement OLS linear regression over a given dataset

Slight rewording for clarity

error = sum_of_square_error(data_x, data_y, len_data, theta)
print(f"At Iteration {i + 1} - Error is {error:.5f}")
# Use NumPy's built-in function to solve the linear regression problem
theta = np.linalg.inv(data_x.T.dot(data_x)).dot(data_x.T).dot(data_y)

Suggested change:
- theta = np.linalg.inv(data_x.T.dot(data_x)).dot(data_x.T).dot(data_y)
+ theta = np.linalg.inv(data_x.T @ data_x) @ data_x.T @ data_y

Instead of using .dot() for matrix multiplication, we can use NumPy's @ operator, which gives the same result for 2-D arrays and is more readable.
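A minimal sketch of this suggestion on hypothetical toy data, assuming data_x already contains a leading column of ones for the intercept (an assumption, not something stated in the PR). It checks that the .dot() chain and the @ chain produce the same theta, and also shows np.linalg.lstsq, which avoids forming an explicit inverse and is a common alternative when numerical stability (item 3 in the review above) matters.

```python
import numpy as np

# Hypothetical toy data: a column of ones (intercept) plus one feature.
data_x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
# Targets generated exactly from y = 2 + 3x, so the true coefficients are [2, 3].
data_y = np.array([2.0, 5.0, 8.0, 11.0])

# Original style: chained .dot() calls.
theta_dot = np.linalg.inv(data_x.T.dot(data_x)).dot(data_x.T).dot(data_y)

# Suggested style: the @ (matmul) operator, equivalent to .dot() for 2-D arrays.
theta_at = np.linalg.inv(data_x.T @ data_x) @ data_x.T @ data_y

# Alternative that solves the least-squares problem without an explicit inverse.
theta_lstsq, *_ = np.linalg.lstsq(data_x, data_y, rcond=None)

print(np.allclose(theta_dot, theta_at))      # True
print(np.allclose(theta_at, [2.0, 3.0]))     # True
print(np.allclose(theta_lstsq, [2.0, 3.0]))  # True
```

The @ form reads like the formula $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$; the two only diverge for arrays with more than two dimensions or for scalar operands, which this function does not use.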

Comment on lines 38 to 39
:return : feature for line of best fit (Feature vector)
"""

The OLS regression function needs doctests. Make sure you verify the outputs of your tests with a calculator that can do linear regression (e.g., Wolfram Alpha).
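A sketch of what such doctests might look like, reusing the function name and type hints suggested earlier in this review. The data are hypothetical, chosen so the expected coefficients can be checked by hand or in Wolfram Alpha, and the sketch assumes the caller has already prepended a column of ones for the intercept.

```python
import numpy as np


def ols_linear_regression(data_x: np.ndarray, data_y: np.ndarray) -> np.ndarray:
    """Implement OLS linear regression over a given dataset

    Assumes data_x already has a leading column of ones for the intercept.

    :param data_x: feature matrix of shape (n_samples, n_features)
    :param data_y: target vector of shape (n_samples,)
    :return: coefficient vector theta of shape (n_features,)

    Points that lie exactly on y = 2 + 3x, so theta should be [2, 3]:
    >>> x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    >>> y = np.array([2.0, 5.0, 8.0, 11.0])
    >>> np.allclose(ols_linear_regression(x, y), [2.0, 3.0])
    True

    Points (0, 1), (1, 3), (2, 4) are not collinear; the least-squares
    fit has intercept 7/6 and slope 3/2:
    >>> x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
    >>> y = np.array([1.0, 3.0, 4.0])
    >>> np.allclose(ols_linear_regression(x, y), [7 / 6, 1.5])
    True
    """
    # Closed-form normal-equation solution, as in the suggested one-liner.
    return np.linalg.inv(data_x.T @ data_x) @ data_x.T @ data_y


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Comparing with np.allclose instead of printing the raw array keeps the doctests from breaking on tiny floating-point rounding differences.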

tianyizheng02 added the awaiting changes, require tests, require type hints, and require proper documentation labels and removed the awaiting reviews label on Jun 2, 2024
Labels:
  • awaiting changes: A maintainer has requested changes to this PR
  • require proper documentation: Requested to write the documentation properly
  • require tests: Tests [doctest/unittest/pytest] are required
  • require type hints: https://docs.python.org/3/library/typing.html

3 participants