
Feature Request: KAN Model Does Not Support Tensor Input with More Than Two Dimensions #204

Open
linkedlist771 opened this issue May 16, 2024 · 5 comments


@linkedlist771

Description

The KAN (Kolmogorov-Arnold Network) model from the pykan library currently supports only two-dimensional input tensors of shape (batch_size, hid_dim). A RuntimeError is raised when a three-dimensional tensor of shape (batch_size, atomic_number, hid_dim) is passed as input.

Code Snippet

from kan import KAN
import torch

hid_dim = 256
atomic_number = 42
batch_size = 60

# Three-dimensional input: (batch_size, atomic_number, hid_dim)
input_tensor = torch.randn(batch_size, atomic_number, hid_dim)
model = KAN(width=[hid_dim, 1],  # reduce hid_dim features to a single output
            grid=5,
            k=3,
            seed=0)
model(input_tensor).shape  # raises the RuntimeError below

Error Message

RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (3) for operand 0 and no ellipsis was given.

Explanation

This limitation prevents the use of the KAN model in scenarios where input tensors exceed two dimensions, such as natural language processing tasks where inputs have shape (batch_size x sequence_length x hid_dim).
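
Until such support lands, a user-side workaround is to flatten the leading dimensions into a single batch dimension before the call and restore them afterwards. A minimal sketch, reusing the names from the snippet above:

flat = input_tensor.reshape(-1, hid_dim)          # (batch_size * atomic_number, hid_dim)
out = model(flat)                                 # (batch_size * atomic_number, 1)
out = out.reshape(batch_size, atomic_number, -1)  # back to (batch_size, atomic_number, 1)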

Motivation for Using KAN Instead of MLP

The motivation for replacing the MLP with a KAN as the dimension-reduction output module is KAN's potential for more efficient computation and better performance at capturing non-linear interactions between features. MLPs, while versatile, can be computationally expensive and less effective at handling complex feature interactions in high-dimensional spaces. KAN's structured approach is a promising alternative that could improve model efficiency and effectiveness, particularly where dimensionality reduction is crucial. Additionally, I have observed that, for the same network structure, the parameter count of a KAN is significantly higher than that of an MLP, indicating a larger model capacity.

Suggestion

It would be beneficial to update the KAN implementation to support input tensors with an arbitrary number of leading dimensions. This would mimic the "position-wise feed-forward networks" used in architectures like the Transformer, which apply the same MLP (multi-layer perceptron) independently at every position along the sequence.
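
A sketch of how that could look as a thin wrapper (PositionwiseKAN is a hypothetical name, not part of pykan): collapse all leading dimensions into one batch dimension, run the KAN, and restore the original shape.

import torch
import torch.nn as nn
from kan import KAN

class PositionwiseKAN(nn.Module):
    """Hypothetical wrapper: applies the same KAN independently at every
    position, like a Transformer's position-wise feed-forward layer."""
    def __init__(self, **kan_kwargs):
        super().__init__()
        self.kan = KAN(**kan_kwargs)

    def forward(self, x):
        lead, hid = x.shape[:-1], x.shape[-1]
        y = self.kan(x.reshape(-1, hid))      # collapse leading dims into the batch
        return y.reshape(*lead, y.shape[-1])  # restore the leading dims

model = PositionwiseKAN(width=[256, 1], grid=5, k=3, seed=0)
print(model(torch.randn(60, 42, 256)).shape)  # torch.Size([60, 42, 1])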

@didi226

didi226 commented May 16, 2024

I am having the same problem.

@c-pupil

c-pupil commented May 16, 2024

> Additionally, I have observed that, for the same network structure, the parameter count of a KAN is significantly higher than that of an MLP, indicating a larger model capacity.

I also encountered a similar problem. I ran an experiment in which the MLP and the KAN had the same number of hidden layers: when the input tensor is [64, 28x28], the memory usage of the MLP and the KAN is similar, but when the input tensor is [36848, 28x28], the memory usage differs hugely, with the KAN using significantly more than the MLP. Do you know why? Looking forward to your answer.

@linkedlist771
Author

linkedlist771 commented May 16, 2024

> Additionally, I have observed that, for the same network structure, the parameter count of a KAN is significantly higher than that of an MLP, indicating a larger model capacity.
>
> I also encountered a similar problem. I ran an experiment in which the MLP and the KAN had the same number of hidden layers: when the input tensor is [64, 28x28], the memory usage of the MLP and the KAN is similar, but when the input tensor is [36848, 28x28], the memory usage differs hugely, with the KAN using significantly more than the MLP. Do you know why? Looking forward to your answer.

I think this is related to the number of parameters: in a KAN, each connection carries a learnable function, whereas in an MLP it is a single numerical weight. Here is my code.

from kan import KAN
import torch

# Single-layer KAN mapping 256 inputs to 1 output
model = KAN(width=[256, 1],
            grid=5,
            k=3,
            seed=0)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params}")

# Linear layer of the same width for comparison
model = torch.nn.Linear(256, 1)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params}")

Output:

Total parameters: 3585
Total parameters: 257

If you have different findings, keep me updated.
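
For what it's worth, the roughly fourteen-fold gap (3585 / 257 ≈ 14) is consistent with pykan storing a small bundle of parameters per edge instead of a single scalar weight. A rough accounting, assuming the 2024-era pykan layers where each edge carries (grid + k) spline coefficients, two scale terms, and four affine terms for the symbolic branch:

# Per-edge accounting for KAN(width=[256, 1], grid=5, k=3).
# NOTE: the per-edge breakdown is an assumption about pykan's internals.
in_dim, out_dim, grid, k = 256, 1, 5, 3
per_edge = (grid + k) + 2 + 4                  # spline coefs + scale terms + symbolic affine = 14
total = in_dim * out_dim * per_edge + out_dim  # plus one output bias
print(total)  # 3585, matching the observed count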

@c-pupil

c-pupil commented May 17, 2024

@linkedlist771

  1. Thanks for your answer. Your experiment verifies that the parameter count of the KAN is roughly fourteen times that of the MLP (3585 vs. 257).

  2. If both networks have the same architecture, say an input layer of size 28x28, an output layer of size 3, and hidden layers of [256, 256, 256, 256] (i.e. 28x28 -> 256 -> 256 -> 256 -> 256 -> 3), why is the memory usage of the MLP and the KAN similar when the input tensor is [64, 28x28]? Shouldn't it differ by roughly the same factor? Only when the input tensor is [36848, 28x28] does the KAN's memory usage reach about ten times the MLP's, a ratio similar to the parameter ratio from your 256-input experiment.

  3. The tests above were executed on a GPU. Thanks again for your answer.

@linkedlist771
Author

> @linkedlist771
>
> 1. Thanks for your answer. Your experiment verifies that the parameter count of the KAN is roughly fourteen times that of the MLP (3585 vs. 257).
>
> 2. If both networks have the same architecture, say an input layer of size 28x28, an output layer of size 3, and hidden layers of [256, 256, 256, 256] (i.e. 28x28 -> 256 -> 256 -> 256 -> 256 -> 3), why is the memory usage of the MLP and the KAN similar when the input tensor is [64, 28x28]? Shouldn't it differ by roughly the same factor? Only when the input tensor is [36848, 28x28] does the KAN's memory usage reach about ten times the MLP's, a ratio similar to the parameter ratio from your 256-input experiment.
>
> 3. The tests above were executed on a GPU. Thanks again for your answer.

In this case, carefully designed benchmark experiments would be needed to pin down the memory behavior. While not particularly challenging, the process can be monotonous: you need to profile the resource consumption of each model configuration and analyze the storage requirements for the model's parameters, gradients, optimizer state, and input datasets...

My previous code was just a simple proof of concept. I might work on this when I have some free time.
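
As a starting point, here is a minimal sketch of such a measurement. It assumes a CUDA device, uses a smaller configuration than the one discussed above for brevity, and assumes pykan's KAN accepts a device argument, as the 2024-era versions did; torch.cuda.max_memory_allocated reports the peak across the forward and backward passes.

import torch
from kan import KAN

def peak_cuda_mem_mb(model, x):
    """One forward/backward pass; return peak CUDA memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    model(x).sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

device = "cuda"
x = torch.randn(36848, 28 * 28, device=device)

# KAN and an MLP of matching widths (28x28 -> 256 -> 3)
kan = KAN(width=[28 * 28, 256, 3], grid=5, k=3, seed=0, device=device)
mlp = torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 3),
).to(device)

for name, m in (("KAN", kan), ("MLP", mlp)):
    print(name, f"{peak_cuda_mem_mb(m, x):.1f} MB")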
