
Feature Request: KAN Model Does Not Support Tensor Input with More Than Two Dimensions #204

Open
linkedlist771 opened this issue May 16, 2024 · 5 comments


@linkedlist771

Description

The KAN (Kolmogorov-Arnold Network) model from the pykan library currently supports only two-dimensional input tensors of shape (batch_size, hid_dim). A RuntimeError is raised when a three-dimensional tensor of shape (batch_size, atomic_number, hid_dim) is passed as input.

Code Snippet

from kan import KAN
import torch

hid_dim = 256
atomic_number = 42
batch_size = 60

# Three-dimensional input: (batch_size, atomic_number, hid_dim)
input_tensor = torch.randn(batch_size, atomic_number, hid_dim)
model = KAN(width=[hid_dim, 1],  # reduce hid_dim features to a single output
            grid=5,
            k=3,
            seed=0)
model(input_tensor).shape  # raises the RuntimeError below

Error Message

RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (3) for operand 0 and no ellipsis was given.

Explanation

This limitation prevents the use of the KAN model in scenarios where input tensors exceed two dimensions, such as natural language processing tasks where inputs have shape (batch_size x sequence_length x hid_dim).
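
Until such support lands, a user-side workaround is to flatten the leading dimensions into a single batch dimension before the call and restore them afterwards. A minimal sketch, reusing the names from the snippet above:

flat = input_tensor.reshape(-1, hid_dim)          # (batch_size * atomic_number, hid_dim)
out = model(flat)                                 # (batch_size * atomic_number, 1)
out = out.reshape(batch_size, atomic_number, -1)  # back to (batch_size, atomic_number, 1)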

Motivation for Using KAN Instead of MLP

The motivation for replacing the MLP with a KAN as the dimension-reduction output module is KAN's potential for more efficient computation and better performance at capturing non-linear interactions between features. MLPs, while versatile, can be computationally expensive and less effective at handling complex feature interactions in high-dimensional spaces. KAN's structured approach is a promising alternative that could improve model efficiency and effectiveness, particularly where dimensionality reduction is crucial. Additionally, I have observed that, for the same network structure, the parameter count of a KAN is significantly higher than that of an MLP, indicating a larger model capacity.

Suggestion

It would be beneficial to update the KAN implementation to support input tensors with an arbitrary number of leading dimensions. This would mimic the "position-wise feed-forward networks" used in architectures like the Transformer, which apply the same MLP (multi-layer perceptron) independently at every position along the sequence.
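
A sketch of how that could look as a thin wrapper (PositionwiseKAN is a hypothetical name, not part of pykan): collapse all leading dimensions into one batch dimension, run the KAN, and restore the original shape.

import torch
import torch.nn as nn
from kan import KAN

class PositionwiseKAN(nn.Module):
    """Hypothetical wrapper: applies the same KAN independently at every
    position, like a Transformer's position-wise feed-forward layer."""
    def __init__(self, **kan_kwargs):
        super().__init__()
        self.kan = KAN(**kan_kwargs)

    def forward(self, x):
        lead, hid = x.shape[:-1], x.shape[-1]
        y = self.kan(x.reshape(-1, hid))      # collapse leading dims into the batch
        return y.reshape(*lead, y.shape[-1])  # restore the leading dims

model = PositionwiseKAN(width=[256, 1], grid=5, k=3, seed=0)
print(model(torch.randn(60, 42, 256)).shape)  # torch.Size([60, 42, 1])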

@didi226

didi226 commented May 16, 2024

I am having the same problem.

@c-pupil

c-pupil commented May 16, 2024

> Additionally, I have observed that, for the same network structure, the parameter count of a KAN is significantly higher than that of an MLP, indicating a larger model capacity.

I also encountered a similar problem. I ran an experiment in which the MLP and the KAN had the same number of hidden layers: when the input tensor is [64, 28x28], the memory usage of the MLP and the KAN is similar, but when the input tensor is [36848, 28x28], the memory usage differs hugely, with the KAN using significantly more than the MLP. Do you know why? Looking forward to your answer.

@linkedlist771
Author

linkedlist771 commented May 16, 2024

> Additionally, I have observed that, for the same network structure, the parameter count of a KAN is significantly higher than that of an MLP, indicating a larger model capacity.
>
> I also encountered a similar problem. I ran an experiment in which the MLP and the KAN had the same number of hidden layers: when the input tensor is [64, 28x28], the memory usage of the MLP and the KAN is similar, but when the input tensor is [36848, 28x28], the memory usage differs hugely, with the KAN using significantly more than the MLP. Do you know why? Looking forward to your answer.

I think this is related to the number of parameters: in a KAN, each connection carries a learnable function, whereas in an MLP it is a single numerical weight. Here is my code.

from kan import KAN
import torch

# Single-layer KAN mapping 256 inputs to 1 output
model = KAN(width=[256, 1],
            grid=5,
            k=3,
            seed=0)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params}")

# Linear layer of the same width for comparison
model = torch.nn.Linear(256, 1)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params}")

Output:

Total parameters: 3585
Total parameters: 257

If you have different findings, keep me updated.
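
For what it's worth, the roughly fourteen-fold gap (3585 / 257 ≈ 14) is consistent with pykan storing a small bundle of parameters per edge instead of a single scalar weight. A rough accounting, assuming the 2024-era pykan layers where each edge carries (grid + k) spline coefficients, two scale terms, and four affine terms for the symbolic branch:

# Per-edge accounting for KAN(width=[256, 1], grid=5, k=3).
# NOTE: the per-edge breakdown is an assumption about pykan's internals.
in_dim, out_dim, grid, k = 256, 1, 5, 3
per_edge = (grid + k) + 2 + 4                  # spline coefs + scale terms + symbolic affine = 14
total = in_dim * out_dim * per_edge + out_dim  # plus one output bias
print(total)  # 3585, matching the observed count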

@c-pupil

c-pupil commented May 17, 2024

@linkedlist771

  1. Thanks for your answer. Your experiment verifies that the parameter count of the KAN is roughly fourteen times that of the MLP (3585 vs. 257).

  2. If both networks have the same architecture, say an input layer of size 28x28, an output layer of size 3, and hidden layers of [256, 256, 256, 256] (i.e. 28x28 -> 256 -> 256 -> 256 -> 256 -> 3), why is the memory usage of the MLP and the KAN similar when the input tensor is [64, 28x28]? Shouldn't it differ by roughly the same factor? Only when the input tensor is [36848, 28x28] does the KAN's memory usage reach about ten times the MLP's, a ratio similar to the parameter ratio from your 256-input experiment.

  3. The tests above were executed on a GPU. Thanks again for your answer.

@linkedlist771
Author

> @linkedlist771
>
> 1. Thanks for your answer. Your experiment verifies that the parameter count of the KAN is roughly fourteen times that of the MLP (3585 vs. 257).
>
> 2. If both networks have the same architecture, say an input layer of size 28x28, an output layer of size 3, and hidden layers of [256, 256, 256, 256] (i.e. 28x28 -> 256 -> 256 -> 256 -> 256 -> 3), why is the memory usage of the MLP and the KAN similar when the input tensor is [64, 28x28]? Shouldn't it differ by roughly the same factor? Only when the input tensor is [36848, 28x28] does the KAN's memory usage reach about ten times the MLP's, a ratio similar to the parameter ratio from your 256-input experiment.
>
> 3. The tests above were executed on a GPU. Thanks again for your answer.

In this case, carefully designed benchmark experiments would be needed to pin down the memory behavior. While not particularly challenging, the process can be monotonous: you need to profile the resource consumption of each model configuration and analyze the storage requirements for the model's parameters, gradients, optimizer state, and input datasets...

My previous code was just a simple proof of concept. I might work on this when I have some free time.
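
As a starting point, here is a minimal sketch of such a measurement. It assumes a CUDA device, uses a smaller configuration than the one discussed above for brevity, and assumes pykan's KAN accepts a device argument, as the 2024-era versions did; torch.cuda.max_memory_allocated reports the peak across the forward and backward passes.

import torch
from kan import KAN

def peak_cuda_mem_mb(model, x):
    """One forward/backward pass; return peak CUDA memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    model(x).sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

device = "cuda"
x = torch.randn(36848, 28 * 28, device=device)

# KAN and an MLP of matching widths (28x28 -> 256 -> 3)
kan = KAN(width=[28 * 28, 256, 3], grid=5, k=3, seed=0, device=device)
mlp = torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 3),
).to(device)

for name, m in (("KAN", kan), ("MLP", mlp)):
    print(name, f"{peak_cuda_mem_mb(m, x):.1f} MB")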
