Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize Function #16328

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

bergbauer
Copy link
Contributor

In the matrix-free context with vectorization over cells or quadrature points it is useful to use Function vectorized.

The problem is that this means also Vector has to work with VectorizedArray which is unfortunately not straightforward for the norm calculations.

Opening as a draft to get some input if this is useful at all before I put more work into it and write some tests.

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of churn, and a lot more instantiations. I would like to hear from others whether they think this is worth it? I'd think it shouldn't be that hard to find the places where you want to call function.value(...) with a vector, unroll the vector, and call function.vector_value() with the list of points instead?

@bergbauer
Copy link
Contributor Author

bergbauer commented Dec 8, 2023

Currently we have to write code like this

          for (const unsigned int q : evaluator_cell.quadrature_point_indices())
            {
              const auto q_point_real = evaluator_cell.real_point(q);
              VectorizedArray<Number> vectorized_rhs;
              for (unsigned int v = 0; v < n_lanes; ++v)
                {
                  Point<dim, double> point;
                  for (unsigned int d = 0; d < dim; ++d)
                    point[d] = q_point_real[d][v];

                  vectorized_rhs[v] = rhs_function->value(point);
                }

              evaluator_cell.submit_value(vectorized_rhs, q);
            }

With the proposed changes we are able to write

          for (const unsigned int q : evaluator_cell.quadrature_point_indices())
             evaluator_cell.submit_value(rhs_function->value(evaluator_cell.real_point(q)), q);

in the context of FEPointEvaluation with Number = VectorizedArray<double>.

With value_list we would still have to fill the std::vector<Point<dim, double>> from Point<dim, VectorizedArray<double>> because the data layout is different. @bangerth

@bangerth
Copy link
Member

bangerth commented Dec 8, 2023

Right. I understand what you want to do. I'm just wondering whether changing 755 lines of code is worth avoiding 15 lines of code or if perhaps implementing a helper function that can be used in all of the places where you need this functionality would be a better approach.

@bergbauer
Copy link
Contributor Author

I agree, allowing this functionality should be easier than the current changes. I have thought about this some more and I think the "problem" why this change is so involved is because we have Vector in the interface for vector_value and not std::vector like for vector_gradient. In quite some places throughout the library, member functions of Vector are used, so it is currently not considered to return a data container like std::vector.

What do you think about a new class VectorizedFunction with only value()/gradient()/hessian() that allows me to do what I want without reshuffling the data?

@bangerth
Copy link
Member

I think that does not really address the problem either. A base class VectorizedFunction means that we will eventually start to duplicate the entire set of functions we currently have. It also has the problem that you need a Function in some places, and a VectorizedFunction in other places.

The underlying question that I'd like to see some data on is that you are trying to optimize something -- you want to avoid shuffling around some data before you call the existing functions. But is the shuffling of data really a bottleneck? Is it worth optimizing this case? It is not worth making code more complex for optimizing something that is not a bottleneck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants