Precompute weights for parallel::CellWeights. #16537
base: master
Conversation
Before I even look at this: What is the problem you're trying to solve here? Do you actually see the computation of weights in profiles? It's hard to imagine that computing small powers of integers is ever going to be a worthwhile target for caching.
When writing the Stokes code matrix-free with two DoFHandlers, I had to write a separate function for weighted load balancing. With this PR, I wanted to make my observation/implementation available in deal.II. I did not identify it as a bottleneck; I just recently stumbled over the weighting part in my code again and thought it would be handy to have it in the library as well. In my opinion, I over-engineered the class in #7264, #8687 (comment). The only application it was built for was to weight cells by the number of degrees of freedom. There was never a reason to use a cell iterator in the weighting function.
Even though my experience is different, because most applications I've dealt with have specific requirements (think of local time stepping; cells which do more work than others for reasons other than different numbers of DoFs, such as CutFEM algorithms; or the higher cost of cells with hanging nodes compared to unconstrained cells), I think the present class was primarily designed for that particular use case. Now I should of course question whether it would make sense for some of my applications to use this class.
Yes, exactly that's the motivation behind it. We kept an option open to potentially extend the class to use information from the cell iterator as well. Users who need more sophisticated weighting functions can still write their own function and connect it to the signal, as you already pointed out. @kronbichler Do you think your use cases can be generalized to become part of the selection of weighting functions? If so, I would be happy to leave the interface as is. (See dealii/source/distributed/cell_weights.cc, lines 64 to 122 at 2aa0f75.)
You're currently keeping the cache in the lambda function, but I think that's asking for trouble. It's not unreasonable if people use an idiom like
hp::FECollection<dim> fe;
fe.push_back(FE_Q<dim>(1));
dof_handler.distribute_dofs(fe);
// ...solve with a Q1 element...
fe.push_back(FE_Q<dim>(2));
// ...refine mesh, assign active_fe_indices...
dof_handler.distribute_dofs(fe); // now uses a different FECollection
In the best case, you'll get a segfault here. You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.
For re-building the cache in this case, I would need to introduce a signal in the DoFHandler. Instead, I opted for comparing the sizes of the weight cache and the FECollection in use. If they don't match, an assertion is triggered. Do you think this is enough? (See dealii/source/distributed/cell_weights.cc, lines 274 to 276 at 842ce7e.)
In addition, I can extend the check to compare each element of the FECollection. I would then also copy-capture the FECollection for validation purposes. I proposed this in a separate commit; if we don't like it, I can remove it quickly.
Force-pushed from fab350d to 842ce7e.
Would you be willing to give this one more look? :-)
The scheme you currently have is too expensive. Copying a finite element is expensive, copying a whole collection even more.
Why don't you move the creation of the cache into the lambda function that you attach to the signal? Then you can create entries in the cache on the fly, and you can also invalidate it as necessary. Perhaps something like this:
template <int dim, int spacedim>
std::function<unsigned int(
  const typename dealii::Triangulation<dim, spacedim>::cell_iterator &cell,
  const CellStatus status)>
CellWeights<dim, spacedim>::make_weighting_callback_with_cache(
  const DoFHandler<dim, spacedim> &dof_handler,
  const WeightingFunction         &weighting_function)
{
  const Triangulation<dim, spacedim> *tria = &dof_handler.get_triangulation();

  return [&dof_handler,
          tria,
          weighting_function,
          weight_cache = std::map<unsigned int, unsigned int>()](
           const typename dealii::Triangulation<dim, spacedim>::cell_iterator
             &cell,
           const CellStatus status) mutable -> unsigned int {
    return CellWeights<dim, spacedim>::weighting_callback_with_cache(
      cell, status, dof_handler, *tria, weighting_function, weight_cache);
  };
}
and then in weighting_callback_with_cache you would have this at the bottom:
if (weight_cache.find(fe_index) == weight_cache.end())
  {
    // compute weight for this fe_index
    // add entry into the map
  }
return weight_cache[fe_index];
This does not solve the problem of invalidating the cache. But I think it is not unreasonable to assume that the weight for a given fe_index should stay constant.
I realize that the choice of the word 'cache' is probably not right in this context; 'precomputed weights' would be more accurate. I can adjust the documentation for this purpose if you like, and rename the function accordingly.
The sole purpose of storing a copy of the FE collection is to validate the precomputed weights: if the FE used to compute the weight is still the same one as in the DoFHandler, then the weight is valid. I believe that comparing the complete FE is more defensive than just comparing the FE index. I agree that copying elements would be too expensive if we did it with every call of the callback function, but luckily we only capture once in the outer lambda function, which will be connected to the signal. The actual callback function then uses this captured copy.
That is also a good idea. My reasoning for computing the weights at the beginning is that in basically all user codes an FE collection will be set up once at startup and then remain untouched until the end of the program. But I'm open to computing them somewhere else. In the last commit, I moved the assertion to the relevant spot. I also limited the check to the relevant finite element, not the complete FE collection.
Or let's try a different, maybe less confusing perspective: why don't we offer the user two different ways, and design the interface openly around it? Either the user connects a weighting function the old way, or they actively precompute all these weights and then use them for the weighting via a separate constructor.
Then it is quite obvious to the user what each variant does.
In most cases of weighted load balancing, we only require information about the number of degrees of freedom. Currently, we calculate these weights for each cell individually. This is wasteful, as cells with the same finite element assigned will ultimately get the same weight.
This PR proposes to introduce a cache option: we compute the weights for each finite element in advance, and use these values as weights for each cell.
Currently, the interface of the WeightingFunction type uses both a cell_iterator and a finite element. We might discuss whether we want to keep that interface for the future, or whether we would like to simplify it to only take a finite element as an argument. The only possible use cases for this class at the moment are those in which weights are determined solely via finite elements, so I would favor keeping it simple. Please tell me if you use this class for other purposes, or if you see other use cases.
I moved the selection of possible weighting functions to the top of the .h and .cc files. We might also want to move them into a separate namespace.