Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute weights for parallel::CellWeights. #16537

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

marcfehling
Copy link
Member

@marcfehling marcfehling commented Jan 25, 2024

In most cases of weighted load balancing, we only require information about the number of degrees of freedom. Currently, we calculate these weights for each cell individually. This is wasteful, as cells with the same finite element assigned will ultimately get the same weight.

This PR proposes to introduce a cache option: we compute the weights for each finite element in advance, and use these values as weights for each cell.


Currently, the interface of the WeightingFunction type uses both a cell_iterator and a finite element. We might discuss if we want to keep that interface for the future, or if we would like to simplify it to only take a finite element as an argument. The only possible use cases for this class at the moment are those in which weights are only determined via finite elements, so I would favor to keep it simple. Please tell me if you use this class for other purposes, or if you see other use cases.

I moved the selection of possible weighting functions to the top of the h and cc files. We might also want to move them into a separate namespace.

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I even look at this: What is the problem you're trying to solve here? Do you actually see the computation of weights in profiles? It's hard to imagine that computing small powers of integers is ever going to be a worthwhile target for caching.

@marcfehling
Copy link
Member Author

marcfehling commented Jan 26, 2024

When writing the Stokes code matrixfree with two DoFHandlers, I had to write a separate function for weighted load balancing as parallel::CellWeights only supports one DoFHandler. For some reason, it didn't come to my mind that I could simply instantiate one object per DoFHandler, but I noticed that all the information we need really is in the FiniteElement, and not in the DoFCellAccessor. We can precompute the weights for each FiniteElement while registering the weighting callback function, and then simply access them via FE indices for each cell.

With this PR, I wanted to make my observation/implementation available in deal.II. I did not identify it as a bottleneck. I just recently stumbled over the weighting part in my code again and thought it would be handy to have it in the library as well.

In my opinion, I over-engineered the class in #7264, #8687 (comment). The only application it was built for was to weight cells by the number of degrees of freedom. There was never a reason to use a DoFCellAccessor to access properties, and I still can't think of one now. In the long term, I would like to change and simplify the interface of the WeightingFunction accordingly to only take a FiniteElement. If a user needs information on a DoFCellAccessor, they can still build their own function (like I did for the Stokes code) and attach it to the signal.

@kronbichler
Copy link
Member

Even though my experience is different from

In most cases of weighted load balancing, we only require information about the number of degrees of freedom.

because most applications I've dealt with have specific requirements (think of local time stepping, cells which do more work than others for other reasons than for different numbers of DoFs, such as CutFEM algorithms, higher cost of cells with hanging nodes compared to unconstrained cells), I think the present class was primarily designed for use in hp methods where this might be justified. In fact, all applications I've worked with manually connect to the signals, where the number of DoFs is part of the metric, but the actual cell is also needed to collect the information. Therefore, I would support the interface @marcfehling thinks is most useful, as he has most experience with the present class.

Now I should of course question whether it would make sense for some of my applications to use CellWeights. The question would be if my use cases fit into the class or not. Opinions, @marcfehling?

@marcfehling
Copy link
Member Author

marcfehling commented Jan 31, 2024

I think the present class was primarily designed for use in hp methods

Yes, exactly that's the motivation behind it. We kept an option open to potentially extend the class to use information from cell_iterators, but as this class was never really designed for this case, I would rather like to make the purpose of the class clearer, limit its functionality, and optimize for it. Keeping it simple for vanilla hp applications. Thus, in a follow-up PR I would like to deprecate the old interface, and adjust the documentation. Maybe we should even move it in the hp namespace. What's your opinion?

Users which need more sophisticated weighting functions can still write their own function and connect it to the signal, as you already pointed out.

@kronbichler Do you think your uses cases can be generalized to become part of the selection of weighting functions? If so, I would be happy to leave the interface as is.

template <int dim, int spacedim>
typename CellWeights<dim, spacedim>::WeightingFunction
CellWeights<dim, spacedim>::constant_weighting(const unsigned int factor)
{
return [factor](const typename DoFHandler<dim, spacedim>::cell_iterator &,
const FiniteElement<dim, spacedim> &) -> unsigned int {
return factor;
};
}
template <int dim, int spacedim>
typename CellWeights<dim, spacedim>::WeightingFunction
CellWeights<dim, spacedim>::ndofs_weighting(
const std::pair<float, float> &coefficients)
{
return [coefficients](
const typename DoFHandler<dim, spacedim>::cell_iterator &,
const FiniteElement<dim, spacedim> &future_fe) -> unsigned int {
const float result =
std::trunc(coefficients.first *
std::pow(future_fe.n_dofs_per_cell(), coefficients.second));
Assert(result >= 0. &&
result <=
static_cast<float>(std::numeric_limits<unsigned int>::max()),
ExcMessage(
"Cannot cast determined weight for this cell to unsigned int!"));
return static_cast<unsigned int>(result);
};
}
template <int dim, int spacedim>
typename CellWeights<dim, spacedim>::WeightingFunction
CellWeights<dim, spacedim>::ndofs_weighting(
const std::vector<std::pair<float, float>> &coefficients)
{
return [coefficients](
const typename DoFHandler<dim, spacedim>::cell_iterator &,
const FiniteElement<dim, spacedim> &future_fe) -> unsigned int {
float result = 0;
for (const auto &pair : coefficients)
result +=
pair.first * std::pow(future_fe.n_dofs_per_cell(), pair.second);
result = std::trunc(result);
Assert(result >= 0. &&
result <=
static_cast<float>(std::numeric_limits<unsigned int>::max()),
ExcMessage(
"Cannot cast determined weight for this cell to unsigned int!"));
return static_cast<unsigned int>(result);
};
}

@bangerth bangerth mentioned this pull request Jan 31, 2024
Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're currently keeping the cache in the lambda function, but I think that's asking for trouble. It's not unreasonable if people use an idiom like

  FECollection fe;
  fe.push_back(FE_Q(1));
  dof_handler.distribute_dofs(fe);
  ...solve with a Q1 element...

  fe.push_back(FE_Q(2));
  ...refine mesh., assign active_fe_indices...
  dof_handler.distribute_dofs(fe); // now uses a different FECollection

In the best case, you'll get a segfault here. You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.

include/deal.II/distributed/cell_weights.h Outdated Show resolved Hide resolved
include/deal.II/distributed/cell_weights.h Outdated Show resolved Hide resolved
source/distributed/cell_weights.cc Outdated Show resolved Hide resolved
source/distributed/cell_weights.cc Outdated Show resolved Hide resolved
@marcfehling
Copy link
Member Author

marcfehling commented Apr 11, 2024

You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.

For re-building the cache in this case, I would need to introduce a signal in the DoFHandler::distribute_dofs() or the FECollection::push_back() function, to which the CellWeights object connects.

Instead, I opted for comparing the sizes of the weight cache and the FECollection in use. If they don't match, an assertion is triggered. Do you think this is enough?

// Check if cache still matches FECollection.
AssertDimension(dof_handler.get_fe_collection().size(),
weight_cache.size());

In addition, I can extend the check to compare each element of the FECollection. I would then also copy-capture the FECollection for validation purposes. I proposed this in a separate commit -- if we don't like it I can remove it quickly.

@marcfehling
Copy link
Member Author

Would you be willing to give this one more look? :-)

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheme you currently have is too expensive. Copying a finite element is expensive, copying a whole collection even more.

Why don't you move the creation of the cache into the lambda function that you attach to the signal? Then you can create entries in the cache on the fly, and you can also invalidate it as necessary. Perhaps something like this:

template <int dim, int spacedim>
  std::function<unsigned int(
    const typename dealii::Triangulation<dim, spacedim>::cell_iterator &cell,
    const CellStatus                                                    status)>
  CellWeights<dim, spacedim>::make_weighting_callback_with_cache(
    const DoFHandler<dim, spacedim> &dof_handler,
    const WeightingFunction         &weighting_function)
  {
      return [&dof_handler,
            tria,
            weight_cache = std::map<unsigned int,unsigned int>()](
             const typename dealii::Triangulation<dim, spacedim>::cell_iterator
                             &cell,
             const CellStatus status) mutable -> unsigned int {
      return CellWeights<dim, spacedim>::weighting_callback_with_cache(
        cell, status, dof_handler, *tria, fe_collection, weight_cache);
    };
  }

and then in weighting_callback_with_cache you would have this at the bottom:

  if (weighting_callback_with_cache.find(fe_index) == end iterator)
   {
       compute weight for this fe_index
       add entry into the map
  }
  return weighting_callback_with_cache[fe_index];

This does not solve the problem of invalidating the cache. But I think it is not unreasonable to assume that the weight for a given fe_index should stay constant.

@marcfehling
Copy link
Member Author

I realize that the choice of the word 'cache' is probably not right in this context, but rather 'precomputed weights'. I can adjust the documentation for this purpose if you like. I would call the function make_weighting_callback_and_precompute_weights instead of make_weighting_callback_with_cache.

The scheme you currently have is too expensive. Copying a finite element is expensive, copying a whole collection even more.

The sole purpose of storing a copy of the FE collection is to validate the precomputed weights: If the FE used to compute the weight is still the same one as in the DoFHandler, then the weight is valid. I believe that comparing the complete FE is more defensive than just the FE index.

I agree that copying elements is too expensive if we would do it with every call of the callback function, but luckily we only capture once on the outer lambda function which will be connected to the signal. The actual callback function then uses *tria and fe_collection as snapshots, so we can solve the validation problem like this in a rather cheap way.

Why don't you move the creation of the cache into the lambda function that you attach to the signal?

That is also a good idea. My reasoning to compute the weights in the beginning is that in basically all user codes a FE collection will be set up once at startup, and then remains untouched until the end of the program. But I'm open to compute them somewhere else.


In the last commit, I moved the assertion to the relevant spot. Also, I limited the check to the relevant FE element, not the complete FE collection.

@marcfehling
Copy link
Member Author

marcfehling commented May 24, 2024

Or let's try a different, maybe less confusing perspective: Why don't we offer the user two different ways, and design the interface openly around it.

Either the user connects a weighting function the old way:

WeightingFunction my_weighting_function = ...;
CellWeights       my_cell_weights(dof_handler, my_weighting_function);

Or they can actively precompute all these weights, and then use them for the weighting via separate constructor.

WeightingFunction   my_weighting_function = ...;
std::vector<double> my_weights = precompute_weights(fe_collection, my_weighting_function);
CellWeights         my_cell_weights(dof_handler, my_weights);

Then it is quite obvious to the user what the CellWeights object uses to determine the weight. And this is closer to what I wanted to suggest initially.

@marcfehling marcfehling changed the title Enable cache for parallel::CellWeights. Precompute weights for parallel::CellWeights. Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants