Precompute weights for parallel::CellWeights. #16537
base: master
Conversation
Before I even look at this: What is the problem you're trying to solve here? Do you actually see the computation of weights in profiles? It's hard to imagine that computing small powers of integers is ever going to be a worthwhile target for caching.
When writing the Stokes code matrix-free with two DoFHandlers, I had to write a separate function for weighted load balancing. With this PR, I wanted to make my observation/implementation available in deal.II. I did not identify it as a bottleneck; I just recently stumbled over the weighting part in my code again and thought it would be handy to have it in the library as well. In my opinion, I over-engineered the class in #7264, #8687 (comment). The only application it was built for was to weight cells by the number of degrees of freedom. There was never a reason to use a cell iterator in the weighting function.
Even though my experience is different, because most applications I've dealt with have specific requirements (think of local time stepping; cells which do more work than others for reasons other than different numbers of DoFs, such as CutFEM algorithms; or the higher cost of cells with hanging nodes compared to unconstrained cells), I think the present class was primarily designed for that particular use case. Now I should of course question whether it would make sense for some of my applications to use this class.
Yes, exactly that's the motivation behind it. We kept an option open to potentially extend the class to use information from the cell iterator as well. Users who need more sophisticated weighting functions can still write their own function and connect it to the signal, as you already pointed out. @kronbichler Do you think your use cases can be generalized to become part of the selection of weighting functions? If so, I would be happy to leave the interface as is. (See dealii/source/distributed/cell_weights.cc, lines 64 to 122 at 2aa0f75.)
You're currently keeping the cache in the lambda function, but I think that's asking for trouble. It's not unreasonable if people use an idiom like
hp::FECollection<dim> fe;
fe.push_back(FE_Q<dim>(1));
dof_handler.distribute_dofs(fe);
// ...solve with a Q1 element...
fe.push_back(FE_Q<dim>(2));
// ...refine mesh, assign active_fe_indices...
dof_handler.distribute_dofs(fe); // now uses a different FECollection
In the best case, you'll get a segfault here. You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.
For re-building the cache in this case, I would need to introduce a signal in the DoFHandler. Instead, I opted for comparing the sizes of the weight cache and the FECollection in use. If they don't match, an assertion is triggered. Do you think this is enough? (See dealii/source/distributed/cell_weights.cc, lines 274 to 276 at 842ce7e.)
In addition, I can extend the check to compare each element of the FECollection. I would then also copy-capture the FECollection for validation purposes. I proposed this in a separate commit; if we don't like it, I can remove it quickly.
Force-pushed from fab350d to 842ce7e.
Would you be willing to give this one more look? :-)
The scheme you currently have is too expensive. Copying a finite element is expensive, copying a whole collection even more.
Why don't you move the creation of the cache into the lambda function that you attach to the signal? Then you can create entries in the cache on the fly, and you can also invalidate it as necessary. Perhaps something like this:
template <int dim, int spacedim>
std::function<unsigned int(
  const typename dealii::Triangulation<dim, spacedim>::cell_iterator &cell,
  const CellStatus status)>
CellWeights<dim, spacedim>::make_weighting_callback_with_cache(
  const DoFHandler<dim, spacedim> &dof_handler,
  const WeightingFunction         &weighting_function)
{
  const Triangulation<dim, spacedim> *tria = &dof_handler.get_triangulation();

  return [&dof_handler,
          tria,
          weighting_function,
          weight_cache = std::map<unsigned int, unsigned int>()](
           const typename dealii::Triangulation<dim, spacedim>::cell_iterator
             &cell,
           const CellStatus status) mutable -> unsigned int {
    return CellWeights<dim, spacedim>::weighting_callback_with_cache(
      cell, status, dof_handler, *tria, weighting_function, weight_cache);
  };
}
and then in weighting_callback_with_cache you would have this at the bottom:
if (weight_cache.find(fe_index) == weight_cache.end())
  {
    // compute weight for this fe_index
    // add entry into the map
  }
return weight_cache[fe_index];
This does not solve the problem of invalidating the cache. But I think it is not unreasonable to assume that the weight for a given fe_index should stay constant.
I realize that the choice of the word 'cache' is probably not right in this context; 'precomputed weights' would be more accurate. I can adjust the documentation for this purpose if you like, and rename the function accordingly.
The sole purpose of storing a copy of the FE collection is to validate the precomputed weights: if the FE used to compute the weight is still the same one as in the DoFHandler, then the weight is valid. I believe that comparing the complete FE is more defensive than just comparing the FE index. I agree that copying elements would be too expensive if we did it with every call of the callback function, but luckily we only capture once in the outer lambda function, which will be connected to the signal. The actual callback function then uses this captured copy.
That is also a good idea. My reasoning for computing the weights at the beginning is that in basically all user codes an FE collection will be set up once at startup and then remain untouched until the end of the program. But I'm open to computing them somewhere else. In the last commit, I moved the assertion to the relevant spot. I also limited the check to the relevant finite element, not the complete FE collection.
Or let's try a different, maybe less confusing perspective: why don't we offer the user two different ways, and design the interface openly around it? Either the user connects a weighting function the old way, or they actively precompute all these weights and then use them for the weighting via a separate constructor.
Then it is quite obvious to the user what each variant does.
In most cases of weighted load balancing, we only require information about the number of degrees of freedom. Currently, we calculate these weights for each cell individually. This is wasteful, as cells with the same finite element assigned will ultimately get the same weight.
This PR proposes to introduce a cache option: we compute the weights for each finite element in advance, and use these values as weights for each cell.
Currently, the interface of the WeightingFunction type uses both a cell_iterator and a finite element. We might discuss whether we want to keep that interface for the future, or whether we would like to simplify it to only take a finite element as an argument. The only possible use cases for this class at the moment are those in which weights are determined solely via finite elements, so I would favor keeping it simple. Please tell me if you use this class for other purposes, or if you see other use cases.
I moved the selection of possible weighting functions to the top of the .h and .cc files. We might also want to move them into a separate namespace.