-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to /expansion #4728
base: master
Are you sure you want to change the base?
Improvements to /expansion #4728
Conversation
i left a comment on the issue: #4601 (comment) |
just to be super clear i dont have any problem with us updating this. but id rather it be something you opt into with a flag. maybe |
Ok, I've added dedupe option, which defaults to false and keeps the original behaviour. |
@@ -12,10 +12,11 @@ The expansion service wraps the `route`, `isochrone` and `sources_to_targets` se | |||
|
|||
Since this service wraps other services, the request format mostly follows the ones of the [route](../turn-by-turn/api-reference.md#inputs-of-a-route), [isochrone](../isochrone/api-reference.md#inputs-of-the-isochrone-service) and [matrix](../matrix/api-reference.md#inputs-of-the-matrix-service). Additionally, it accepts the following parameters: | |||
|
|||
| Parameter | Description | | |||
| Parameter | Description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: probably dont need the whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I've overlooked this, it's fixed now
| `skip_opposites` (optional) | If set to `true` the output won't contain an edge's opposing edge. Opposing edges can be thought of as both directions of one road segment. Of the two, we discard the directional edge with higher cost and keep the one with less cost. Default false. | | ||
| `action` (required) | The service whose expansion should be tracked. Currently one of `route`, `isochrone` or `sources_to_targets`. | | ||
| `skip_opposites` (optional) | If set to `true` the output won't contain an edge's opposing edge. Opposing edges can be thought of as both directions of one road segment. Of the two, we discard the directional edge with higher cost and keep the one with less cost. Default false. | | ||
| `dedupe` (optional) | If set to `true`, the output will contain each edge only once, significantly reducing the response size. The expansion will keep track of already visited edges and override their properties, ensuring that only the newer one or the one with higher edge states is returned. Default false. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you very much for documenting!
| `dedupe` (optional) | If set to `true`, the output will contain each edge only once, significantly reducing the response size. The expansion will keep track of already visited edges and override their properties, ensuring that only the newer one or the one with higher edge states is returned. Default false. | | |
| `dedupe` (optional) | If set to `true`, the output will contain each edge only once, significantly reducing the response size. The expansion will keep track of already visited edges and override their properties, ensuring that only the newer one or the one with higher edge states is returned. Default `false`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it in skip_opposites too, for consistency
@@ -18,9 +18,10 @@ message Expansion { | |||
repeated uint32 costs = 1 [packed=true]; | |||
repeated uint32 durations = 2 [packed=true]; | |||
repeated uint32 distances = 3 [packed=true]; | |||
repeated EdgeStatus edge_status = 4; | |||
repeated EdgeStatus edge_status = 4 [packed=true]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is technically a breaking change but its a beta service so oh well i guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's time after this PR to take it out of BETA state?
repeated uint32 edge_status_flags = 7 [packed=true]; | ||
|
||
repeated Geometry geometries = 7; | ||
repeated Geometry geometries = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving things around in proto is very much a breaking change again it doesnt matter because we call this service beta but we should try not to do this when we dont need to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right of course, but since we're breaking it anyway, I've moved it around ... well just for aesthetic reasons tbh.
src/thor/expansion_action.cc
Outdated
// unfortunately we have to call this before checking if we can skip | ||
// else the tile could change underneath us when we get the opposing | ||
auto shape = tile->edgeinfo(edge).shape(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're saying that because of the call to GetOpposingEdgeId
below? if so the simple solution would be to, inside of the skip_opps
if
block to add auto opp_tile = tile;
and use opp_tile
in the call to GetOpposingEdgeId
. if i misunderstood this comment let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is original comment from @nilsnolde actually, I haven't touch it, it just got marked by diff as I've clean up some unused code around those lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean would be good to move the grabbing of the shape below the if
though in case that method bails before using it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I've moved it and removed the original comment
// check if status is higher or same – as we will keep track of the latest one | ||
static bool is_latest_status(uint8_t current, uint8_t candidate) { | ||
uint8_t first_significant = current | candidate; | ||
// connected(LSb) is the highest status, so check from right | ||
for (uint8_t i = 0; i < Expansion_EdgeStatus_EdgeStatus_ARRAYSIZE; i++) { | ||
uint8_t mask = 1 << i; | ||
if (!(first_significant & mask)) | ||
continue; | ||
return candidate & mask; | ||
} | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not obvious to me why we need the complication of remembering all the statuses an edge has seen using a mask. it is true that in bi/multidirectional search an edge can be reached and settled multiple times (once for each search direction) but we dont need to know the prior statuses we only need to know the most important (lowest value) one its ever seen. tell me what im missing here. cant this just be:
// check if status is higher or same – as we will keep track of the latest one | |
static bool is_latest_status(uint8_t current, uint8_t candidate) { | |
uint8_t first_significant = current | candidate; | |
// connected(LSb) is the highest status, so check from right | |
for (uint8_t i = 0; i < Expansion_EdgeStatus_EdgeStatus_ARRAYSIZE; i++) { | |
uint8_t mask = 1 << i; | |
if (!(first_significant & mask)) | |
continue; | |
return candidate & mask; | |
} | |
return false; | |
} | |
// check if status is higher or same – as we will keep track of the latest one | |
static bool is_latest_status(uint8_t current, uint8_t candidate) { | |
return candidate < current; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yes, that would simplify a lot of things. I implemented it this way because it was the proposed solution in the original issue (#4601). To be honest, I assumed it had some use case I wasn't aware of but would be useful anyway. @nilsnolde, could you please shed some insight on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, @kevinkreiser is of course right. some things only come to you when it's in code I guess :) sorry for the extra hassle I caused!
} | ||
stage |= edge_state.at(edgeid).stages_mask; | ||
} | ||
edge_state[edgeid] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i very much appreciate that we dont fill up this large map when not deduping, thank you for that!
if (tile == nullptr) { | ||
LOG_ERROR("thor_worker_t::expansion error, tile no longer available" + | ||
std::to_string(edgeid.Tile_Base())); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know this was here before but it seems crazy pedantic. the graph reader can fail to get a tile for sure but not one its already seen (ie we know is in the tileset). the worst case is that its a cache miss and it has to pull it from disk or from network but if the algorithm is passing you this graph id for an edge its seen it will have to be able to get the edge. imho we can delete this check, if this happens someone is deleting tiles from the data storage while the application is running which is not worth protecting against
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair
const auto* edge = tile->directededge(edgeid); | ||
auto shape = tile->edgeinfo(edge).shape(); | ||
|
||
if (!edge->forward()) | ||
std::reverse(shape.begin(), shape.end()); | ||
Polyline2<PointLL>::Generalize(shape, gen_factor, {}, false); | ||
if (dedupe) { | ||
uint8_t stage = 1 << status; | ||
if (edge_state.contains(edgeid)) { | ||
auto edge_stages = edge_state.at(edgeid).stages_mask; | ||
// Keep only properties of last/highest status of edge, but update what stages the edge | ||
// has seen | ||
if (!expansion_properties_t::is_latest_status(edge_stages, stage)) { | ||
edge_state.at(edgeid).stages_mask |= stage; | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we refactor this section just a little bit to add an early exit before the shape stuff? something like this (pseudo code, i suggest renaming the stage
member to status
if the thing i said about the mask makes sense):
// if we've seen it before we can update if better and bail either way
auto maybe_state = edge_state.find(edgeid); // works for dedupe == false, since its empty
if (maybe_state != edge_state.cend()) {
if (maybe_state->status > status)
edge_state[edgeid].status = status;
return;
}
// get the shape ready because we're going to use it
const auto* edge = tile->directededge(edgeid);
auto shape = tile->edgeinfo(edge).shape();
if (!edge->forward())
std::reverse(shape.begin(), shape.end());
Polyline2<PointLL>::Generalize(shape, gen_factor, {}, false);
// either we track it for serialization later or
if (dedupe) {
edge_state[edgeid] = expansion_properties_t(prev_edgeid, status, duration, distance, shape, cost);
} // we simply serialize it now
else {
writeExpansionProgress(expansion, edgeid, prev_edgeid, shape, exp_props, status, status, duration, distance, cost, false);
}
@@ -146,6 +199,15 @@ std::string thor_worker_t::expansion(Api& request) { | |||
// anyway | |||
} | |||
|
|||
// assemble the properties from latest/highest stages it went through | |||
if (dedupe) { | |||
for (const auto& e : edge_state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its too bad this doesnt maintain insertion order, would be a kind of convenient side effect
static const std::vector<std::string> status_captions{{"status_connected"}, | ||
{"status_settled"}, | ||
{"status_reached"}}; | ||
for (uint8_t j = 0; j < status_captions.size(); j++) | ||
writer(status_captions[j], (bool)(expansion.edge_status_flags(i) & 1 << j)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think this is unneccesary the best status that is reached implies any others below it were reached
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i had a few suggested changes but for the most part it looks good!
This PR addresses the issue described in #4601, implementing the solution proposed by @nilsnolde. Currently, the expansion process is overly verbose because it logs edges multiple times—for each stage of expansion and when an edge is reached from another branch. This PR keeps track of every affected edge during expansion only once, with all properties associated with it (necessary for the final response assembly). Properties such as cost and duration for already tracked edges are updated only at stages that are higher or same (to keep track of latest ones). This also means we keeping only latest prev_edge_it so we lose information about whether an edge was reached from other branches. I'm not sure how significant this issue might be. Thoughts?
This is still a draft, and gurka tests are missing, as I would appreciate some feedback before proceeding further.
Tasklist