Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intersects or not slow #288

Open
MBkkt opened this issue Dec 16, 2022 · 12 comments
Open

intersects or not slow #288

MBkkt opened this issue Dec 16, 2022 · 12 comments

Comments

@MBkkt
Copy link
Contributor

MBkkt commented Dec 16, 2022

I think main issue this todo
https://github.com/google/s2geometry/blob/master/src/s2/s2boolean_operation.cc#L2255

Will it be done sometime?

@jmr
Copy link
Member

jmr commented Dec 17, 2022

Do you have test data that triggers it?

I'll try do to a new release soon and add the microbenchmarks. This will make it easier to reproduce/profile in isolation.

@MBkkt
Copy link
Contributor Author

MBkkt commented Dec 17, 2022

Not, it's customer data but I can describe it.
#290

X polygons is rectangles, and Y is any S2Loop

But I think it's concrete case in general I want to find some more effective way.

I found CrossingProcessor created every time, if exist way to create it single time and when clear it became better.
And same for vectors with Edges starts_a, and starts_b.

I think it can be class which inherit from S2BooleanOperation.
Something like S2CachedBooleanOperation which class can contains all this temporary fields and have Clear method.

In such case we at least can avoid allocate/free many times

@MBkkt
Copy link
Contributor Author

MBkkt commented Dec 28, 2022

@jmr

What do you think about add something like

Intersects(S2Polygon& lhs, S2Polygon& rhs) {
  if (lhs.num_loops() == 1 && rhs.num_loops() == 1) {
    return lhs.loop(0).Intersects(rhs.loop(0));
  }
  current expensive code

Same possible for Contains(Polygon&)

I think it's pretty common case when polygon contains only single loop? (Any single polygon without holes)
Even separate ctor exists for this case.

@jmr
Copy link
Member

jmr commented Jan 4, 2023

It seems like it would be better if S2BooleanOperation could do it quickly. Then more code would benefit. I don't know about the feasibility of this. @smcallis

Not, it's customer data but I can describe it.

Can you make a similar, synthetic test case not derived from the customer data?

@smcallis
Copy link
Collaborator

smcallis commented Jan 4, 2023

+1 to S2BooleanOperation doing this faster, I believe we have it on the roadmap but I'm not sure when we'll get to it.

@MBkkt
Copy link
Contributor Author

MBkkt commented Jan 4, 2023

Sounds good if it will be faster.

I think I can provide some benchmarks later.

In general I tried to speedup arangodb searching geo data with inverted index (of course firstly I fixed not related to S2 issue).

And know I have two issues:

  1. Slow S2BooleanOperation
  2. I don't really sure what format on disk I should use, now It's just GeoJson, which we parse to S2 structure, it's not precise (store lat lng in degrees, not S2Point) and slow (needs allocation, a lot of conditions).

So I also have question about 2, will be nice if you have some advice.

The main pros of storing geo json it's more compact, especially for points (needs only 8*2 + 1 bytes).
I have question: Is lat lng never used to storing S2Region/Shape because it has a precision lost? Or it's because you consider trigonometry translation degrees latlng to S2Point is more expensive?

Also now I implemented serialization with s2 Decoder/Encoder and S2Region (we use it for precise comparison, after finding terms from S2TermIndexer)

And I not sure, I see a lot of lazy classes which named like S2LaxPolygonShape, is it better to use?

But how to intersect/etc they?

@smcallis
Copy link
Collaborator

smcallis commented Jan 6, 2023

I'm working on new bulk queries, probably the intersection query will be what you want, it'll be substantially faster than S2BooleanOperation which effectively computes the intersection then tells you if it's empty or not. I'll be writing that here in Q1, I've got all the pieces written and just need to integrate and test.

As for formats, I'd recommend storing data directly in the encoded format. Build an S2Polygon (or S2LaxPolygonShape) and store that binary blob directly. We can decode those very quickly and lazily if need be.

I don't see how GeoJson can be smaller since it's text based, unless you're throwing away almost all your precision and storing just two decimal points. If you want to do that, better to "snap" your geometry to a given S2Cell level. Two digits of precision is ~a level 9 cell (source), so you could snap to level 9 and use COMPACT encoding and the encoder can use the fact that the points are snapped to reduce the size (they can be stored as integers, level 9 would have 3+18+1=22 bits/point).

S2LaxPolygon[Polyline]Shape are lax not lazy. They allow degeneracies such as degenerate edges (v0,v0), and sibling pairs (two reversed edges touching). They're meant to be more flexible and ultimately replace S2Polygon/S2Polyline, and they still work with things like S2BooleanOperation.

@MBkkt
Copy link
Contributor Author

MBkkt commented Jan 6, 2023

@smcallis

Thanks for detailed answer.

I don't see how GeoJson can be smaller since it's text based

We store GeoJson in binary representation, something like messagepack/bson/etc if you familiar (so it without some text overhead, like spaces, comma, etc)

In average we need 16 byte per point (because GeoJson describe points like lat lon), and few additional bytes per array to store then it starts/ends + constant like 40 bytes.

S2Polygon needs:
4*24 bytes bound
3 bytes own loops/has hole/format version markers
4 byte loops count
Every loop:
1 byte version marker
4 bytes number of vertices
24 bytes per vertex
1 byte origin state
4 byte depth ?
4 * 24 bytes for bound

So it's extremely bad for small polygons

S2LaxPolygon Encode looks a lot of better
It's just all vertices + marks of loops start

So it better then lat, lon encoding GeoJson before 15~20 vertices, then it also worse but it's understandable

Of course with CodingHint Compact it can be better for non convex polygons, but only a little

In general I think we should use S2LaxPolygon instead of S2Polygon, it looks a lot of better, but translation is not easy

@MBkkt
Copy link
Contributor Author

MBkkt commented Jan 6, 2023

S2LaxPolygon[Polyline]Shape are lax not lazy

I meant they have EncodedS2Lax* interface to decode them lazily

@MBkkt
Copy link
Contributor Author

MBkkt commented Jan 6, 2023

and they still work with things like S2BooleanOperation.

Yes, but they need to compute index.

For an example S2Polygon::Contains(Point) not always need to do it.

I think my main issue:
I have operations like contains/intersects/etc

I want to compute exact yes or no.

And one operand from user query, for it I can do heavy precompute, and it's size doesn't really matter.

Another operand from index, in general from disk, and I already know it probably intersects (because I found it terms (terms from coverage))
and for index data I want it to be compact, and fast to extract and do operation with user data.

Simplified:

auto data1 = something heavy and precompute
for (data2 in index where data2 terms intersects with data1 terms) {
If (fast_intersects/contains/etc(data1, data2)) { push(data2); }

}

@smcallis
Copy link
Collaborator

smcallis commented Jan 9, 2023

In general I think we should use S2LaxPolygon instead of S2Polygon, it looks a lot of better, but translation is not easy

Yeah the Lax classes are implementations of the new Shape API which is more flexible in general, S2Polygon is really legacy at this point, I'd encourage you to use Shapes whenever possible.

I meant they have EncodedS2Lax* interface to decode them lazily

Ah yes they do, we have EncodedS2ShapeIndex as well so you can decode the index and the shapes in it lazily as well to minimize disk seeks. You can decode them all at once as well though.

For an example S2Polygon::Contains(Point) not always need to do it.

We have s2shapeutil::ContainsBruteForce that can check point containment without an index. Intersection and containment are the same for a polygon/point but merely checking for vertex containment and lack of edge crossings isn't sufficient for polygons because they have interior that can escape, so you have to verify orientation as well, which makes it a fair bit harder.

@MBkkt
Copy link
Contributor Author

MBkkt commented Jan 9, 2023

s2shapeutil::ContainsBruteForce

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants