-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ahash for our own HashMap and HashSet objects #10502
base: master
Are you sure you want to change the base?
Conversation
We already transitively take a dependency on the ahash crate, so might as well use it. While not all of our HashMap/HashSet objects are performance-sensitive, some of them are and there's no cost to migrating them all over. By default, rust uses a DoS-resistant, cryptographic hash (SipHash) to hash the keys for HashMap/HashSet entries into buckets, which is slower than what we used to use in the C++ world. We don't use any of it in a security-sensitive context or based off of unvalidated/untrusted/remote input, so there are no security reasons to continue to use SipHash. Some of fish's HashMap/HashSet instances could benefit from being faster, e.g. the functions cache, various completion caches, environment variable tables, and history lookups, so this isn't just a "use it because it's there" kind of thing.
I don't mind the implications of the change, but it is noisy - it touches every HashMap and HashSet. If this could be set globally I'd be in favor, but I don't like cluttering up the code without a clear, measurable benefit. We should just use the default hash algorithm generally, and switch to a faster algorithm in any cases where profiling shows hashing to be a bottleneck. |
Just some thoughts by a passer-by. I have found either always using std or always using ahash is nice because you don't have spend effort on making a choice each time. Although, a potential issue of choosing a non-std map globally is libraries that non-generically consume/produce std ones however I doubt that will be an issue for fish. Would be cool to see some benchmarks for the global change though. A potentially less noisy approach might be to use the provided aliases in the ahash crate. The std hashmap already has to be imported so the diff is mainly the import path. (Will still need the - use std::collections::{HashMap, HashSet};
+ use ahash::{HashMap, HashSet}; Although, you then have to remember to auto import the right HashMap. Note: Using With the current structure of the code base you could also define an alias somewhere like |
I mainly feel the same way about this as @ridiculousfish does (despite being the one that opened the issue). Since the runtime/build cost of the change is zero, I initially went for a blanket find-and-replace approach (not literally, since that actually doesn't compile) but need to be convinced by benchmarks that the churn is worth it because of the degraded contribution experience. It's "just" I don't like the I think the crate-level use-indirection I mentioned above would be a more elegant solution than One thing about this PR that I did not enjoy: while in most cases changing |
So do we have any benchmarks here? I don't think most of these are very hot, except for maybe the function set and the wildcard sets. (of course the difference with e.g. the highlight maps is that you don't need throughput - it's plenty fast already because it's a purely interactive thing, you want reduced resource use) |
Okay, our benchmark suite (plus some additions I've been working on) doesn't think this changes a lot. Even if I try to specifically exercise this, like I run That means, because this complicates the code slightly, I would be disinclined to merge it. Of course if you have any benchmarks to the contrary (in particular I haven't found a good way to benchmark highlighting or the reader history search), that's something to consider. |
We already transitively take a dependency on the ahash crate, so might as well use it. While not all of our HashMap/HashSet objects are performance-sensitive, some of them are and there's no cost to migrating them all over.
By default, rust uses a DoS-resistant, cryptographic hash (SipHash) to hash the keys for HashMap/HashSet entries into buckets, which is slower than what we used to use in the C++ world. We don't use any of it in a security-sensitive context or based off of unvalidated/untrusted/remote input, so there are no security reasons to continue to use SipHash.
Some of fish's HashMap/HashSet instances could benefit from being faster, e.g. the functions cache, various completion caches, environment variable tables, and history lookups, so this isn't just a "use it because it's there" kind of thing.