Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dogfood Baselines internally #5553

Open
mildm8nnered opened this issue May 1, 2024 · 5 comments
Open

Dogfood Baselines internally #5553

mildm8nnered opened this issue May 1, 2024 · 5 comments

Comments

@mildm8nnered
Copy link
Collaborator

mildm8nnered commented May 1, 2024

% swiftlint --enable-all-rules --progress --reporter summary on SwiftLint produces the following violations:

+-----------------------------------------+--------+-------------+--------+----------+--------+------------------+-----------------+
| rule identifier                         | opt-in | correctable | custom | warnings | errors | total violations | number of files |
+-----------------------------------------+--------+-------------+--------+----------+--------+------------------+-----------------+
| explicit_type_interface                 | yes    | no          | no     |    3,878 |      0 |            3,878 |             541 |
| explicit_acl                            | yes    | no          | no     |    3,065 |      0 |            3,065 |             535 |
| indentation_width                       | yes    | no          | no     |    1,833 |      0 |            1,833 |             333 |
| prefer_nimble                           | yes    | no          | no     |      896 |      0 |              896 |              65 |
| explicit_top_level_acl                  | yes    | no          | no     |      722 |      0 |              722 |             463 |
| multiline_arguments_brackets            | yes    | no          | no     |      644 |      0 |              644 |             113 |
| required_deinit                         | yes    | no          | no     |      625 |      0 |              625 |             318 |
| no_extension_access_modifier            | yes    | no          | no     |        0 |    550 |              550 |             274 |
| no_magic_numbers                        | yes    | no          | no     |      525 |      0 |              525 |              94 |
| type_contents_order                     | yes    | no          | no     |      452 |      0 |              452 |             282 |
| one_declaration_per_file                | yes    | no          | no     |      356 |      0 |              356 |              75 |
| implicit_return                         | yes    | yes         | no     |      350 |      0 |              350 |             140 |
| no_grouping_extension                   | yes    | no          | no     |      223 |      0 |              223 |             207 |
| multiline_arguments                     | yes    | no          | no     |      213 |      0 |              213 |              54 |
| multiline_parameters_brackets           | yes    | no          | no     |      209 |      0 |              209 |              56 |
| file_types_order                        | yes    | no          | no     |      190 |      0 |              190 |              73 |
| anonymous_argument_in_multiline_closure | yes    | no          | no     |      185 |      0 |              185 |              64 |
| missing_docs                            | yes    | no          | no     |      164 |      0 |              164 |              23 |
| sorted_enum_cases                       | yes    | no          | no     |      132 |      0 |              132 |              29 |
| multiline_literal_brackets              | yes    | no          | no     |      109 |      0 |              109 |              27 |
| vertical_whitespace_between_cases       | yes    | yes         | no     |      109 |      0 |              109 |              45 |
| force_unwrapping                        | yes    | no          | no     |      104 |      0 |              104 |              36 |
| switch_case_on_newline                  | yes    | no          | no     |       97 |      0 |               97 |              18 |
| explicit_enum_raw_value                 | yes    | no          | no     |       84 |      0 |               84 |              15 |
| conditional_returns_on_newline          | yes    | no          | no     |       76 |      0 |               76 |              50 |
| trailing_closure                        | yes    | yes         | no     |       72 |      0 |               72 |              43 |
| convenience_type                        | yes    | no          | no     |       59 |      0 |               59 |              54 |
| prefixed_toplevel_constant              | yes    | no          | no     |       53 |      0 |               53 |              33 |
| strict_fileprivate                      | yes    | no          | no     |       51 |      0 |               51 |              21 |
| prefer_self_in_static_references        | yes    | yes         | no     |       49 |      0 |               49 |              24 |
| multiline_function_chains               | yes    | no          | no     |       41 |      0 |               41 |              29 |
| multiline_parameters                    | yes    | no          | no     |       39 |      0 |               39 |              26 |
| discouraged_optional_collection         | yes    | no          | no     |       33 |      0 |               33 |              20 |
| function_default_parameter_at_end       | yes    | no          | no     |       18 |      0 |               18 |              12 |
| closure_body_length                     | yes    | no          | no     |       15 |      2 |               17 |               6 |
| todo                                    | no     | no          | no     |       11 |      0 |               11 |               8 |
+-----------------------------------------+--------+-------------+--------+----------+--------+------------------+-----------------+
| Total                                   |        |             |        |   15,682 |    552 |           16,234 |             602 |
+-----------------------------------------+--------+-------------+--------+----------+--------+------------------+-----------------+

Some of these rules we are unlikely to ever want on. For example, explicit_type_interface (3,878 violations), but other, such as implicit_return (350 violations) could actually be usefully on.

For the latter cases, we could simply patch all of the existing violations and enable them, but we could also use a baseline to ignore (all of some of) the existing violations for now, but prevent any new violations of those rules being introduced.

Open questions would be:

  1. Should we just patch all the violations of rules we want anyway.

In the case of some rules (e.g. force_unwrapping) some case may be awkward to avoid in code.

  1. Assuming we don't just patch everything, which rules should we enable?

See my comment below for some suggestions on which rules we may want to enable.

  1. How to run and configure SwiftLint?

SwiftLint is a bit weird, in that unlike most deployments of SwiftLint, there is no SwiftLint run during building - this can be a bit annoying, as you may not find out about violations until you run the integration tests, but is understandable - we would need a release build of SwiftLint to lint our local debug builds efficiently, and what version should that be - it can't be HEAD, because that's what we're building.

Instead, during Integration tests, SwiftLint lints itself (slowly) during the debug build. The proposal would be to use a baseline on this run.

As far as configuration goes, if we configure the baselines in .swiftlint.yml (see #5552), then anyone running swiftlint will pick up the baseline configuration automatically. If we relied on command line configuration, then they would see all the violations that we are using the baseline to filter out.

  1. How to store the baseline

The baseline.json for the above violations is circa 7MB. If we just enable the rules suggested as DO or MIGHT enable, the baseline.json is about 2MB. Disabling the most frequent of those rules - indentation_width reduces the baseline size to about 1.3MB

We could store this uncompressed in the git tree, but over time, if we update it, it will increase the overall size of the repo history.

We could store it compressed, and then uncompress it during the Integration tests (and leave the uncompressed, .gitignore'd version on disk), but then anyone just running swiftlint would get an error, as the uncompressed baseline would be missing.

@mildm8nnered
Copy link
Collaborator Author

Obviously these lists can be quite subjective, but I think my personal breakdown would be something like this, without having looked at the actual violations (or even what some of the rules are). I don't have any attachment to particular rules, so if there are ones people disagree with ...

Things we probably do NOT want enabled:

| explicit_type_interface                 | yes    | no          | no     |    3,878 |      0 |            3,878 |             541 |
| explicit_acl                            | yes    | no          | no     |    3,065 |      0 |            3,065 |             535 |
| prefer_nimble                           | yes    | no          | no     |      896 |      0 |              896 |              65 |
| explicit_top_level_acl                  | yes    | no          | no     |      722 |      0 |              722 |             463 |
| required_deinit                         | yes    | no          | no     |      625 |      0 |              625 |             318 |
| no_extension_access_modifier            | yes    | no          | no     |        0 |    550 |              550 |             274 |
| type_contents_order                     | yes    | no          | no     |      452 |      0 |              452 |             282 |
| one_declaration_per_file                | yes    | no          | no     |      356 |      0 |              356 |              75 |
| no_grouping_extension                   | yes    | no          | no     |      223 |      0 |              223 |             207 |
| sorted_enum_cases                       | yes    | no          | no     |      132 |      0 |              132 |              29 |
| vertical_whitespace_between_cases       | yes    | yes         | no     |      109 |      0 |              109 |              45 |
| force_unwrapping                        | yes    | no          | no     |      104 |      0 |              104 |              36 |
| switch_case_on_newline                  | yes    | no          | no     |       97 |      0 |               97 |              18 |
| explicit_enum_raw_value                 | yes    | no          | no     |       84 |      0 |               84 |              15 |
| multiline_function_chains               | yes    | no          | no     |       41 |      0 |               41 |              29 |

Things we probably DO want enabled:

| implicit_return                         | yes    | yes         | no     |      350 |      0 |              350 |             140 |
| prefer_self_in_static_references        | yes    | yes         | no     |       49 |      0 |               49 |              24 |
| multiline_parameters                    | yes    | no          | no     |       39 |      0 |               39 |              26 |
| discouraged_optional_collection         | yes    | no          | no     |       33 |      0 |               33 |              20 |
| function_default_parameter_at_end       | yes    | no          | no     |       18 |      0 |               18 |              12 |
| closure_body_length                     | yes    | no          | no     |       15 |      2 |               17 |               6 |
| todo                                    | no     | no          | no     |       11 |      0 |               11 |               8 |

Things we MIGHT want enabled:

| indentation_width                       | yes    | no          | no     |    1,833 |      0 |            1,833 |             333 |
| multiline_arguments_brackets            | yes    | no          | no     |      644 |      0 |              644 |             113 |
| no_magic_numbers                        | yes    | no          | no     |      525 |      0 |              525 |              94 |
| multiline_arguments                     | yes    | no          | no     |      213 |      0 |              213 |              54 |
| multiline_parameters_brackets           | yes    | no          | no     |      209 |      0 |              209 |              56 |
| file_types_order                        | yes    | no          | no     |      190 |      0 |              190 |              73 |
| anonymous_argument_in_multiline_closure | yes    | no          | no     |      185 |      0 |              185 |              64 |
| missing_docs                            | yes    | no          | no     |      164 |      0 |              164 |              23 |
| multiline_literal_brackets              | yes    | no          | no     |      109 |      0 |              109 |              27 |
| conditional_returns_on_newline          | yes    | no          | no     |       76 |      0 |               76 |              50 |
| trailing_closure                        | yes    | yes         | no     |       72 |      0 |               72 |              43 |
| convenience_type                        | yes    | no          | no     |       59 |      0 |               59 |              54 |
| prefixed_toplevel_constant              | yes    | no          | no     |       53 |      0 |               53 |              33 |
| strict_fileprivate                      | yes    | no          | no     |       51 |      0 |               51 |              21 |

As @SimplyDanny pointed out in another context the other day, some violations can be reduced by configuration. For example, no_magic_numbers has an option to ignore violations in test cases, so configuring that would reduce it to about ~100 violations in non-test code.

@SimplyDanny
Copy link
Collaborator

General question about the size occupied by the baseline file: Can we reduce the data that's stored? I'm thinking about reason, ruleName, ruleDescription and maybe even severity which don't seem to be necessary to save. ruleIdentifier, file, line, character and text should be enough to uniquely identify a violation, isn't it?

@SimplyDanny
Copy link
Collaborator

With respect to rules to potentially enable in SwiftLint: If there's a rule we agree upon to enable, we might just do that either by manually fixing the violations it causes or by running its auto-fixes (if available). The rules you suggest all don't come with hundreds of violations.

In fact, when is the baseline functionality useful? I think that's when one or more of the following aspects apply:

  • Huge code base
  • Many people contributing
  • Too many violations to fix in one shot
  • No automatic fixes
  • Violations are hard to fix in finished implementation
  • Most violations in legacy code
  • ...

None of that really applies to SwiftLint. While dogfooding is always great, I don't see an immediate benefit in bringing a baseline into action.

@mildm8nnered
Copy link
Collaborator Author

General question about the size occupied by the baseline file: Can we reduce the data that's stored? I'm thinking about reason, ruleName, ruleDescription and maybe even severity which don't seem to be necessary to save. ruleIdentifier, file, line, character and text should be enough to uniquely identify a violation, isn't it?

So we could definitely reduce the data - we use severity and reason (if they change we currently do count that as a change), but could definitely potentially trim down the others.

The current approach does have two nice properties though - firstly the baseline file is a text file, so can be easily compressed, and secondly, because we're dealing in StyleViolation's throughout, we can just serialize and deserialize and compare them directly, rather than having to do any translation, which makes the code a lot simpler.

One could argue that the real redundancy is in StyleViolation itself, and that it should retrieve ruleName and ruleDescription from the ruleIdentifier rather than storing them itself.

I think the "nightmare" scenario here would be if the baseline was a binary file that did not compress well, and could not easily be diff'd - then each copy would take up more or less X bytes in the history where X was the size on disk, and that's probably what I was thinking of with my initial concerns.

I've been playing around with repo sizes, and with git at least, we get 75% compression on a single copy of one very large (35MB) baseline. Additional copies of the baseline with small changes get much better compression ratios - five small commits to the baseline end up taking up 16MB in the git history instead of 175MB.

We could reduce the size of the raw Baseline on disk, but that redundancy gets compressed away in git's storage (apart from the working copy), and we would just get worse compression ratios if we were to remove some of redundancy ourselves, so the impact on overall repo size would not be that much. Even compressing the working copy could be worse in the long run, if the baseline is updated frequently.

@SimplyDanny
Copy link
Collaborator

Okay, fair points. Let's wait for feedback from other folks then. I could imagine this will lead to some discussion. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants