Quantulum incorrect for prefixes that are not capitalized if the unit is incorrectly capitalized #161

hwalinga · 2020-09-09T15:14:56Z

Describe the bug
When a unit is incorrectly capitalized, the SI prefix is read wrong.

To Reproduce

from quantulum3 import parser

parser.parse('1mw')  # returns: Megawatt
parser.parse('1pw')  # returns: Petawatt

parser.parse('1Pw')  # returns: Picowatt

Expected behavior

from quantulum3 import parser

parser.parse('1mw')  # should be: Milliwatt
parser.parse('1pw')  # should be: Picowatt

parser.parse('1Pw')  # should be: Petawatt

Note
pW and PW, mW and MW are correctly read.

Additional information:

Python Version: [3.7, 3.8]
Classifier activated/ sklearn installed: yes (pw will not be read if not installed)
OS: MX Linux 19 (Debian 10 derivative)
Version: 0.7.5


nielstron · 2020-09-16T12:23:41Z

I see the issue. The main problem is that when nothing matches the actual capitalization of the input, the closest match for the all-lowercase input is returned, completely ignoring capitalization.

The most consistent way around this would be to return the first match obtained with as few capitalization changes as necessary (potentially preferring a change from lower to upper over one from upper to lower).
This would mean that 2^(len(units)) changes are evaluated in the worst case. That may not be a problem, though, as unit strings do not contain more than 4 separate units.
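A minimal sketch of that idea, not quantulum3's actual code: it flips individual letters rather than whole units, tries variants in order of the fewest flips, and within the same flip count prefers lower-to-upper changes. `KNOWN_UNITS`, `case_variants`, and `resolve` are hypothetical names made up for illustration.

```python
from itertools import combinations

# Hypothetical lookup table; NOT quantulum3's real unit registry.
KNOWN_UNITS = {
    "mW": "milliwatt",
    "MW": "megawatt",
    "pW": "picowatt",
    "PW": "petawatt",
}


def case_variants(text):
    """Yield case variants of text, fewest flipped letters first."""
    letters = [i for i, ch in enumerate(text) if ch.isalpha()]
    for n_flips in range(len(letters) + 1):
        candidates = []
        for positions in combinations(letters, n_flips):
            chars = list(text)
            for i in positions:
                chars[i] = chars[i].swapcase()
            # Count upper-to-lower flips so lower-to-upper variants sort first.
            to_lower = sum(text[i].isupper() for i in positions)
            candidates.append((to_lower, "".join(chars)))
        for _, variant in sorted(candidates):
            yield variant


def resolve(text):
    """Return the first known unit reachable with the fewest case changes."""
    for variant in case_variants(text):
        if variant in KNOWN_UNITS:
            return KNOWN_UNITS[variant]
    return None


print(resolve("mw"), resolve("pw"), resolve("Pw"))
```

With this table, the three inputs from the report resolve to milliwatt, picowatt, and petawatt. The worst case is still exponential in the number of letters, which stays small for short unit strings.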

hwalinga · 2020-09-16T13:22:40Z

Why not leave capitalization intact for matching a no-match object? Especially for prefixes this is a huge difference.

nielstron · 2020-09-16T15:05:21Z

What do you mean by "leave capitalization intact"? If units were compared without changing capitalization, "mw" would match neither "MW" nor "mW", which are the only officially valid ways of writing megawatt and milliwatt, respectively.

After switching to all-lowercase, "mw" matches both "MW" and "mW", which are both valid units. I see that another option would be to give preference to the match that is closest to the original string (which would return mW here, as expected).
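The "closest to the original string" option could look roughly like this. It is only a sketch under assumptions: `UNITS_BY_LOWER`, `case_distance`, and `closest_symbol` are hypothetical names, and the table is a stand-in for whatever the case-insensitive lookup actually returns.

```python
# Hypothetical table keyed by lowercased symbol; illustrative only.
UNITS_BY_LOWER = {
    "mw": ["MW", "mW"],
    "pw": ["PW", "pW"],
}


def case_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_symbol(text):
    """Pick the candidate whose capitalization is nearest to the input."""
    candidates = UNITS_BY_LOWER.get(text.lower(), [])
    if not candidates:
        return None
    # The symbol itself is the secondary key, so ties resolve deterministically.
    return min(candidates, key=lambda sym: (case_distance(text, sym), sym))


print(closest_symbol("mw"), closest_symbol("Pw"))
```

Here "mw" is one change away from "mW" and two away from "MW", so "mW" wins; "Pw" is one change away from "PW", so that is returned.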

hwalinga · 2020-09-16T16:59:04Z

Then I don't know exactly how quantulum works. I thought there was machine learning doing the matches, not exact matching?

I also learned that you get random output. There is a 50/50 chance of getting either one.

nielstron · 2020-09-16T18:57:19Z

It depends. Most of the tool is a very complex regex-based mechanism, which will turn up strings such as "1pw" or "1mw". The unit at the end is then compared to all known units. If multiple units match, either a simple heuristic or the machine learning part is used to decide which unit has the highest likelihood of being correct.
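As a very rough illustration of those two stages (the real regexes and unit registry are far more involved, and every name below is hypothetical): a pattern pulls out the number and the trailing unit string, and a case-insensitive lookup returns all candidates, which is where the ambiguity for "mw" comes from.

```python
import re

# Hypothetical pattern and table; quantulum3's real ones are much larger.
QUANTITY_RE = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)")
UNITS_BY_LOWER = {
    "mw": ["megawatt", "milliwatt"],
    "pw": ["petawatt", "picowatt"],
}


def candidate_units(text):
    """Return (value, candidate unit names), or None if no quantity is found."""
    match = QUANTITY_RE.search(text)
    if match is None:
        return None
    value = float(match.group("value"))
    # Case-insensitive comparison: several known units may match.
    candidates = UNITS_BY_LOWER.get(match.group("unit").lower(), [])
    return value, candidates


print(candidate_units("1mw"))
```

When more than one candidate comes back, a later step (the heuristic or the classifier) has to choose between them.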

Hmm random output sounds bad. On the exact same input? Once trained, the output of the trained network should be deterministic.


hwalinga · 2020-09-16T19:12:57Z

Yes, random output is bad. At least a warning should be thrown if picking a random option. Anyway, I made a PR.


hwalinga added a commit to hwalinga/quantulum3 that referenced this issue Sep 16, 2020

Add code that guesses a different capitalization in case of ambiguity. …

1daa536

…Fixes nielstron#161.

hwalinga linked a pull request Sep 16, 2020 that will close this issue

Consider capitalization #162

Open

hwalinga added a commit to hwalinga/quantulum3 that referenced this issue Sep 16, 2020

Add code that guesses a different capitalization in case of ambiguity. …

f61b04d

…Fixes nielstron#161.

nielstron mentioned this issue Sep 16, 2020

Nondeterministic disambiguation #163

Open

nielstron mentioned this issue Sep 24, 2020

Add SI-derived units of Bar #165

Closed
