Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp: The regexp package should support context #67278

Open
purpleidea opened this issue May 9, 2024 · 9 comments
Open

regexp: The regexp package should support context #67278

purpleidea opened this issue May 9, 2024 · 9 comments
Labels
Proposal WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@purpleidea
Copy link

Go version

latest

Output of go env in your module/workspace:

N/A

What did you do?

N/A

What did you see happen?

Regexp did not cancel early

What did you expect to see?

It would be sensible to have a long running regexp match operation be cancellable with the context package.

I know this package was created before the context package was, but this might be a useful addition for a future extension of this package or golang2.x

Cheers

@mauri870
Copy link
Member

Can you change your proposal to follow the guidelines at https://github.com/golang/proposal? Thanks.

@mauri870 mauri870 added this to the Proposal milestone May 10, 2024
@seankhliao
Copy link
Member

How often do you have a match that runs that long?
When should the regex check the context for cancellation? there's no io operations

@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 10, 2024
@purpleidea
Copy link
Author

When should the regex check the context for cancellation? there's no io operations

This is an excellent question! Here's how you do it. This is from an example where I did something in a similar lib for type unification. You add the ctx select block shown below anywhere there's a long running operation.

func LongTypeUnificationOperation(ctx context.Context, inputData interface{}) error {

        for _, x := range inputData {
                select {
                case <-ctx.Done():
                        return ctx.Err()
                default:
                        // pass
                }

                if err := Unify(x.Actual, x.Expect); err != nil {
                        return err
                }
                
        }
        // do more stuff...

        return nil
}

HTH and sorry I didn't explain it clearer before.

@purpleidea
Copy link
Author

How often do you have a match that runs that long?

It's necessary in latency sensitive contexts, and sometimes it's a user-supplied regexp, so I can't guarantee in advance that it's not slow/exponential or worse/etc. (I forget the specific time complexities possible with/without backtracking and in this lib, but you get the idea...)

@purpleidea
Copy link
Author

Can you change your proposal to follow the guidelines at https://github.com/golang/proposal? Thanks.

image

Not sure what I did wrong. Feel free to edit as you see fit.

@apparentlymart
Copy link

I guess this proposal calls for inserting a non-blocking read from the context's "Done" channel somewhere in each iteration of this loop:

go/src/regexp/exec.go

Lines 197 to 240 in a0a6026

for {
if len(runq.dense) == 0 {
if startCond&syntax.EmptyBeginText != 0 && pos != 0 {
// Anchored match, past beginning of text.
break
}
if m.matched {
// Have match; finished exploring alternatives.
break
}
if len(m.re.prefix) > 0 && r1 != m.re.prefixRune && i.canCheckPrefix() {
// Match requires literal prefix; fast search for it.
advance := i.index(m.re, pos)
if advance < 0 {
break
}
pos += advance
r, width = i.step(pos)
r1, width1 = i.step(pos + width)
}
}
if !m.matched {
if len(m.matchcap) > 0 {
m.matchcap[0] = pos
}
m.add(runq, uint32(m.p.Start), pos, m.matchcap, &flag, nil)
}
flag = newLazyFlag(r, r1)
m.step(runq, nextq, pos, pos+width, r, &flag)
if width == 0 {
break
}
if len(m.matchcap) == 0 && m.matched {
// Found a match and not paying attention
// to where it is, so any match will do.
break
}
pos += width
r, width = r1, width1
if r != endOfText {
r1, width1 = i.step(pos + width)
}
runq, nextq = nextq, runq
}

(and perhaps some other loops too, but this one seemed the most likely to my not-so-familiar-with-this-package eyes)

However, I wouldn't want to significantly regress the performance of the non-cancelable case given that long-running regex matches seem like an unusual situation. Maybe it would be worth experimenting in a local branch to see what impact it would have to repeatedly try to read a channel in that loop.

If the overhead is material then it seems like there'd either need to be two separate variations of the same function, or there'd need to be some way to avoid that overhead in the case where there's no context provided.

@purpleidea
Copy link
Author

@apparentlymart Your whole comment was very helpfully phrased. Thanks for commenting!

If the overhead is material then it seems like there'd either need to be two separate variations of the same function, or there'd need to be some way to avoid that overhead in the case where there's no context provided.

Indeed, it probably makes sense to add a new function that takes a context, and the remaining ones get ported to internally just pass context.Background(). Fully backwards compatibility, but new consumers can use the cancellable version if they want to.

@ianlancetaylor
Copy link
Contributor

It would help to know whether this is a problem that has been observed in practice, or whether this is a theoretical problem. Go's regexp package is based on RE2, which was designed to not have exponential behavior. See https://swtch.com/~rsc/regexp/regexp3.html.

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels May 11, 2024
@purpleidea
Copy link
Author

Go's regexp package is based on RE2, which was designed to not have exponential behavior. See https://swtch.com/~rsc/regexp/regexp3.html.

Yes awesome, I remember reading this article, thanks for adding the link! That's what I was referring to above.

It would help to know whether this is a problem that has been observed in practice, or whether this is a theoretical problem.

Great question. It's somewhere in the middle. I think this might be generally useful, but I can only offer my personal use-case.

As part of https://github.com/purpleidea/mgmt/ (think a general automation tool like Ansible) users write code in a small DSL. If the user uses a regexp function, we implement that by wrapping the golang regexp library to run their user-supplied regexp when that function runs.

The catch is that we have no control over what kind of regexp the user provides. We are also very good about shutting everything down when there is a global request to cancel. Many others tools when asked to cancel have a long shutdown process, which usually ends up with the user hitting ^\ which can cause data corruption.

To avoid these cases, we want to guarantee we can shutdown very quickly. Say in under 500ms. (Arbitrary, but shorter is better.) I identified the regexp library as one situation where we wouldn't be able to guarantee that.

So TL;DR: there isn't a specific user that I know of personally experiencing this pain point today, but if possible I would like to proactively plumb this in to prevent a future negative user experience or worse.

Happy to answer any other questions.

(Our internal function API that wraps this is a: func(context.Context, []types.Value) (types.Value, error) where types is our internal types/values package (similar to golang's reflect) and the context is how we ask any running function to shutdown sooner where possible.)

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants