-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor agent delegation and tweak micro agents #1910
Conversation
opendevin/events/stream.py
Outdated
def subscribe(self, id: EventStreamSubscriber, callback: Callable): | ||
if id in self._subscribers: | ||
raise ValueError('Subscriber already exists: ' + id) | ||
self._subscribers[id].append(callback) | ||
else: | ||
self._subscribers[id] = callback | ||
self._subscribers[id] = [callback] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone back and forth on this, but I think we should raise an exception here. It's indicative of something going very wrong. Not raising here caused me to miss a pretty severe bug in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see the issue--you've got multiple AgentControllers registering themselves...
Let me think on this one for a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s by design and unavoidable when you have (multi-layer) delegates - so I maintain a stack here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the parent AgentController just calls on_event
to its child? Then we only subscribe once from the top-level AgentController.
We should probably pass None as the event_stream to the Delegate in order to prevent issues
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a viable approach I think, but it would be hard to implement/manage multi-layer agent delegate. If grandpa delegates to parent, which in turn delegates to a child, e.g. main agent -> database agent -> postgres migration agent, it would be hard/inelegant to let the root AgentController handle everything.
Can you explain more the issue here? Each AgentController would get to see every event, so it should be flexible enough to handle every possible case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the logic I'm envisioning is something like
def on_event(evt):
if isinstance(evt, Foo):
# ...all the normal logic...
if self.delegate:
self.delegate.on_event(evt)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that’s a great idea! Will try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I am not happy with the current implementation either (need to check whether itself is a delegate before it subscribe/unsubscribe) but looks like there's no perfect approach!
We cannot just pass None as event stream because AgentController needs to call event_stream.add_event
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it back to the original implementation - let each AgentController worry about their own subscriptions and listeners. I added an append
parameter to event_stream.subscribe
method, and only delegates could set this parameter to true. This should largely prevent bugs like duplicate subscription...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM overall! Just one suggestion on history
@li-boxuan for some reason I can't open a PR into your branch, but here's my repro: https://github.com/OpenDevin/OpenDevin/tree/rb/delegate-repro See the edits in DummyAgent--it builds a stack of delegates, which just finish immediately This consistently trips over the error:
|
this is on my fork, but you should have access to push to my branch directly. You just need to first do |
I'll come back and work on this this weekend. Hope I could make some progress Friday evening (PT time). |
{ | ||
"action": "read", | ||
"args": { | ||
"path": "good.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, verifier Agent has severely hallucinated that there's a "good.txt" - a name coming out of nowhere.
In the long term, we do need to think of some intermediate representation of code changes - e.g. ".patch" or ".diff" so that the verifier agent could really see the "updates". Now it only sees the snapshot after the "fix" is already applied.
Alternatively, maybe we should work on the prompts and don't assign tasks to verifier agent if it's not in a git repository.
await self.event_stream.add_event(obs, EventSource.AGENT) | ||
|
||
# update current controller's state | ||
self.state.history += self.delegate.state.history |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please help me understand this a little, now that the delegate is done, does the parent need to send the child's historical messages to the LLM going forward?
Or is there another purpose for updating itself with its history?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally suggested by @rbren here: #1910 (comment)
I think the rationale is, the parent might be interested in learning what the child has done, to drive the task forward. It is debatable, though, that the parent may only need to know the result/summary of its child. But should parent trust child's conclusion? What if the child executes with a weaker LLM (not possible at the moment but maybe in the future to lower down the cost - parent agent using more powerful LLM while delegates use small models). Would a memory condenser be interesting? These open questions are hard to answer without benchmarking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've gone back and forth on this.
I think keeping the parent's memory out of the child makes a lot of sense. I could also see a case for keeping the child's memory out of the parent.
I forget what the problem was that led me to that original comment, but it definitely broke something I was working on...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say we get this in either way, and we can take another look at history management soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's keep it this way at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed my mind when I was working on #2103. The parent agent would get confused when it sees the child agent's history because the parent doesn't know those commands used by child agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now, the parent agent only learns about the result provided by the child agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be pure luck, but after removing child history, I ran regenerate and ManagerAgent was able to finish the test tasks in fewer turns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Thanks for the details. Hmm, food for thought... What LLM did you use in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using gpt-4o (and gpt-4-turbo, sometimes).
|
||
## Examples | ||
|
||
Here is an example of how you can interact with the environment for task solving: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I had to do this one-shot learning otherwise StudyRepoForTaskAgent, backed by GPT-4o or GPT-4-turbo, consistently writes code to finish the task by itself, no matter how much I emphasize it should not make changes/write data/implement the task by itself.
What's more interesting is, if I copy the full prompt (without this example) to ChatGPT-4, it can give reasonable results - calling finish
action once it sees the repository is empty.
dcd87b4
to
2ae6051
Compare
@rbren I believe there are still a few debatable designs such as history management, but I am going to merge this PR as it is, since agent delegation is broken at the moment, and this PR doesn't introduce regression. We could discuss & improve later. |
This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.
For the first time, I am able to use
ManagerAgent
to completetest_write_simple_script
andtest_edits
tasks in integration tests, so this PR also addsManagerAgent
as part of integration tests.test_write_simple_script
involves delegation toCoderAgent
whiletest_edits
involves delegation toTypoFixerAgent
.Also for the first time, I am able to use
DelegateAgent
to completetest_write_simple_script
andtest_edits
tasks in integration tests, so this PR also addsDelegateAgent
as part of integration tests. It involves delegation toStudyRepoForTaskAgent
,CoderAgent
andVerifierAgent
.This PR is a blocker for #1735 and likely #1945.
TODOs:
Future work: