Refactor agent delegation and tweak micro agents #1910

li-boxuan · 2024-05-20T03:44:11Z

This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.

TODOs:

Delegate controller starting from step 0
Parent history should include child's history
Child should not know parent's history
Make sure DelegateAgent work
Add sanity checks for subscriptions

Future work:

Add a config that controls global max steps

li-boxuan · 2024-05-20T04:29:20Z

@rbren This might have some overlap/conflict with #1660, but I believe we anyways need to fix the current broken AgentDelegateAction. Your review is greatly appreciated!

.github/workflows/run-integration-tests.yml

opendevin/controller/agent_controller.py

rbren · 2024-05-20T15:38:40Z

opendevin/events/stream.py

    def subscribe(self, id: EventStreamSubscriber, callback: Callable):
        if id in self._subscribers:
-            raise ValueError('Subscriber already exists: ' + id)
+            self._subscribers[id].append(callback)
        else:
-            self._subscribers[id] = callback
+            self._subscribers[id] = [callback]


I've gone back and forth on this, but I think we should raise an exception here. It's indicative of something going very wrong. Not raising here caused me to miss a pretty severe bug in another PR

Oh I see the issue--you've got multiple AgentControllers registering themselves...

Let me think on this one for a minute.

It’s by design and unavoidable when you have (multi-layer) delegates - so I maintain a stack here.

What if the parent AgentController just calls on_event to its child? Then we only subscribe once from the top-level AgentController.

We should probably pass None as the event_stream to the Delegate in order to prevent issues

That's a viable approach I think, but it would be hard to implement/manage multi-layer agent delegate. If grandpa delegates to parent, which in turn delegates to a child, e.g. main agent -> database agent -> postgres migration agent, it would be hard/inelegant to let the root AgentController handle everything.

Can you explain more the issue here? Each AgentController would get to see every event, so it should be flexible enough to handle every possible case.

To be clear, the logic I'm envisioning is something like

def on_event(evt): if isinstance(evt, Foo): # ...all the normal logic... if self.delegate: self.delegate.on_event(evt)

Oh that’s a great idea! Will try

Done. I am not happy with the current implementation either (need to check whether itself is a delegate before it subscribe/unsubscribe) but looks like there's no perfect approach!

We cannot just pass None as event stream because AgentController needs to call event_stream.add_event.

I changed it back to the original implementation - let each AgentController worry about their own subscriptions and listeners. I added an append parameter to event_stream.subscribe method, and only delegates could set this parameter to true. This should largely prevent bugs like duplicate subscription...

opendevin/controller/agent_controller.py

rbren

This LGTM overall! Just one suggestion on history

opendevin/controller/agent_controller.py

rbren · 2024-05-23T19:19:24Z

@li-boxuan for some reason I can't open a PR into your branch, but here's my repro:

https://github.com/OpenDevin/OpenDevin/tree/rb/delegate-repro

See the edits in DummyAgent--it builds a stack of delegates, which just finish immediately

This consistently trips over the error:

poetry run python opendevin/core/main.py -c DummyAgent -t "write a todo list app in react"

li-boxuan · 2024-05-24T04:15:04Z

for some reason I can't open a PR into your branch

this is on my fork, but you should have access to push to my branch directly. You just need to first do git remote add li-boxuan https://github.com/li-boxuan/OpenDevin.git

li-boxuan · 2024-05-24T06:54:53Z

I'll come back and work on this this weekend. Hope I could make some progress Friday evening (PT time).

li-boxuan · 2024-05-26T23:19:02Z

tests/integration/mock/ManagerAgent/test_edits/response_008.log

+{
+  "action": "read",
+  "args": {
+    "path": "good.txt"


FWIW, verifier Agent has severely hallucinated that there's a "good.txt" - a name coming out of nowhere.

In the long term, we do need to think of some intermediate representation of code changes - e.g. ".patch" or ".diff" so that the verifier agent could really see the "updates". Now it only sees the snapshot after the "fix" is already applied.

Alternatively, maybe we should work on the prompts and don't assign tasks to verifier agent if it's not in a git repository.

opendevin/controller/agent_controller.py

enyst · 2024-05-27T09:22:30Z

opendevin/controller/agent_controller.py

-                await self.event_stream.add_event(obs, EventSource.AGENT)
+
+                # update current controller's state
+                self.state.history += self.delegate.state.history


Please help me understand this a little, now that the delegate is done, does the parent need to send the child's historical messages to the LLM going forward?

Or is there another purpose for updating itself with its history?

Originally suggested by @rbren here: #1910 (comment)

I think the rationale is, the parent might be interested in learning what the child has done, to drive the task forward. It is debatable, though, that the parent may only need to know the result/summary of its child. But should parent trust child's conclusion? What if the child executes with a weaker LLM (not possible at the moment but maybe in the future to lower down the cost - parent agent using more powerful LLM while delegates use small models). Would a memory condenser be interesting? These open questions are hard to answer without benchmarking...

Yeah I've gone back and forth on this.

I think keeping the parent's memory out of the child makes a lot of sense. I could also see a case for keeping the child's memory out of the parent.

I forget what the problem was that led me to that original comment, but it definitely broke something I was working on...

I say we get this in either way, and we can take another look at history management soon

Yeah let's keep it this way at the moment.

I changed my mind when I was working on #2103. The parent agent would get confused when it sees the child agent's history because the parent doesn't know those commands used by child agent.

So now, the parent agent only learns about the result provided by the child agent.

It might be pure luck, but after removing child history, I ran regenerate and ManagerAgent was able to finish the test tasks in fewer turns.

Interesting! Thanks for the details. Hmm, food for thought... What LLM did you use in the tests?

I was using gpt-4o (and gpt-4-turbo, sometimes).

li-boxuan · 2024-05-27T18:48:33Z

agenthub/micro/study_repo_for_task/prompt.md

+
+## Examples
+
+Here is an example of how you can interact with the environment for task solving:


FWIW, I had to do this one-shot learning otherwise StudyRepoForTaskAgent, backed by GPT-4o or GPT-4-turbo, consistently writes code to finish the task by itself, no matter how much I emphasize it should not make changes/write data/implement the task by itself.

What's more interesting is, if I copy the full prompt (without this example) to ChatGPT-4, it can give reasonable results - calling finish action once it sees the repository is empty.

li-boxuan · 2024-05-29T01:44:55Z

@rbren I believe there are still a few debatable designs such as history management, but I am going to merge this PR as it is, since agent delegation is broken at the moment, and this PR doesn't introduce regression. We could discuss & improve later.

li-boxuan marked this pull request as ready for review May 20, 2024 04:28

li-boxuan requested a review from rbren May 20, 2024 04:28

li-boxuan added agent framework strategies for prompting, agent, etc micro agent add or update a micro-agent labels May 20, 2024

li-boxuan assigned rbren May 20, 2024

rbren reviewed May 20, 2024

View reviewed changes

.github/workflows/run-integration-tests.yml Outdated Show resolved Hide resolved

rbren reviewed May 20, 2024

View reviewed changes