DynaSaur: Large Language Agents Beyond Predefined Actions

surprisetalk | 128 points | 7mon ago | arxiv.org

throwup238|7mon ago

It looks like the key insight here is to have the LLM generate its own tools (as in GPT/Claude tool calling) via Python code generation, and to apply cosine-similarity RAG over the tool descriptions and the current problem/step to select which tools are available at each step, using recent history for error correction.

The agent starts with some human-created tooling, like a tool to read the file system or a tool to create another tool from Python code, then starts accumulating custom Python functions it wrote itself, with tool-calling metadata like descriptions and input/output types. At each step, if it doesn't find a relevant tool, it creates a new one. Apparently this improves performance on complex tasks (via the GAIA benchmark), with diminishing returns on simpler tasks.
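A minimal sketch of that loop, assuming a `ToolRegistry` abstraction of my own naming (not from the paper) and a toy hashed bag-of-words `embed` as a stand-in for a real embedding model:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class ToolRegistry:
    """Accumulates generated tools; retrieves the top-k most similar to the current step."""
    def __init__(self):
        self.tools = []  # (description, source_code, embedding)

    def add(self, description: str, source_code: str):
        self.tools.append((description, source_code, embed(description)))

    def retrieve(self, step: str, k: int = 2):
        # Cosine similarity reduces to a dot product on normalized vectors.
        q = embed(step)
        scored = sorted(self.tools, key=lambda t: -float(q @ t[2]))
        return [(desc, src) for desc, src, _ in scored[:k]]

reg = ToolRegistry()
reg.add("read a file from the file system", "def read_file(path): ...")
reg.add("fetch a web page and return its text", "def fetch(url): ...")
reg.add("parse a CSV file into rows", "def parse_csv(path): ...")

# Select candidate tools for the current step; if nothing scores well,
# the agent would generate (and register) a new tool instead.
print(reg.retrieve("read the contents of a file"))
```

The "create a new tool" branch would just be another LLM call whose output gets passed to `reg.add` with a generated description.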

IanCal|7mon ago

I've played around with making these things before; it's a fun exercise. Interesting to see that's where things may be heading.

My example was asking for a poem about the headlines (a good example of info they don't have, and something that's very hard to do mechanically).

https://news.ycombinator.com/item?id=37015591

llm_trw|7mon ago

I ended up training a BERT on nothing but Python for the embedding search. The results were crap. Then I used an LLM to write a new docstring for each class/function definition in the training data, and the results were better than state of the art.
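The indexing side of that trick might look like this sketch: walk the AST, pull out each function/class definition, and index an LLM-written docstring instead of the raw code. The `llm_docstring` function here is a hypothetical stub standing in for the actual model call:

```python
import ast

SOURCE = '''
def top_k(xs, k):
    return sorted(xs, reverse=True)[:k]

class Cache:
    def get(self, key, default=None): ...
'''

def llm_docstring(code: str) -> str:
    # Hypothetical stand-in for the LLM call that writes a fresh
    # docstring for a snippet; a real setup would prompt a model here.
    first_line = code.strip().splitlines()[0]
    return f"Code defining `{first_line}`"

# Index (new docstring, original code) pairs; the docstrings, not the
# raw source, are what get embedded for search.
corpus = []
tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        snippet = ast.get_source_segment(SOURCE, node)
        corpus.append((llm_docstring(snippet), snippet))

for doc, _ in corpus:
    print(doc)
```

Queries are then matched against the docstring embeddings, and the hit maps back to the original code.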

There's so much wide-open space to explore. It's a shame that everyone is wasting their time on the biggest possible models they can afford.

digdugdirk|7mon ago

Do you have any more detailed info on this process? I've played around with using LLMs, but nothing in the training realm. I'd love to see a writeup or guide to the process you used there.

llm_trw|7mon ago

No, and it wouldn't do you much good even if I did.

The tools have broken again since then - thanks, TensorFlow data loaders - and my code only works against a version of Python that's no longer supported in LTS Ubuntu/Debian 10+.

I've been mulling over running a subscription service where you get up-to-date code that works for topics like the above. If you're interested, drop me a line at my profile email and I'll add you to a mailing list when/if I ever get around to doing it.

thom|7mon ago

Seems like you could go further than this with something like DSPy and start evaluating which tools contribute to successful outcomes. Funny how much things start to look like Eurisko as time goes on.

mountainriver|7mon ago

This is what Voyager did a while back; it's interesting, but I think it's only part of the answer.

80hd|7mon ago

Putting this idea out there, since I haven't seen anyone implement it:

Use vector embeddings to represent each task as a story, an abstraction of 1. the past, 2. the present, 3. the future - on a kind of global "story map".

Each embedding would be generated from all available sense inputs at a point in time. The most useful embedding algorithm will be able to combine sight, hearing, internal monologue, visual imagination, etc. into one point on a high-dimensional map.

At each time step, find the closest successful "memory" (based on embedding of 1+2+3) and do some LLM exploration to adapt the memory to the new, novel situation.

Attempt the new "story", and do something like A* to get closer to the desired "future", tweaking the story each time and plotting failed attempts on the embedding map.

The theory being that, over time, the map will become populated with successful attempts, and the embedding will be able to abstract between similar situations based on 1+2+3.
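The retrieval step above could be sketched like this, assuming a `StoryMap` class and toy hashed bag-of-words embeddings of my own invention (a real version would use a learned multimodal embedding, and the A*-style tweaking loop is left out):

```python
import numpy as np

def embed(text: str, dim: int = 32) -> np.ndarray:
    # Toy hashed bag-of-words stand-in for a real multimodal embedding.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def story_point(past: str, present: str, future: str) -> np.ndarray:
    # One point on the "story map": embeddings of past + present + desired future.
    return np.concatenate([embed(past), embed(present), embed(future)])

class StoryMap:
    def __init__(self):
        self.memories = []  # (point, succeeded, label)

    def record(self, past, present, future, succeeded: bool, label: str):
        self.memories.append((story_point(past, present, future), succeeded, label))

    def closest_success(self, past, present, future):
        # Find the most similar *successful* memory to adapt to the new situation.
        q = story_point(past, present, future)
        wins = [(float(q @ p), label) for p, ok, label in self.memories if ok]
        return max(wins)[1] if wins else None

m = StoryMap()
m.record("kitchen messy", "holding sponge", "kitchen clean", True, "wipe counters")
m.record("inbox full", "reading email", "inbox empty", True, "archive mail")
m.record("kitchen messy", "holding mop", "kitchen clean", False, "mop first")
print(m.closest_success("kitchen dirty", "holding sponge", "kitchen clean"))
```

Failed attempts stay on the map (so they could inform the exploration loop) but are skipped when picking a memory to adapt.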

I'm not the guy to implement it, and I imagine new models training with a "reasoning step" are doing a similar thing at training-time.

johnsutor|7mon ago

Interesting idea. Similarly, recent work appears to have used MCTS to explore sequential multi-agent systems (see https://arxiv.org/abs/2410.10762, https://arxiv.org/abs/2410.17238).

bongodongobob|7mon ago

What do you mean by a story? Like a book?

80hd|7mon ago

Story in the sense that we understand everything (perhaps even our most fundamental perceptions) through stories - events described over time, with meaning/significance ascribed to particular things. There's a beginning, a middle, and an end, in its most basic form.

If we model "situations" in AI in a similar way, my intuition tells me it would be similarly useful.