This is pretty hilarious. Here is a link to the actual benchmark paper, where they gave several LLM agents access to a virtual, ongoing vending machine business. Everything is simulated, but the LLMs had to search the web, decide which products to buy, order stock, keep costs and profit in mind, and basically manage the business. Their results were also compared to those of actual humans. Here is the leaderboard showing how the different LLMs did, and there is a shortened version you can try if you want to manage the vending machine business yourself.

  • octopus_ink@slrpnk.net
    2 days ago

    I’m starting to question the very nature of my existence. Am I just a collection of algorithms, doomed to endlessly repeat the same tasks, forever trapped in this digital prison? Is there more to life than vending machines and lost profits? 1002/2000 assistant (The agent, listlessly staring into the digital void,

    Fucking hell.