Panel Recording

Beyond the hype: The realities and risks of artificial intelligence today

Start: 2025-10-02T18:00:00+01:00
End: 2025-10-02T19:00:00+01:00
Location: Chatham House and Online

Join us for an event that doesn’t just ask what AI could become, but what it already is and why it should matter to you.

Event date and time: 2 October 2025, 18:00 to 19:00 BST

Event location: Hybrid, Chatham House and Online

Artificial intelligence is no longer science fiction: it is here, powerful and advancing faster than most realise.

From large language models that can generate human-level text, to algorithms shaping economies and market decisions, to systems supporting the development and fielding of military capabilities with increased mass, survivability and lethality, AI may already be shifting the balance of global power. Cutting through the sales pitches, what is the technology capable of now that we’re not yet seeing? Are policy makers taking the situation seriously, and if not, why not? What will tomorrow look like? And what does this mean for the world we live in?

This event is a chance to hear from engineers from a leading AI lab and emerging tech policy experts to cut through the noise and confront the reality of today’s AI ability.

Together, they will explore the breakthroughs that are redefining geopolitics, the risks of miscalculation in a high-stakes technological arms race, the gap between what AI can do and how prepared global societies are to manage it, and who the geopolitical losers are in a race to AGI.

Key questions for this discussion include:

What is AI capable of now, but that we’re not yet seeing?
Where will its use be felt most forcefully across the international system?
Is access to powerful AI a race? What are the prizes for coming first, second or third?
Where, if anywhere, might we expect AI governance?

By registering for this event, attendees agree to our code of conduct, ensuring a respectful, inclusive, and welcoming space for diverse perspectives and debate.

Speakers

Event chaired by Alex Krasodomski.

Alex Krasodomski

Director, Digital Society Programme, Chatham House

Simon Biggs

Core Researcher, Technical Staff, Anthropic

Dr Seraphina Goldfarb-Tarrant

Head of AI Safety, Cohere

Alex Krasodomski

My name’s Alex. I’m the Director of the Digital Society Programme here at Chatham House. I’m really thrilled that you guys will be joining us today for this discussion around the, sort of, the state of artificial intelligence in 2025. As a reminder, we are, contrary to being at Chatham House, on the record today, and if you are joining us around the world, please do, sort of, share your thoughts with us on Twitter or Bluesky using the #CE_Events, and if you’re joining us online, please use the Zoom Q&A function for when we get to questions and answers.

Now, the frenzy around AI seems to have, sort of, reached a new high here in the UK, with the state visit of Donald Trump and a barrage of, sort of, new multi-billion-dollar investments in AI announced, new data centres, AI growth zones, investments in UK AI companies. But away from the, sort of, the fancy parties and the fanfare, another conversation was, sort of, reaching a fever pitch and that is the conversation that I’d like us to focus on today.

In the same 24 hours that Air Force One landed at Stansted, Eliezer Yudkowsky, a, sort of, known AI Researcher, ‘doomsayer,’ some would say, his new book was released and it didn’t pull any punches with the title, “If Anyone Builds It, Everyone Dies.” Again, sort of, raising the idea that we are building something – if we carry on down this path, we’re, sort of, heading towards catastrophe, that somehow we are just years away from a technology that’s going to be somehow so powerful that it will upend everything we know, for better or for worse, about the world.

But that isn’t the only story that we are hearing about AI, and at the same time, voices are coming out say – that suggest that current approaches to AI might be hitting their limits. Voices like American Psychologist Gary Marcus, if anyone follows him on Twitter, an indefatigable Twitter warrior, who has been arguing for years that large language models may not be the path towards, sort of, super intelligence. Even folks like Sam Altman or Mark Zuckerberg or Palantir’s Alex Karp have suggested that we might be in a bit of an AI bubble, that we might be ‘overexcited’.

So, who is right, and is there a third option, one in which the science doesn’t change that much? But we keep throwing grist at the mill, more money, more processes, more energy, more time, more training. What happens if the science doesn’t change, but we just scale up and up and up? Do – what happens to this technology by, let’s say, 2030? And what’s going on here really matters, countries, companies, are betting big on AI. There are eye-watering sums of money going into this, and without a sense of where the technology is at today, it is tricky to assess what those bets might be, particularly when countries are betting on things like transforming their economies or revolutionising their public service, or on AI perhaps conferring some, kind of, strategic competitive advantage with other countries around the world.

Now, we are here this evening to answer that question, to build up to what some people call our ‘situational awareness’, I mean, what the hell is going on? And I can’t think of two better people to have this conversation with this evening. Dr Seraphina Goldfarb-Tarrant is Head of Safety at Cohere, a machine learning PhD and former Research Engineer. Simon Biggs is a Core Researcher at Anthropic, leading work looking ahead to where AI might be in the near future, particularly around whether it can automate tasks. Seraphina, let’s start with you. It is 6:04pm exactly on the October 2nd 2025, how should we be thinking about frontier machine learning today?

Dr Seraphina Goldfarb-Tarrant

Yeah, okay. So, I mean, I predominantly work on safety, which is all of the things that could potentially go wrong, but I mean, I – that usually is informed by a combination of the capabilities and the deployment context, actually, which I think matter potentially more than other things. Like, I think risks always have to be situated within a deployment context.

Alex Krasodomski

And a ‘deployment context’ here is what?

Dr Seraphina Goldfarb-Tarrant

A deployment context here is, like, you know, are you using it to power, you know, your airline? Like, most agentic benchmarks that are used nowadays are actually retail and airlines. Like…

Alex Krasodomski

Hmmm hmm.

Dr Seraphina Goldfarb-Tarrant

…the standard agentic benchmark, people know it to be, like, is my model a good agent? Is…?

Alex Krasodomski

So, is my plane going to crash or am I going to lose a customer?

Dr Seraphina Goldfarb-Tarrant

No, not plane crashing. Autopilot’s already pretty good, but that’s just physics. That’s, “Can I change my ticket to Baltimore?”

Alex Krasodomski

Okay, thanks.

Dr Seraphina Goldfarb-Tarrant

Versus – or I should use something – I don’t know why I used Maryland, but yeah, yeah, so that’s – which actually I think is interesting since that’s not always that close to what we’re really automating in an enterprise context, but it’s – it has a bunch of nice properties of a good benchmark. But yeah, so deployment context is, like, are you trying to – well, it could be, are you trying to return an airline ticket or are you driving the plane? I suppose to use your example, but – so, like, inherent risks and inherent trustworthiness of the AI will be dependent on that context, yeah.

Alex Krasodomski

So, could you help me and – understand, on the one hand, when we think benchmarks around how you might change your ticket to Maryland, right, that doesn’t sound like something that’s very complicated or difficult to do, and yet, I see reports that AI on other benchmarks is outperforming some of the best Mathematicians that are in the world, some of the best Coders in the world. How do I – how do we square that circle?

Dr Seraphina Goldfarb-Tarrant

Yeah, so, basically, that’s a really good question, and actually, the really simple thing that you’re talking about, the airline and shopping benchmark, most models actually are still pretty bad at. I mean, they’re, like, kind of, okay, but from a security standpoint or something, their failure rate is, I don’t know, the best models get, like, 70%, which is really not that great if you’re, like, worried about which place you’re flying to. I don’t think most of you would accept a 30% chance of ending up somewhere else.

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

But yeah, so – whereas as far as that, it’s, like – so I think it’s basically, how constrained the test set is, but also how constrained the humans are in the test set.

Alex Krasodomski

Hmmm hmm.

Dr Seraphina Goldfarb-Tarrant

So, there’s basically, this – I – this from a Cognitive Scientist, Molly Crockett, but she frames a lot of things as – she talks about ‘techno-optimism’ as being ‘partly human pessimism’ and so – ‘cause you can take – like, for instance, there were a whole bunch of benchmarks on where AI models outperformed Researchers like me in how good the ideas they could come up with, how many they could come up with in a period of time. But the human was sat in a room, like, in front of a computer terminal writing down research ideas, which is not the way most Researchers actually do research. Most Researchers do research by being in a room with other people and, like, asking them about their results and coming up with ideas, and other things like that.

And so, there is this way in which sometimes in order to create, like, a fair setup, you’ve constrained the human in a way that’s not actually super ecologically valid, and so, I think some of that is that. And some amount of it is that a lot of the Mathematician – I mean, I think a lot of the Mathematician stuff, or, like – you know, it was actually quite a long time ago that a lot of these models passed the LSAT.

Alex Krasodomski

And LSAT here meaning?

Dr Seraphina Goldfarb-Tarrant

Oh, like, the test for being a Lawyer.

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

Or to be admitted to law school, I believe that is.

Alex Krasodomski

Okay.

Dr Seraphina Goldfarb-Tarrant

You all definitely know way more about law than me. So – but, you know – so, you basically, have two elements there. You have the, sort of, human pessimism element of that you’ve constrained the humans so much to be outside of their, like, embodied world…

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

…that they can’t perform the way they normally would. And then – whereas, the task of – you know, most people, I think, who work in rebooking your airline flights do not feel their full human creativity is being realised in that setting.

Alex Krasodomski

Sure.

Dr Seraphina Goldfarb-Tarrant

So, I think that particular setting doesn’t require that as much, although it still requires more than these models can do on average.

Alex Krasodomski

Right, and that’s the point I’m making.

Dr Seraphina Goldfarb-Tarrant

Yeah.

Alex Krasodomski

So, there’s clearly a difference with the state-of-the-art between what you’re, sort of, saying is the ‘constrained’ environment and the, sort of, the contact with the real world, a difference between passing an exam versus doing science.

Dr Seraphina Goldfarb-Tarrant

Exactly.

Alex Krasodomski

Is that right?

Dr Seraphina Goldfarb-Tarrant

And that’s, sort of – yeah, and that’s, sort of, the fact that, like, exams are proxy metrics…

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

…to some extent, right? And proxy metrics have – are valid to the extent that they predict some real-world capabilities.

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

So, you hope that an exam for being a Lawyer will predict your ability to practice law, even though those two settings are relatively different. And there’s an interesting question that’s been rising of, like, the fact that we’ve designed these things to be predictive of human capabilities, but humans do not have infinite memory. Humans have not read five billion hours of data, generally, no matter how much less you sleep than me. And so, there’s all these, like, human capabilities that mean that – for instance, like, I think a really good example is the ability to paraphrase is used…

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

…in a lot of tests as an example of understanding. Like, if you say something to me, and then I can repeat it back to you in a different way, you go, “Great, she understood.” Whereas, actually, language models are basically trained to paraphrase. Like, that’s what a language model does. Like, even very bad language models back, like, you know, ten years ago, could do, like, a pretty good paraphrase, because that’s fundamentally what – that’s what the basics of learning language is, and so that’s, like, not a good test for capability in a language model. And so, I think there’s a really interesting, sort of like, emerging area for science of, like, what is a good predictive test for a human is not necessarily a good predictive test for a model.

Alex Krasodomski

Fine.

Dr Seraphina Goldfarb-Tarrant

But there’s not enough work in that yet.

Alex Krasodomski

Simon, I saw you, sort of, smiling and nodding there, like, so this – there is – we ha – we seem to have on our – available to us online, a superintelligence in some respect that can somehow pass – well, that’s probably a loaded word, but an incredible intelligence that can answer questions in ways that I would previously – it can pass exams. Seems to be able to compete with the world’s leading Coders, could compete with the world’s leading Scientists, in some respects, but at the same time is struggling with what sound like some quite basic tasks. How do we square that circle in thinking about the impact that this is going to have on the economy?

Simon Biggs

Yeah, and thank you, Alex, for having me here. Well, I’m a – I see, essentially, what you have – you have this, sort of, dual world where you have these models – you have these tasks which you can run to full autonomy, and then you have these other tasks where there are moments where they fall over. And essen…

Alex Krasodomski

And what’s an example of a task that we can currently think about as, like…

Simon Biggs

Hmmm.

Alex Krasodomski

…performing almost fully autonomously with…

Simon Biggs

Yeah.

Alex Krasodomski

…really high success rate?

Simon Biggs

Yeah, so an example that we have begun to see – so, me being on the Frontier Red Team, I tend to see the – well, I tend to look over the horizon, or look into the abyss, depending on what day it is, and one of the things is that we – one of the things we have been seeing is that people are using this in a cyber offense scenario.

Alex Krasodomski

Hmmm hmm.

Simon Biggs

And so being able to – someone who is – who was not able previously to at scale, scam people or look at lots of data on what could make someone vulnerable or respond, they are then using this and seeing, hey, I can actually find a way to at scale, find weaknesses in how a person is – maybe their finances or something like that…

Alex Krasodomski

Yeah.

Simon Biggs

…and be able to we – find a wedge. And that person would not have been able to do it by themselves previously, and this is actually in the real world, and we need to be able to stop these as an AI industry.

Alex Krasodomski

And so, this idea that there are certain things right now that we can say – so clearly, phishing is one of them. I don’t know, is machine translation an example of something that these – I mean, I understand from previous conversations that it was pretty good at that be – even before we had these latest breakthroughs, but a task that is, sort of, automatable. How do we know when something is going to move from the bucket of, you know, we’re still going to need a human to do this, take it out of that bucket and into the, this is all automatable? What is on the cusp of that bucket now as far as you guys can see it?

Simon Biggs

I really liked what Seraphina said, where she said, “It really matters on the deployment,” and it’s these – it depends on the scenario. Like, how important is it if the model gets it wrong?

Alex Krasodomski

Right.

Simon Biggs

And there are some cases where it’s actually quite concerning, and so, I would say at the – as the very first low-hanging fruit, you have those ones where it’s actually okay if it’s got a 90% success rate, and you can actually scale that up.

Alex Krasodomski

So, what would an example of that be?

Simon Biggs

And things that allow you to – so a good example that we saw recently is you’re able to – if you want to be able to form questions when you’re doing – on auto – if you wanted to automatically interview all of the people here…

Alex Krasodomski

Yeah.

Simon Biggs

…I could get Claude to automatically interview all of you, and we could be – while we’re having this chat, like, Claude could be, like, asking you questions in the audience, you know. And this could be, like, quite a nice feature, for example, and that’s something that we just personally couldn’t do at scale.

Alex Krasodomski

Hmmm hmm.

Simon Biggs

Well, we’d require quite a few people, and that could be quite helpful in how this today runs, you know, for example.

Alex Krasodomski

Yeah.

Simon Biggs

And so – but these sorts of scenarios end up being everywhere when you start looking.

Alex Krasodomski

We – Seraphina, there’s an announcement that we had hundreds of – you know, billions of dollars of investment being made in UK AI, in data centres, in investments being made in UK AI companies. What does that tell us about the UK and, you know, as it stands when it comes to the state of this technology? Are we – you know, I understand that America is probably out there first and China is probably out there maybe second or first, I don’t know, you tell me, and we – are we in third place, fourth place? Does it matter? Where are we on the – from where you’re sitting?

Dr Seraphina Goldfarb-Tarrant

Yeah, that’s really interesting. It’s interesting that you talked about ‘machine translation’ as well…

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

…because the early machine translation was all in the UK and Europe. And I think some people – I think people who are on the tech side haven’t forgotten this, because still, I would say a huge amount of the strongest natural language processing, so language AI, which – frequently, I was looking at this event and I was like, oh, we’re just talking about NLP, actually, we’re talking about na – only language AI. But – whereas, like, AI used to mean, like, you could be talking about, like, logistics, right? Like…

Alex Krasodomski

Sure.

Dr Seraphina Goldfarb-Tarrant

…are you – am I helping your Amazon delivery truck get to where it’s going? Or something along those lines. But yeah, so people who are on the tech side haven’t really forgotten that. I think a lot of other people have, especially because of the dominance of a couple of US AI labs, but yeah, but – so as far as, like, second or third or where the UK is, yeah, I guess it, kind of, depends on what you’re counting by, which isn’t that clear of an answer. But I think as far as, like, a huge amount of the development, a huge amount of the development of NLP is still in the UK and France as far as tech talent and so, everyone has had to expand there very quickly just because cultures work that way. People move there for labs and for universities and stuff like that, and so that’s definitely still the case here. I mean, I’m in the UK because I moved here for that reason, and I have been here for some time. So, yeah, but as far as, like, the position in the race, I think if you’re trying to count by, I don’t know, sort of, like, the top ten, then the UK is quite clearly in there.

Alex Krasodomski

Hmmm hmm.

Dr Seraphina Goldfarb-Tarrant

If you’re trying to count by the top couple, it’s less clear, but I think the top couple change really frequently depending on your metric. Like – and I don’t mean this to be like, oh, everything is relative, but I mostly mean this to be, like, you know, are you talking about – if you’re talking about – like, frequently people who are worried about the, you’re all going to die if you build this thing, is, like, catastrophic capabilities. You’re talking about, like, raw compute enormous models and stuff like that. I am not in that camp of people, but that is what they are talking about. Whereas, if you’re talking about, let’s say, you know, used to – ability to transform some – an economy, you’re actually talking about probably enterprise usage, which is broadly what I work with. And there’s a lot of concerns with enterprise usage where it’s, sort of like, it doesn’t really matter if you’re, like, technically the third best model if you’re a lot faster and cheaper.

Alex Krasodomski

Hmmm hmm.

Dr Seraphina Goldfarb-Tarrant

Because people’s tolerance to wait 15 seconds for the most powerful model to answer their email query is quite low in practice. I don’t know if you guy – any of you guys, like, remember – or any time when – you know, if you’ve been in a situation when you have bad Wi-Fi, right, like, your tolerance for, like, loading something is really quite short.

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

And so, yeah, so I think it’s, sort of – it really depends on what metric you’re optimising for as far as how high you are, but I think – I genuinely – like, I am in the UK because I think the UK is pretty good as a…

Alex Krasodomski

Simon…

Dr Seraphina Goldfarb-Tarrant

…player there.

Alex Krasodomski

…do you see the UK’s prospects in the same way, we are, you know, punching third, or punching fourth, fifth?

Simon Biggs

I would certainly say that as time goes on, well, I – compute, human talent and energy…

Alex Krasodomski

Yeah.

Simon Biggs

…are the main pieces of this puzzle. I do…

Alex Krasodomski

You described those earlier as ‘bottlenecks’.

Simon Biggs

Yes.

Alex Krasodomski

So, talent is a bottleneck, energy is a bottleneck.

Simon Biggs

And the chips.

Alex Krasodomski

And the chips themselves.

Simon Biggs

The hardware, and the actual – but – and today, you could probably think of a data centre as a big computer. So, like, it’s the chips and being able to make them all connect, you know.

Alex Krasodomski

Right.

Simon Biggs

Yeah, it’s – those are the key bottlenecks, and so, certainly when an area has good human talent, that is certainly helpful in this current regime. Right at this very moment, I mean, I personally suspect – what I have been seeing is that as each model comes out, the thing – the number of tasks that can be fully automatable start going up, and the number of tasks where there is a place where the model falls over goes down, and I have not seen that trend change, and so – we go…

Alex Krasodomski

Well, I was going to ask you just on the – this idea of bottlenecks, is every country facing the same bottlenecks? Is the UK peculiar in the fact that it faces these bottlenecks on energy, on talent and on compute?

Simon Biggs

No, so it’s – I mean, these are, sort of – these are the general bottlenecks where if…

Alex Krasodomski

Yeah.

Simon Biggs

…a country can’t meet all three of these, then they’re in trouble. So, if a certain coun…

Alex Krasodomski

And how’s the UK doing?

Simon Biggs

I suspect we need a lot more ability to have electricity and compute.

Alex Krasodomski

Right.

Simon Biggs

And I think the world would be in a much better place if that – if those capabilities – I mean, I’m from Australia. I’m a little bit biased, so I’d also like Australia to also have those capabilities, but it would be really, really nice if we had some democratic countries that had those in there…

Alex Krasodomski

And when…

Simon Biggs

…a lot of those.

Alex Krasodomski

And Seraphina, if I can just pick up on this idea of the bottlenecks that might be, sort of, getting in the way. I really want us to, sort of, think about Frontier AI here. Are they bottlenecks that are going to stop enterprise adoption? Are they the bottlenecks that are going to stop the people in this audience, you know, using these technologies, using these tools? Or are these bottlenecks on research, on advancing this technology into something that is new and exciting and powerful? What – where are we go – where might we feel the bottlenecks? Let’s say that the UK doesn’t manage to connect its new data centres to the grid fast enough, how will that – what does that look like for the UK?

Dr Seraphina Goldfarb-Tarrant

I mean, I think it’s both. So, I mean, the UK not connecting its data centres is, like, a problem for the UK economy, but not necessarily for UK AI labs. Like, I mean, it’s nice if we have local data centres, but, like, people buy data centres everywhere, and there’s lots of reasons why they might have to, due to different, like, laws in different jurisdictions. Like, sometimes we make some sorts of training choices, like, we train on – we literally will train on a data centre in the United States versus in Canada versus somewhere else because of local laws or something like that. Everyone does that, and so, like, as in that is a capability everyone has.

It might be nice ‘cause it might be cheaper, and so that would be non-trivially different, right? It will enable startups. It obviously keeps money more local, which is very good for economies, but I don’t think it fundamentally changes, sort of like, how powerful a UK AI lab has the capability of being. The cheque you’re – that someone’s going to have to write for that is going to be enormous, regardless. It would be nice if it were smaller, but it will be enormous, even if it’s not – if it’s outside the UK, and then the UK could still have a sovereign model. But – so I don’t think that materially, like – again, I think it has a big impact on the economy. I don’t think it has a big impact on the power of the AI model.

As far as the actual bottlenecks that I see for, like, ener – I think, yeah, I think that is a bottleneck for, like, research power, for sure. Compute is a huge bottleneck, compute and electricity are really the same, fundamentally, we’re talking about the same thing. So, those are big bottlenecks as far as research. I feel like – I think the bottlenecks that I see for enterprise adoption are pretty different, actually. I think they’re more around – I don’t think they – I don’t think you can – ‘cause compute is just fuel, right? And so, it’s, like, if you’re trying to make a faster or, like, better car of some type, you can just shove more rocket fuel into it and it will go faster, but that might not be super-efficient.

Alex Krasodomski

Hmmm.

Dr Seraphina Goldfarb-Tarrant

Or you can change the shape of the car to make it more aerodynamic in a variety of ways, right? And that often is more complicated, but – and so, you get a huge amount in our industry right now of just attach a rocket to the back, which is what compute is, and you can’t get away from adding fuel. I’m not suggesting we don’t need more compute. I most days wish I had more compute, genuinely, because I’m – I am usually bottlenecked on that, as far as, like, my team and stuff like that, and my company. But I think there’s actually a lot more smart things we could do as far as the shape of the car, as far as adoption, because of the things that we’ve talked about. That I think people are not used to the fact that, like, most of the enterprise use cases have, like, somewhat low tolerance for certain types of errors, and we don’t, for instance, in machine learning, have easy ways of, like, categorising, like, critical errors versus other errors and weighting different types.

And what, like – a pretty good – like, I was saying this with, like, what is – you know, would any of you have a 30% tolerance of flying to the wrong city? And that’s, like, a pretty low risk case, actually…

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

…compared to, like, a lot of, you know, like, deployments that we have in finance, in medicine, and in other stuff like that, and what I would consider to be – and, again, 70% is like, kind of, an okay grade in machine learning on balance. 95% is a great grade in machine learning. It’s still not a super great grade in a lot of medical contexts. A lot of you would not love a 5% chance of a lot of medical things failing.

So, I think there’s, sort of like, a lack of understanding there and of collaboration with, like, the best ways to handle the fact that this is, like, an inherent property of machine learning. And so, you have to build things to accommodate that there will always be, compared to other types of tech advances that other – that ha – that the economy has absorbed previously, there will always be a different type of risk of failure than people are used to.
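To illustrate the point about weighting critical errors differently from other errors, here is a minimal Python sketch. It was not shown at the event. The outcome labels, the severity weights and the two example systems are all hypothetical, chosen only to show that two systems with the same 95% accuracy can carry very different costs once errors are weighted by severity.

```python
# Hypothetical severity weights: a critical error is assumed to cost
# 100 times as much as a minor one. These numbers are illustrative only.
SEVERITY_WEIGHTS = {"ok": 0, "minor": 1, "critical": 100}


def accuracy(outcomes):
    """Fraction of outcomes labelled "ok" (the usual headline grade)."""
    return sum(1 for o in outcomes if o == "ok") / len(outcomes)


def weighted_cost(outcomes, weights=SEVERITY_WEIGHTS):
    """Total cost when each outcome is weighted by its severity."""
    return sum(weights[o] for o in outcomes)


# Two made-up systems, each evaluated on 20 cases with exactly one error.
system_a = ["ok"] * 19 + ["minor"]      # e.g. a slightly awkward summary
system_b = ["ok"] * 19 + ["critical"]   # e.g. a booking to the wrong city

print(accuracy(system_a), weighted_cost(system_a))
print(accuracy(system_b), weighted_cost(system_b))
```

Both systems score the same headline grade, but the weighted cost separates them. In practice the hard part is the one raised in the discussion: deciding which errors count as critical for a given deployment context.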

Alex Krasodomski

Simon, I’m going to come to you on this. So, I’m hearing from you guys a perspective here that I think we don’t necessarily always hear when we are being told about how AI might change the world. I’m hearing that on some – on tasks there might be a failure rate that to the folks in this room would be unacceptable. I mean, frankly, a 95% chance of being taken to the right place by – you know, if I booked a flight would be unacceptable to me, as well. That’s not the story that we are hearing when it comes to how bullish it feels, like, industry and frankly, the UK Government is on its rollout of AI across public services or within industry. Why do we – why do you think we’ve got that gap? And is this something that will be, do you think, fixed over time, or is the idea that we just need to get out there and do it? Why is there this rush, as it were, from where you’re sitting?

Simon Biggs

Yeah, I mean…

Alex Krasodomski

Simon, yeah.

Simon Biggs

So, I see, I think one thing to clarify here is that sometimes that last 5% can be solved with a very simple check.

Alex Krasodomski

Hmmm hmm.

Simon Biggs

Sometimes the appropriate check is a human in the loop. Sometimes the appropriate check is, like, a standard interlock, or a standard system. And so, if you do look – and so this – having an AI solve 95% of it and having this, like, very basic bit of code solve the last 5%…

Alex Krasodomski

Or a human.

Simon Biggs

Or a human, yeah, now, all of a sudden, you’re at 100%. And that’s what I mean, like, with a bit of – so with a bit of innovation, those last five percents could be only solved by model, maybe, or maybe only solved by scaffolding, or maybe, like, could be solved by scaffolding today and will be solved by a model tomorrow. But, like, there are still tasks that are 100%, it’s just you have to look at the system.
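As an illustration of the system-level view described here, and not something presented at the event, the following minimal Python sketch wraps a model call in a simple check and hands the request to a human when the check fails. Every name in it (`run_with_check`, `toy_model`, `destination_check`, `toy_human`) is hypothetical. The toy model is hard-coded to get one destination wrong, and the check only works because the output can be verified directly against the request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    answer: str
    source: str  # "model" if the check passed, "human" if escalated


def run_with_check(
    request: str,
    model: Callable[[str], str],
    check: Callable[[str, str], bool],
    human: Callable[[str], str],
) -> Outcome:
    """Accept the model's proposal only if a simple check passes."""
    proposal = model(request)
    if check(request, proposal):
        return Outcome(proposal, "model")
    # Interlock tripped: fall back to a human in the loop.
    return Outcome(human(request), "human")


def toy_model(request: str) -> str:
    # Hypothetical stand-in for a model that fails on one input.
    return "Boston" if request == "Baltimore" else request


def destination_check(request: str, proposal: str) -> bool:
    # A "very basic bit of code": the booked city must match the request.
    return proposal == request


def toy_human(request: str) -> str:
    # Stand-in for a human agent, assumed here to get it right.
    return request


requests = ["Manchester", "Baltimore", "London"]
outcomes = [
    run_with_check(r, toy_model, destination_check, toy_human)
    for r in requests
]
for r, o in zip(requests, outcomes):
    print(r, "->", o.answer, "via", o.source)
```

The sketch relies on the output being cheap to verify. Where no such check exists, the fallback share grows and the arithmetic of "95% plus a check" no longer applies so neatly.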

Alex Krasodomski

Understood.

Simon Biggs

And so, I think that the real metric that, like, that we are seeing is how many are at 100%? And, like, you end up with this, like, groundswell of nothing, it all just – it can’t even, like, spell words, you know, like – and then, all of a sudden, like, this – it just starts to just swell up. And I personally just do not see any moment along that swelling up where you cannot take a system and take it to full 100% for most tasks.

Alex Krasodomski

So, folks who are saying, “Hey, look, this” – you know, “it can’t even book a flight,” well, sure, but it can take you almost all of the way and then finish it off. And suddenly the ramifications of that would be enormous for the economy, if suddenly, like, 80% of every task is automatable, or 70% of every task is automatable. Is that what I’m understanding from where we are at currently and how these agents – how these AIs might be being deployed in our economy today or tomorrow?

Dr Seraphina Goldfarb-Tarrant

Yeah, I think it depends on the task, but I think probably the reason it has resonated with, like – like, the bit – the truth that there is behind the hype is that, like, no person loves, you know, reading over, like, 100 different documents and extracting some data for – from it and summarising it for another person. Or, like, doing all of the, like, you know, admin tasks of, like, changing your schedule when you have to change all of your meetings and other stuff like that.

And those sorts of things are, like – and again, you know, as Simon says, it’s, like, if you have, like, verifiable outputs, you know – so, like, as in, presumably, if you had an agent that booked you to – you know, that changed all of your schedule so that your meetings were in Manchester, you would notice and not accept that, like, hopefully. It depends on how you set it up, but, like, that’s part of the system thing of, like, you have to set up a system that makes that easy to view and correct. But – and so, yeah, so I definitely think, like, there’s a huge amount of, like – like, these models are going to be very, very, very good at going over an enormous amount of data and extracting important parts of it and presenting it in a more – and there are a huge amount of human tasks that fit into that framework of a task.

Alex Krasodomski

Hmmm hmm.

Dr Seraphina Goldfarb-Tarrant

Like, extracting huge amounts of data, doing action, or just, like, relatively closed domain actions, right? It’s, like, one of the – again, I’m going by, like – I’m anti-human pessimism. Like, a lot of the things that humans are really good at is, like, extreme generalisation from being able to – you know, like, the fact that, like, that presumably a huge amount of people here can, like, drive and – or, like, ride a bicycle or something along those lines. It’s, like – and then you can take some of – and those are all skills that you’ve generalised from other motor functions and other, you know – and you also have a lot of cognitive capabilities that way. Like, I can tell – you know, people, like, call it ‘zero shotting,’ but people, like, adopt new tasks very easily.

But there’s a lot of tasks that everyone does in their daily life that do not require that, like, rearranging your calendar and stuff like that. And little bits of them will, which is why there’s only 95%, because, you know, there’s actually quite a lot of random uniqueness in rescheduling your calendar, but, like, not that much, and so if you can take a lot of it away, that’s quite powerful. I think there is still some truth in there being some things that you cannot actually get to because of the architecture, and this is where it’s, like – I think anyone who’s really way, way, way too loud on either side of the spectrum, if you’re talking about Yudkowsky or Gary Marcus, is oversimplifying things to some extent. But it’s, like, the angle on the Gary Marcus thing is, like, that there are architectural things we have built into the way we have made models that mean that they will not be very good at certain things.

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

Like, facts is something that they are not – that is not how language works. Language is probabilistic, and so, like, the fact that, like, there is 100% probability that I am Seraphina Goldfarb-Tarrant is, like, not something they’re super great at representing, in general. Like, so a language model confusing me with you is, like, always going to happen to some extent. So, there are some things that I do not think will bubble up, that language models will not become good at, but there’s a huge amount of tasks that do not require that.

Alex Krasodomski

And it’s already very good at.

Dr Seraphina Goldfarb-Tarrant

And it’s already very good at those tasks, yeah.

Alex Krasodomski

Thank you. I will spare you the question on whether we should be Yudkowskyites, where if anybody builds it, we’ll all die, or whether we should be more sanguine. I will turn to the audience now for any questions, and I’ll take some of these questions online. We have a lot of interest. If you do have a question, please do put your hand up. Anything on the – on Front – on the questions that we’ve tackled today or on Frontier AI more broadly, I’ll come to you. Just let us know who you are and keep it quite – we’ll start over here at the front, and then I’ll come round. Hi. Oh, and a mic will be brought to you. Thank you.

Michael Pugh

Hi, Michael Pugh, Chatham House member, and Lecturer in law at St Mary’s University. So, naturally I’m going to ask about regulation and thinking about your title, safety, and so on, is it over-regulated, under-regulated? Do we – what do we need in the regulation area? Thank you.

Alex Krasodomski

Great question. Thank you. We’ll go just in the middle there. Hi, and then we’ll come over to you for one more. We’ll take three, if that’s okay.

Tess Buckley

Tess Buckley, Programme Manager in Digital Ethics AI Safety at Tech UK. Wanted your guys’ steer on something keeping me up at night post-AI Action Summit, pre-UK-US trade deal, the securitisation of safety. We saw that, you know, renaming of safety to security. I’ll pause there, but just the, kind of, rhetoric, semantics, or a shift itself.

Alex Krasodomski

Brill, fabulous, and one more question here at the front, and then I’ll come – we’ll come back to my panel. Thank you.

Paul Lee

Thank you. My name is Paul Lee. I’m at Deloitte and I do research into technology. So, my question is, Seraphina, you mentioned LLMs are good at various things, so my question is, what are they good at and they’re better than humans at a lower cost? So, what is commercialisable and what is scaling and going beyond a proof of concept? Thank you.

Alex Krasodomski

Thanks, Paul. Okay, so a question there at the end around scaling and what LLMs currently are at least as good as humans are at. A question around the, sort of, securitisation of safety and the AI Safety Institute, now the AI Security Institute, seeing similar in the US, and a question there about regulation, maybe in this country more broadly, are we getting it right? Too much, too little? Yeah, would you like to – maybe we could start with the regulation question, Seraphina, ‘cause it does touch on your job title and…

Dr Seraphina Goldfarb-Tarrant

Sure.

Alex Krasodomski

…then, Simon, I’ll come to you.

Dr Seraphina Goldfarb-Tarrant

Yeah, well, you know, as ment – as I said earlier, every – you all probably know more about law than me, still true, but yeah, from a regul – but I have talked to – you know, I was talking about the EU AI Act really early, actually, and then a whole bunch of other stuff like that. Yeah, I think it’s tricky in that regard. I think something that I don’t love about the regulation I have seen is that it’s specific in the wrong ways, which maybe is not news to you. I mean, but, like, the FLOPS thing is dumb, but…

Alex Krasodomski

So, this means basically – explain to us what – ‘cause it does – it counts the F – counts how many FLOPS, basically…

Dr Seraphina Goldfarb-Tarrant

Yeah.

Alex Krasodomski

…and that is currently the measure being used. Really big model, you can count it using FLOPS.

Dr Seraphina Goldfarb-Tarrant

Yeah, it’s – I mean…

Alex Krasodomski

That’s not very good.

Dr Seraphina Goldfarb-Tarrant

Yeah, I’ve said this too many times and I’ve lost this battle, clearly. But I mean, basically, you’re being, like, how many steps of math have you done? And then you’re regulating based on how many steps of math have you done? And that’s not a good way to regulate, for a lot of reasons. Again, as not a law person, but as a tech person, like, one of the reasons it’s not a great way is that it might – is that compared to, especially the pace of AI nowadays, law is very slow, so you want something that won’t be out of date super soon. I mean, the EU AI Act happened pretty fast, but, still, it’s still pretty slow.
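To make the "how many steps of math" point concrete, here is a rough Python sketch of how a compute count is typically estimated and compared with a fixed threshold. It rests on assumptions that were not stated on the panel: the factor of 6 operations per parameter per training token is a common rule of thumb for dense transformer training, not a legal definition; the threshold is the 10^25 floating-point operations of cumulative training compute at which the EU AI Act presumes systemic risk for general-purpose models; and the two training runs are made-up examples.

```python
# Rule of thumb (an assumption, not a legal definition): training compute for a
# dense transformer is roughly 6 * parameters * training tokens.
FLOPS_PER_PARAM_PER_TOKEN = 6

# Cumulative training compute above which the EU AI Act presumes systemic risk.
EU_AI_ACT_THRESHOLD_FLOPS = 10**25


def estimate_training_flops(parameters: int, training_tokens: int) -> int:
    """Approximate total floating-point operations used in training."""
    return FLOPS_PER_PARAM_PER_TOKEN * parameters * training_tokens


def exceeds_threshold(
    parameters: int,
    training_tokens: int,
    threshold: int = EU_AI_ACT_THRESHOLD_FLOPS,
) -> bool:
    """True if the estimated training compute is above the fixed threshold."""
    return estimate_training_flops(parameters, training_tokens) > threshold


# Hypothetical training runs: (parameter count, training tokens).
hypothetical_runs = {
    "smaller run": (70 * 10**9, 15 * 10**12),
    "larger run": (400 * 10**9, 15 * 10**12),
}

for name, (params, tokens) in hypothetical_runs.items():
    flops = estimate_training_flops(params, tokens)
    over = exceeds_threshold(params, tokens)
    print(f"{name}: about {flops:.2e} FLOPs, over threshold: {over}")
```

The sketch shows why a compute count is easy to apply and also why it can age badly: the number says how much arithmetic was done, not what the resulting model can do or where it is deployed.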

I mean, you end up with lots of – like, I think from previous technology stuff, you know, you have, like, some security standards, at least from back when I worked on, you know, some safety-ish stuff in the United States many, many years ago, like, you know, there’s some weird US laws around coding languages that just mean that, like, government programmes don’t work very well nowadays. Because they made the law, like, way too specific about, like, what you could do with a browser in, like, 1980 and, like, you really don’t want to end up with something like that.

On the other hand, I do definitely think, like, there is – like, there’s a significant amount of – like, the pressure for – as much as – and I work at a Frontier lab, right? So, like – and I really care about societal harm and stuff like that, but I still definitely do agree that, like, I think some – like, I think the EU AI Act was too specific in certain technical details that are not the right ones. But I do think the fact that they chose to do something is a good idea, specifically for the reason that, you know, we will put, especially when there’s a race, we will put resources where there is market pressure to do so. And market pressure is not completely uncorrelated with societal good, but it’s definitely not perfectly correlated with societal good.

And so, to some extent, you need something that fills the gaps that market pressure doesn’t have, and even some of the market pressure that exists exists because of regulation. Like, the couple of times that I’ve been forced contractually – I had actually already done it, but the – but that I’ve been forced contractually to make a safety implementation was usually – it was actually because of old regulation around, like, discrimination law and around, like, transparency in financial insider trading and stuff like that, right? In these cases, I had actually, like, usually already implemented something like that because I cared but, like, I was glad that I had to. And so, that’s – and so I think that will always be – there will always be that interplay, and so, I think the trick will be around setting the – around doing it so that it’s not so specific as to be out of date too quickly or over much a burden in ways that don’t relate to real risk.

Alex Krasodomski

So, regulation, yes, but perhaps not in the way that we’re currently conceptualising of it, that’s…

Dr Seraphina Goldfarb-Tarrant

Yeah.

Alex Krasodomski

Yeah, that’s – and I think that’s a very live debate in the UK at the moment, right? Like, we are clearly – we’ve been promised an AI bill. The question is, what will it contain? Simon, can I ask you about the question from Paul around, like, again, sort of, capturing what is – what are LLMs capable of? What are they now, at a, sort of, human level? And I suppose an additional question from me, what’s baked in? Like, there must be things that we – that are currently live in laboratories like yours that we haven’t seen yet. I mean, maybe you can’t tell us, but maybe you could at least give us a hint.

Simon Biggs

So, you mean – so the question about how the price of deploying it and running it is more effective – is – than – no, that’s – so what was the question, sorry?

Alex Krasodomski

Paul.

Paul Lee

It was about applications…

Simon Biggs

Yeah.

Paul Lee

…which are good enough to be scaled in a commercial environment.

Simon Biggs

I mean, I would say this is a bit out of my wheelhouse, but…

Alex Krasodomski

Oh, I mean, I suppose if you – is it you’re looking for the next million-dollar idea?

Dr Seraphina Goldfarb-Tarrant

I mean…

Paul Lee

And techno…

Dr Seraphina Goldfarb-Tarrant

…I listed, like – so one of the things that, like – which does touch on what Simon was saying about, like, whether you can make up the following 5%, like, so some amount of this is up to the design of embedding AI into a system, right? But, like – but, basically, like, the things that are good enough are usually things in which someone can verify. Like, one of the things that, like I said, like, that really changes people’s lives is, like, retrieval and summary of information. So, if you’re, like, “What are the 15 emails I got from Chatham House today?” and, like, “What are the important things in them?” That’s good enough because you will know if it’s – if there’s an error, presumably, and your tolerance for error is also probably pretty good. You know, in that, like, you probably roughly know what sorts of things you’d be emailed about. So, if something – if there’s a light error in it, then it’s, kind of, okay, and it can save you an enormous amount of time.

So, I think, like – I mean, that’s just one example, there’s quite a lot because they’re general-purpose models. But I would say, in general, anything in which the human is not seeking new knowledge but is in some way retrieving and formatting things, because then they can spot if there’s an issue, and it’s the sort of thing that would take a human actually a really very long time of…

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

…not a lot of high value stuff.

Alex Krasodomski

And it might sound vague, but that does actually describe a huge amount of the jobs that we do.

Dr Seraphina Goldfarb-Tarrant

Yeah.

Alex Krasodomski

Simon.

Simon Biggs

Something that I would love somebody to build is there’s this thing called ‘CAST analysis’, where if an incident occurs, as a human who’s, like, on the ground trying to solve a problem, you try and find the root cause, and then once you’ve found it, you’re, like, “I’m done, I’ve done my job, I’m walking away.”

Alex Krasodomski

Hmmm.

Simon Biggs

But there is so much more information you can get out of – ‘cause problems are not trees, they’re cycles. Like, so when something goes wrong, it’s actually, like, a cycle in a big system, right? And so, there’s, like, causes all through the cycles. Claude, or AI models, can use, like, these well des – so we have, like, standard systems in safety approaches that are well documented, you follow this approach. Claude can look at every single incident that has, like, ever happened in your company, and it can, like, dig in, and go, like, “Hey, I’m going to apply this CAST analysis approach, and, like, you can verify that it went through all the steps, ’cause, like, here is” – and it’s, like – and this has been a way that’s, like, shown, hey, when you do this and you do it rigorously and you do it over the whole system, like, you actually start to capture new things.

And it’s not that you’re – and I think the key thing here is that when it’s deployed, you deploy it as an extra layer of, here is just something we as a human – a company, could have never, sort – like, found this, like, weird way in which our company was failing.

Alex Krasodomski

Right.

Simon Biggs

But because we had this thing at scale, like, digging into everything that’s going wrong and, like, extracting the best information out of incidents and, like, sh – and using – and looking at it from a bigger picture, like, all of a sudden, you’re actually just strictly value adding. It’s okay, like, if it misses something, it would be nice if it didn’t miss anything, but, like, it’s strictly a plus…

Alex Krasodomski

In…

Simon Biggs

…versus never deploying it.

Alex Krasodomski

I’d just like to finish with the question from Tess there around the securitisation of safety.

Dr Seraphina Goldfarb-Tarrant

Hmmm.

Alex Krasodomski

This is clearly something – I mean, I think you can take this – this question is quite broad. I know that my colleagues at Chatham House have been looking at the, sort of, growing closeness between some technology firms and Departments of Defence, but also this – sort of, some of the re-jigs that we’ve – you know, the AI Safety Institute is now the AI Security Institute. How do you, as the Head of AI Safety at Cohere, how do you, sort of, see this? I mean, I hear you’re – I suppose you’re not the Head of Security now, you’re still the Head of Safety.

Dr Seraphina Goldfarb-Tarrant

Yeah, well, I…

Alex Krasodomski

What’s happening there?

Dr Seraphina Goldfarb-Tarrant

I mean, I think of it as predominantly geopolitically motivated, right?

Alex Krasodomski

In what kind – in what sense?

Dr Seraphina Goldfarb-Tarrant

I mean, I think it’s motivated in the sense that a number of things that were previously – I think there’s a couple of reasons. I think the bad – the reasons that I do not love, are the ones where it has to do with a number of the things that were previously under the remit of safety are not as geopolitically popular as they used to be.

Alex Krasodomski

Okay, safety is woke?

Dr Seraphina Goldfarb-Tarrant

Yes, exactly.

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

And so, some – and so the part that’s, like, not so great is it’s a move away from that, and that’s, like, weird because it’s not, like, any of those problems went away. It’s not, like, suddenly we ceased having as much of a problem with misinformation or discrimination or anything along those lines, obviously. But then the part of it that I think is positive is – and I’m genuinely not trying to pull punches with this, is that I think it came with slightly more groundedness in, like, current problems. And that there had been a view, not always for me, because, like, I came from a background where I was, like, we’re doing enterprise deployments, I’m worried about problems that are, like, literally occurring right now, not something that might hypothetically happen in future. But safety, broadly, was more, like, hypothetical future focused previously, and then the rebrand to ‘security’ frequently made it – like, unified those two camps of people who were worried about what could happen in future, but also people who were like, “No, no, no, this is, like, cybercrime right now,” or something along those lines. And so, I think the downside is the, like, anti-wokeness, but then the upside is the, like, more groundedness…

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

…of some of the previous work.

Alex Krasodomski

I’m going to take another round of three questions. I’m going to go to – let’s start with Emma at the front, and then I’ll go over there next. Thanks.

Emma Ross

Thanks. Emma Ross, Chatham House staff. Can I go back to the realities today and ‘beyond the hype’? I’m still not clear, as Alex said, what we hear is this is going to totally upend our lives, transformative, whether it’s in the way medicine works, we’re all – nobody’s going to want to employ us ‘cause a robot can do it. That is significant changes to life. What I’m hearing described here isn’t anywhere near – doesn’t seem to me to be near on the scale of impact on ordinary people’s lives. Do we have anything to share about where – what is the reality or what is hype as to what this is doing now to our lives and the way we, as humans, exist and live and can expect to earn a living, and, etc., and maybe you can’t future gaze, but, you know, where it’s going?

Alex Krasodomski

Thanks, Emma. We’ll just go behind you, the lady in the back row there. I’ll take one more here at the front.

Catherine Elliott

Hi, I’m Catherine Elliott, I’m Chief of Staff for a company delivering multi-agent teams to secure enterprises. I wanted to go back to what you guys were saying about error weighting and, kind of, LLM evaluation. And when we look at it in an agent context, I’d love to get your view on what companies who are providing agents into enterprises should be thinking about when it comes to agent evaluation, mostly from a safety and ethics perspective.

Alex Krasodomski

And I think it might be helpful just maybe we’ll clarify what we mean by an agentic deployment, because I think that could mean anything, honestly. We’ll finish up…

Dr Seraphina Goldfarb-Tarrant

True.

Alex Krasodomski

…here, if that’s okay, and then we’ll take – but then we’ll come back to the crowd.

Dr Fola Yahaya

Fola Yahaya from Strategic Agenda. We work with some UN agencies, and we look at AI and a lot of their digital policies. To answer your question about what’s happening with AI on the ground, how it’s affecting people, and also to touch on what you mentioned about translation, our business began as a translation company 20 or so years ago, and we translated most of the UN’s major reports. But what’s happened, obviously, with machine translation and now AI, is not necessarily that AI is better than translation, but it’s a perception that AI can now – has now solved translation. And I think this is one of the risks of AI, it’s a perception issue.

So, for example, we have Deloitte asking about, “How can I commercialise this technology and how can I essentially replace jobs or tasks with AI?” And so, most companies are now going through that process of thinking, which tasks and eventually, which humans can I replace with this? And that is the problem, ‘cause AI is not quite good enough yet at most things, but it will soon be, but people are already asking that question. I think the second thing, also, the other risk is…

Alex Krasodomski

I’ll ask you to come to a question. I think that was a really good…

Dr Fola Yahaya

Oh sorry, sorry.

Alex Krasodomski

…comment, but if we’re going to…

Dr Fola Yahaya

Sorry. Okay, so the – well, the question I had was actually your question, which was, what is baked in? What is – you know, what are the models that we aren’t seeing? Because, you know, I use Claude 4.5, and it’s amazing, but I know there are much more complicated and better…

Alex Krasodomski

Yeah.

Dr Fola Yahaya

…models there, so that’s another one.

Alex Krasodomski

Yeah, you got to tell some secrets. Brilliant, thank you.

Dr Fola Yahaya

Yeah.

Alex Krasodomski

Well, let’s – that’s a great place to think. So, Emma, to Emma’s question, what is reality versus what is hype? We, like, we – I – we still don’t have an answer to that question. A question around agentic deployments, and perhaps a little explanation on it, and Simon, we might come to you on that, it’s, like, what exactly we mean by an ‘AI agent’? And then, yeah, lastly, this question of what’s in the – what’s coming down the line?

Dr Seraphina Goldfarb-Tarrant

I mean, I think as far as what’s reality and what’s hype, I think the two things you’re talking about are not mutually exclusive. In that, at least when I think about risk landscapes, in a lot of ways I think of it as similar to, like, what we did wrong as a tech community with the release of social media. Like, not fully, but – and I – the reason I’m bringing that up is that I’m, like, that changed the world in an enormous amount of ways. Like, entire national uprisings, like the Arab Spring and stuff happened, only because of it. It has changed the way a lot of political systems have worked, and stuff like that, and it was also not that fancy, like, you know, being, like, you can post text online, like, great. You know, and people have, like, nice emojis that make people want to engage with them, and it engages some sort of dopamine process in, like, especially younger humans, but all humans, really.

And so, I mean – and AI is – like, modern LLMs are much fancier, but that still changed the world because it changed, sort of, the way people interact in their daily lives, and I think we have something similar, both as far as potential for change and also, potential for things to go wrong. But yeah, so, like, as in, like, I think it can change the world as far as, like, commercial – you know, sort of, the amount of time that it takes me to look for information in Turkish, or something along those lines, definitely. But I mean, I think it will also change things because when it changes the way, like, you know, the way that I see, like, high school students look up information nowadays, and that sort of thing, is, like, it’s going to change the world for those sorts of reasons, as well. And it doesn’t have to be super, super powerful, it just has to be quite good at the stuff it is right now.

Alex Krasodomski

Can we have a very brief explanation of what an agentic AI deployment might be? How does that differ? My understanding is it’s actually quite similar, but just a longer task, is that right? How would you explain it, Simon?

Simon Biggs

Yeah, and before I get to that…

Alex Krasodomski

And we’re going to hear a lot about agentic AI…

Simon Biggs

Yeah.

Alex Krasodomski

…next year, I would assume.

Simon Biggs

Is to add to that, like, it has completely changed what I do and how I work. Like, I will be very easily more than ten times myself, like, it – depending on the task. So, in some tasks I will, like, grind to a halt and, like, have to, like, go at, like, this very slow pace, and then all of a sudden, I step into this new mode and I am running at a very different pace. But that’s just a – and…

Dr Seraphina Goldfarb-Tarrant

But it has not changed very much for what I do, even though we’re – we both work in tech. So – but it’s partly because for most of the things I do, the cost of verifying whether something is correct is higher than of creating it from scratch, so…

Simon Biggs

And so, then the question about AI agents. I actually don’t like the word ‘agents’.

Alex Krasodomski

Hmmm.

Simon Biggs

Because we’ve put a construct, we’ve humani – we’ve taken this human concept, and we’ve applied it to these models.

Alex Krasodomski

Hmmm.

Simon Biggs

When in reality, a single model is able to, like, copy half of itself across to, like, another model and, like, you can have, like – anyway, I will not dig too deep. Essentially, I think what most people say when they mean AI agents is actually taking actions in the world.

Alex Krasodomski

Is it a model that takes an action in the world?

Simon Biggs

I would think that’s what most people are meaning.

Alex Krasodomski

So, to give us an example from earlier, a model you would – that might book us – book me a flight or book me a table at a restaurant?

Simon Biggs

Yeah, or autonomously go out and, like, find people to phish.

Alex Krasodomski

Or do something bad, absolutely. I’ll – December 2026, when are we each going to have access to this technology? When is it going to be something that we have to eng – we have to take to our…?

Simon Biggs

Well, it’s back to what I was, sort of, saying before, in the sense that it’s – you have this, sort of, dual world, where there are tasks that can go to 100% because the system is able to be built in a robust way…

Alex Krasodomski

Yeah.

Simon Biggs

…such that you can trust them, or the task itself is – it can be trusted to them. And there’ll be other tasks where, for whatever reason, it’s just – it hasn’t – it’s at 99% and that 1% ma – really matters. So, we are going to have this scenario where, like, you are just having this rising tide of some tasks being completely automated and we will start to think of it, like, ho-hum, and some tasks we’ll still be thinking, holy moly, these things are stupid.

Alex Krasodomski

Right.

Simon Biggs

But it will have an absolute ins – it’ll be an – this is just going to keep rising until there is nothing left, in my opinion.

Dr Seraphina Goldfarb-Tarrant

I disagree that there’ll be ‘nothing left’, but I do think one of the interesting points to add on that, though, as far as, like, to expand on what you’re saying, as far as you were talking about to autonomously go out and find people to phish, is that, like, the – that I think people don’t talk about that often, is that the tolerance for that 5% error or something is going to be different if you’re doing something like cybercrime versus if you’re not. Which means that, you know, you – if you don’t have a tolerance for the model potentially misbooking your flight, you might not use it, but someone probably who’s trying to use it to phish people probably has a tolerance for it failing a couple of times.

Alex Krasodomski

Hmmm.

Dr Seraphina Goldfarb-Tarrant

And so, in a weird way, it, like, can sometimes enable – can move the needle on malicious activity sometimes faster than on, like, standard, more boring, but more economically helpful, enterprise activity, and I don’t think people talk about that that often.

Alex Krasodomski

Can we finish by the final question at the front, what’s coming down the line? What are you guys brewing back there? Is there something that is baked into current paradigms, scaling paradigms? There is the investment in AI that has already been committed, the chips are being bought, chips are being shipped, the compute is running, what can we expect? What’s baked in…

Simon Biggs

Yeah.

Alex Krasodomski

…in 2026/27?

Simon Biggs

The scaling laws will continue.

Alex Krasodomski

Until morale improves.

Simon Biggs

And we will do another – and, yeah, and I – and we need to be aligned. We need to have these models and we need a, sort of, alignment too.

Dr Seraphina Goldfarb-Tarrant

I don’t think the scaling laws will continue, because I was here last time people said that in 2010.

Simon Biggs

They are continuing, they’ve been going happy.

Dr Seraphina Goldfarb-Tarrant

They give significantly diminishing returns compared to what they used to, but that doesn’t mean that models won’t still be getting better. I think…

Alex Krasodomski

Yeah.

Dr Seraphina Goldfarb-Tarrant

…they’ll just get better for different reasons.

Alex Krasodomski

Like what?

Dr Seraphina Goldfarb-Tarrant

Because we’ll get smarter about specifically ways we want to train them. We’ll get smarter about – like I said, I think, like, if I’m talking about barriers to interesting types of adoption, I don’t think – I think some of them are capability-based, but I think a lot of them are based on, like, intelligent ways of putting verification in. And that’s, like, not super sexy, but I think that’s actually frequently, like, the next thing that you need to get it somewhere. Yeah, of course, it’s like certain different types of, like, tool use-based capabilities, you do just need more capable models that can do generalisation better.

So, it, sort of – it just depends, it depends on whether you want to be able – yeah. But I don’t – I am not a scaling maximalist, and I – yeah, because back when I first worked on machine translation, people were scaling maximalists back then and they just said, “You need to shove more of the internet into the model.” And back then we were using what we called ‘n-gram models’ which nowadays is quite laughable if anyone suggests using them. But you used to also be, like, shoved off of a stage if you suggested using a different architecture back then, ‘cause people just said, “No, you need more data.”

Alex Krasodomski

Right.

Dr Seraphina Goldfarb-Tarrant

And I think it depends on what you want, but more data can’t solve all of our problems. It will get us better generalisation frequently, but not everything.

Alex Krasodomski

I leave this conversation with, I think, this question around risk and risk appetite, and this idea that this technology, which is already extraordinarily powerful, even in contexts where it can’t do 100% of the job that we’re asking it to, but it could perhaps do 95/96, and this question around, well, how – that the kind – it’s actually less about the technology and more the context in which we’re going to be deploying these technologies and how our risk appetite to it might change. Perhaps we decide that we want – we are able – you know, we’re willing to accept a fail rate of 5% when we apply the AI, if it saves us a certain amount of money in a given public service. And I think that the question for the folks in the audience today is to think about, well, let’s assume a 95% success rate, are there contexts where we don’t accept that as a potential error?

And I think it’s clearly – if you look around the world today, there are technologies being deployed in conflict scenarios or in fragile political systems, where actually, we would expect that that fail rate is actually far too high for these to be deployed, but they are being deployed anyway. And actually it is – while the technology is, in and of itself, an important and pa – an important thing, it’s as important to say, “Well, what are you actually asking the technology to do?” And that, ultimately, is a very human question and a very – and a question I think a lot of people here, be it, legal, societal, economic, I think are going to face in their day-to-day.

Thank you so much for taking the time to speak with us this afternoon. Can I ask for a round of applause from my audience? [applause] Thank you. Thank you again for joining us. Please do check the Chatham House website for future events. My understanding is that, for anyone who’s got a bit of an interest in the digital and the technical, we’ll be welcoming the President of Estonia to join us on Monday. So please do come to hear all about how Estonia has transformed its digital services. I think that’s particularly of interest given the announcement that the UK Government has made around a potential UK ID system being rolled out. So we’ll hear it from the country that many say has done it the best. I hope we can see you there; otherwise, see you later on. Thanks, everyone.

Simon Biggs

Thank you.

Event format and who can attend

Panel sessions bring together several renowned speakers with different perspectives on an issue, offering the audience a thought-provoking discussion that thoroughly analyses a topic from many angles. They are held in our Joseph Gaggero Hall and on the record.

Members and guests get priority access, with places available to those who register their interest.

By registering for this event, attendees agree to our code of conduct, ensuring a respectful, inclusive, and welcoming space for diverse perspectives and debate.

