

Given that running the weights in a model comes down to a set of conditional if statements (JMP machine code on the GPU or CPU), he’s not technically wrong. Of course, it’s more than just JMP: JMP stands in for the entire class of jump instructions, like JE and JZ. Something needs to act on the results of the TMULs.
The model is data. It needs to be operated on to get information out of it, and that means lots of JMPs.
If someone said viewing a GIF is just a bunch of if-elses, that would also be true. The fact that the data in the GIF isn’t itself a bunch of if-elses is beside the point.
Executing LLMs is particularly JMP-heavy. It’s why you need massive amounts of fast RAM: caching doesn’t help them.