grunge
Well-Known Member
They had stock of those GPUs and are using those, aren't they?
They used older GPUs, yep, but there is a lot more to it by the look of it. Using lower-precision numbers meant less VRAM was needed, which made it more manageable on lower-end GPUs. Also, instead of making one very large LLM that "knows everything", they made many smaller LLMs with specialities, which were quicker to train.
Plus a bunch of other changes. This seemed to be a decent video on it.
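
To put rough numbers on the precision point, here is a quick back-of-the-envelope sketch in Python. It only counts the memory needed to hold the weights themselves and ignores activations, optimizer state and everything else. The precisions listed and the 7-billion-parameter figure are my own illustrative assumptions, not details from the video or from the model being discussed.

```python
# Rough VRAM needed just to store model weights at different numeric precisions.
# The formats below are illustrative assumptions, not confirmed details.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Return the GiB needed to store num_params weights at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size, chosen only for illustration
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(params, precision):.1f} GiB")
```

Each halving of the bytes per weight halves the memory needed for the weights, which is why lower precision helps on cards with less VRAM.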