Got it. Yes, there are a lot of optimisations possible with the parameters. Some of these optimisations would potentially affect the quality of the model, and quantifying and reducing those effects is a big part of the research. The compute and energy requirements would be scaled, but still linearly.
Here is my prediction - we will see a completely different base paradigm for training these models. Like valves vs transistors. When this happens, we will see an order of magnitude reduction in compute+energy usage and we will probably see multiple iterations of this.
This is what makes this timeline so amazing. For those of us who were of age at the infancy of computers, the internet, etc., being able to see another epoch - even bigger than that - and to contribute to it is incredible.
Totally agree.
And true. Witnessing this kind of transformation is effectively seeing science fiction come into being.