Good list, and more are coming every day. It's really cheap to train a base LLM to do cool stuff.
I personally suggest Vicuna 13B from that list. It has the best performance of them all, especially for coding.
"100% open source"
This is not necessarily true. They are all based on pre-existing base LLMs. The best ones tend to be based on Facebook's LLaMA. Unfortunately, it's not open source and is released under a private license with restrictions. Also, most of them do not release their full training datasets.
Open-Assistant is the only project I know of so far that also releases its complete training dataset, but its base model is still questionable. The Pythia version did not perform well, and they were promising to release a LLaMA version that somehow works around the licensing.
Ultimately, someone needs to train a completely open source base model that does not have tons of Reddit and MSM fed into it.