Post: wow 😮 INTELLECT-1 is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code. PrimeIntellect/INTELLECT-1-Instruct
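A checkpoint published on the Hub like this one can usually be loaded with the standard transformers auto classes; a minimal sketch (the prompt and generation settings here are illustrative, not from the post):

```python
# Minimal sketch: load PrimeIntellect/INTELLECT-1-Instruct from the Hub.
# Assumes the repo follows standard AutoModelForCausalLM conventions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PrimeIntellect/INTELLECT-1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("What is decentralized training?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```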
Post: Hey, it was good meeting you yesterday @MaziyarPanahi 🔥 Thanks @mishig for setting this up. Let's make the Hub as useful as possible for the community ❤️
Canonical models
This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace:
- albert/albert-base-v1 • Fill-Mask • Updated Feb 19 • 14.3k downloads • 8 likes
- albert/albert-base-v2 • Fill-Mask • Updated Feb 19 • 6.33M downloads • 115 likes
- albert/albert-large-v1 • Fill-Mask • Updated Feb 19 • 1.51k downloads • 3 likes
- albert/albert-large-v2 • Fill-Mask • Updated Feb 19 • 25.5k downloads • 17 likes
Papers about model merging referenced in the mergekit repo (https://github.com/cg123/mergekit):
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time • arXiv:2203.05482 • Published Mar 10, 2022 • 6 upvotes
- Editing Models with Task Arithmetic • arXiv:2212.04089 • Published Dec 8, 2022 • 6 upvotes
- Resolving Interference When Merging Models • arXiv:2306.01708 • Published Jun 2, 2023 • 13 upvotes
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch • arXiv:2311.03099 • Published Nov 6, 2023 • 28 upvotes
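Of these methods, the "model soup" recipe is the simplest: uniformly average the weights of several fine-tuned checkpoints of the same base model. A minimal sketch in PyTorch (the checkpoint file names are hypothetical; mergekit implements this and the more sophisticated methods above):

```python
# Minimal "model soup" sketch: uniform averaging of fine-tuned checkpoints
# that share a single architecture. File names below are hypothetical.
import torch

checkpoint_paths = ["finetune_a.pt", "finetune_b.pt", "finetune_c.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]

souped = {}
for key in state_dicts[0]:
    # Element-wise mean of each parameter tensor across all checkpoints.
    souped[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)

torch.save(souped, "model_soup.pt")
```

Because the result is a single set of weights, the merged model runs at the same inference cost as any one checkpoint, which is the central claim of the model-soups paper.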