MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Abstract
This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to the prevailing belief that data and parameter quantity are what determine model quality, our investigation underscores the significance of model architecture for sub-billion-scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention, we establish a strong baseline network, denoted MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over the preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resulting models, denoted MobileLLM-LS, demonstrate a further accuracy improvement of 0.7%/0.8% over MobileLLM 125M/350M. Moreover, the MobileLLM model family shows significant improvements over previous sub-billion models on chat benchmarks, and comes close to LLaMA-v2 7B in API-calling tasks, highlighting the capability of small models for common on-device use cases.
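To make the "immediate block-wise weight sharing" idea concrete, here is a minimal PyTorch sketch (not the authors' implementation): each transformer block is executed twice in a row, so the effective depth doubles while the parameter count stays unchanged. The toy block and its dimensions are illustrative assumptions.

```python
# Minimal sketch of immediate block-wise weight sharing (illustrative, not the paper's code):
# each block is run `repeats` times back-to-back, so weights can stay in cache while the
# effective number of layers grows without adding parameters.
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Stand-in for a transformer block; causal masking and rotary embeddings omitted for brevity."""

    def __init__(self, dim: int = 576, n_heads: int = 9):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


class SharedStack(nn.Module):
    """Runs each unique block `repeats` times in immediate succession (weights shared)."""

    def __init__(self, n_unique_blocks: int = 15, repeats: int = 2, dim: int = 576):
        super().__init__()
        self.blocks = nn.ModuleList(ToyBlock(dim) for _ in range(n_unique_blocks))
        self.repeats = repeats

    def forward(self, x):
        for block in self.blocks:
            for _ in range(self.repeats):  # immediate repetition of the same weights
                x = block(x)
        return x


x = torch.randn(1, 16, 576)
print(SharedStack()(x).shape)  # torch.Size([1, 16, 576])
```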
Community
It would be interesting to see a comparison to small encoder-decoder models like instructionRoBERTa or flan-T5.
As someone who is GPU-poor, I find this paper interesting and I am excited to try the models out.
My questions are:
Have you considered knowledge-distilling the Phi-2 2.7B model into the smaller 350M model?
How do the design changes affect the in-context learning ability of these models?
Do existing tool chains such as PEFT and LoRA, and optimization techniques like AWQ, EXL2, and GPTQ, work on these models?
Why not distill from a larger model?
The model weights are now publicly available: https://huggingface.co/collections/facebook/mobilellm-6722be18cb86c20ebe113e95
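For anyone who wants to try the released checkpoints, here is a minimal loading sketch with transformers. The repo id facebook/MobileLLM-125M and the trust_remote_code flag are assumptions inferred from the linked collection, so check the model card before running.

```python
# Hedged example: loading one of the released checkpoints with transformers.
# The repo id and the trust_remote_code flag are assumptions; verify them against
# the model card in the linked collection before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "facebook/MobileLLM-125M"  # assumed repo id from the collection
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Sub-billion parameter models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```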
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Head-wise Shareable Attention for Large Language Models (2024)
- Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers (2024)
- Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (2024)
- BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models (2024)
- Rethinking Optimization and Architecture for Tiny Language Models (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend
If it can be downloaded, I would like to test it on my device.
Looking forward to trying it! Layer sharing saves only memory, not computation, so here is a thought on combining it with LoRA: fine-tune the shared layers with a low-rank update. Then every layer gets different effective weights while adding only a small number of parameters.
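A minimal sketch of that idea, under the assumption of a single shared linear sublayer: the base weight is reused at every layer position, and each position only adds its own small LoRA factors, so the extra cost per position is just 2·rank·dim parameters. The module and dimensions below are illustrative, not from the paper.

```python
# Sketch of "shared weights + per-position LoRA": one frozen base projection is reused
# at every layer position, while a small low-rank update makes each position distinct.
import torch
import torch.nn as nn


class SharedLinearWithLoRA(nn.Module):
    def __init__(self, dim: int = 576, n_positions: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)   # shared across all layer positions
        self.base.weight.requires_grad_(False)        # freeze the shared weight, LoRA-style
        # B is randomly initialised here so the demo produces distinct outputs;
        # standard LoRA training starts B at zero.
        self.lora_A = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, dim)) for _ in range(n_positions)])
        self.lora_B = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(dim, rank)) for _ in range(n_positions)])

    def forward(self, x: torch.Tensor, position: int) -> torch.Tensor:
        delta = x @ self.lora_A[position].T @ self.lora_B[position].T
        return self.base(x) + delta


layer = SharedLinearWithLoRA()
x = torch.randn(2, 16, 576)
y0 = layer(x, position=0)  # shared base weight + low-rank update for position 0
y1 = layer(x, position=1)  # same base weight, different low-rank update
print(y0.shape, torch.allclose(y0, y1))  # torch.Size([2, 16, 576]) False
```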
Interesting. If the findings hold true for all small LLMs, then it is very possible to cut down encoder-decoder model size by applying layer sharing to the decoder part of the model. Model size has always been an issue for encoder-decoder models.
Could someone reproduce a model config that matches the parameter counts given in the paper, using the reported numbers of layers, heads, key-value heads, and embedding dimension?
I used a Llama config and additionally set tie_word_embeddings=True, but I don't get the same number of parameters. Probably I am missing something? (A rough counting sketch follows below.)
Secondly, the authors didn't mention the pretraining dataset they used. IMHO, controlling for that would be a better setup for measuring the effect of the model parameters.
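As a rough aid for the parameter-count question above, here is a back-of-the-envelope counting sketch for a LLaMA-style decoder with grouped-query attention and tied input/output embeddings. It assumes a SwiGLU feed-forward and RMSNorm without biases; the example numbers are illustrative assumptions, not the paper's exact configuration.

```python
# Rough parameter counting for a LLaMA-style decoder with GQA and tied embeddings.
# The configuration below is assumed for illustration and may differ from the paper
# (norms, biases, head_dim, vocab size, etc.).

def count_params(vocab: int, dim: int, n_layers: int, n_heads: int,
                 n_kv_heads: int, ffn_dim: int) -> int:
    head_dim = dim // n_heads
    # attention: Q projects to dim, K/V project to n_kv_heads * head_dim, O back to dim
    attn = dim * dim + 2 * dim * (n_kv_heads * head_dim) + dim * dim
    # SwiGLU feed-forward: gate, up, and down projections
    ffn = 3 * dim * ffn_dim
    # two RMSNorm weight vectors per block
    norms_per_block = 2 * dim
    block = attn + ffn + norms_per_block
    embeddings = vocab * dim  # tied: counted once (tie_word_embeddings=True)
    return n_layers * block + embeddings + dim  # + final norm


# Illustrative numbers only (assumed, not taken from the paper's tables);
# prints ~124.6M parameters for this configuration.
print(count_params(vocab=32000, dim=576, n_layers=30, n_heads=9,
                   n_kv_heads=3, ffn_dim=1536))
```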
MobileLLM: Revolutionizing Efficient Language Models for Smartphones
Links:
Subscribe: https://www.youtube.com/@Arxflix
Twitter: https://x.com/arxflix
LMNT (Partner): https://lmnt.com/
Good news! The MobileLLM model weights are now publicly available: https://huggingface.co/collections/facebook/mobilellm-6722be18cb86c20ebe113e95
Models citing this paper: 14
Datasets citing this paper: 0