Distributed Training — a reading list (collection by Michael Benayoun, https://huggingface.co/collections/michaelbenayoun/distributed-training-665d8d5d0f35c005de9c3b6e)

- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (arXiv:2304.11277, 2023)
  Paper on FSDP, PyTorch's implementation of ZeRO-3. In addition, the following blog posts may be an easier introduction:
  - PyTorch's blog post on FSDP: https://engineering.fb.com/2021/07/15/open-source/fsdp/
  - DeepSpeed's tutorial on ZeRO: https://www.deepspeed.ai/tutorials/zero/
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv:1909.08053, 2019)
  Initial paper on Tensor Parallelism.
- Reducing Activation Recomputation in Large Transformer Models (arXiv:2205.05198, 2022)
  To read after the Megatron-LM paper: it introduces an improvement over vanilla Tensor Parallelism called "Sequence Parallelism", which shards the activations along the sequence axis outside of the Tensor Parallel regions, mostly to save memory.
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arXiv:1811.06965, 2018)
  Initial paper on Pipeline Parallelism.
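The Megatron-LM entry's Tensor Parallelism can likewise be checked in one process: split a linear layer's weight matrix column-wise across ranks, let each rank compute its output slice, and concatenate (the all-gather) to recover the full result. This is a numpy sketch under made-up shapes, not the Megatron-LM implementation:

```python
import numpy as np

# Single-process simulation of column-parallel Tensor Parallelism.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # (batch, in_features), replicated on all ranks
W = rng.standard_normal((8, 6))  # (in_features, out_features)

full_out = x @ W                 # what a single device would compute

world_size = 2
shards = np.split(W, world_size, axis=1)        # column-wise weight split
partial_outs = [x @ w for w in shards]          # each "rank" computes a slice
tp_out = np.concatenate(partial_outs, axis=1)   # all-gather along columns

assert np.allclose(full_out, tp_out)
```

Sequence Parallelism (the third entry) applies the same splitting idea to the activations, along the sequence axis, in the regions between these tensor-parallel blocks where operations are elementwise per token.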