{"id":7028,"date":"2023-08-05T08:40:31","date_gmt":"2023-08-05T16:40:31","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=7028"},"modified":"2025-04-24T17:14:58","modified_gmt":"2025-04-24T17:14:58","slug":"generative-ai-models-in-2023","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/","title":{"rendered":"An Exhaustive List of Open-source Generative AI Models"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg\" alt=\"a glowing brain with purple, blue, and white lights\" class=\"wp-image-7038\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">With advanced generative AI models like Generative Pre-trained Transformer 3 (GPT-3), which provides human-like responses to user queries, AI is progressing toward generative tools to create realistic content, including text, videos, images, and audio.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">And with open-source becoming the norm, most AI models are available for public use for research and experimentation. As such, we explore the most recent open-source generative AI models that demonstrate the ever-expanding applications of AI.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">1. 
LLaMA<\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7029\"><img loading=\"lazy\" decoding=\"async\" width=\"2378\" height=\"732\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM.png\" alt=\"Meta AI (artificial intelligence) logo\" class=\"wp-image-7029\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM.png 2378w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-300x92.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-1024x315.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-768x236.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-1536x473.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-2048x630.png 2048w\" sizes=\"auto, (max-width: 2378px) 100vw, 2378px\" \/><figcaption class=\"wp-element-caption\">Image Source: <a href=\"https:\/\/ai.meta.com\/llama\/\">Meta<\/a><\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Within the text generator domain, <\/span><a href=\"https:\/\/ai.meta.com\/resources\/models-and-libraries\/llama\/\"><span style=\"font-weight: 400;\">Large Language Model Meta AI<\/span><\/a><span style=\"font-weight: 400;\"> (LLaMa) is a revolutionary technology that surpasses GPT-3 by OpenAI in terms of safety and quality. The LLaMa 2 family consists of four models, with 7, 13, 34, and 70 billion parameters, respectively. Although these parameter counts are smaller than those of the more recent GPT-4, which reportedly comprises <\/span><a href=\"https:\/\/medium.com\/@ignacio.de.gregorio.noblejas\/meta-llama2-90dae8bbf750\"><span style=\"font-weight: 400;\">eight 220-billion-parameter models<\/span><\/a><span style=\"font-weight: 400;\">, LLaMa 2 was trained on a much larger dataset and offers a 4,096-token (roughly 3,000-word) context window.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Like other Large Language Models (LLMs), LLaMa 2\u2019s most significant use case is building chatbots that provide relevant answers to a wide range of user prompts. Enterprises can download it directly onto their servers to build customer-centric applications that help visitors engage with businesses more effectively.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">But the real game-changer is how LLaMa 2 manages the safety-helpfulness trade-off. Traditional chatbots maximize helpfulness by answering almost any question &#8211; even dangerous questions like <\/span><i><span style=\"font-weight: 400;\">\u201cHow to kill?\u201d<\/span><\/i><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">LLaMa 2 changes the generative AI landscape by incorporating two reward models to control its responses. One model rewards LLaMa based on how helpful it is, while the other rewards it based on safety. This is part of the Reinforcement Learning from Human Feedback (RLHF) approach, where the reward models stand in for humans assessing the quality of LLaMa 2\u2019s responses. In effect, the model learns to maximize the reward and improve its output.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">If LLaMa 2 assesses a prompt as dangerous, the safety reward model guides its response; for other prompts, the helpfulness reward model does. 
As such, LLaMa 2\u2019s architecture is revolutionary and paves the way for AI to interact more safely with the real world.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">2. BLOOM<\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7030\"><img loading=\"lazy\" decoding=\"async\" width=\"2438\" height=\"546\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM.png\" alt=\"HuggingFace and BigScience's BLOOM open-source, open-access, large language model (LLM) logo\" class=\"wp-image-7030\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM.png 2438w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM-300x67.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM-1024x229.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM-768x172.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM-1536x344.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.00.41-PM-2048x459.png 2048w\" sizes=\"auto, (max-width: 2438px) 100vw, 2438px\" \/><figcaption class=\"wp-element-caption\">Image Source: <a href=\"https:\/\/bigscience.huggingface.co\/blog\/bloom\">Bloom<\/a><\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Yet another innovation in the text generator space, the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) comes from the BigScience research workshop, coordinated by Hugging Face, and can solve several mathematical and programming problems.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">With <\/span><a href=\"https:\/\/bigscience.huggingface.co\/blog\/bloom\"><span style=\"font-weight: 400;\">176 billion parameters<\/span><\/a><span style=\"font-weight: 400;\">, BLOOM supports 46 languages and 13 programming languages. However, running BLOOM on a local machine can take time due to its sheer size.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Its architecture is similar to GPT-3\u2019s: it predicts the next token using <\/span><a href=\"https:\/\/towardsdatascience.com\/run-bloom-the-largest-open-access-ai-model-on-your-desktop-computer-f48e1e2a9a32\"><span style=\"font-weight: 400;\">70 transformer blocks<\/span><\/a><span style=\"font-weight: 400;\">. Each block has a multi-layer perceptron and an attention layer to predict the next token from a given input, which it receives in the form of word embeddings.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This generative AI model has several use cases. It can quickly solve arithmetic problems, translate one language into another, generate appropriate code, and produce general content per user requirements. Also, users can conveniently deploy it in production through the Hugging Face Accelerate library, making it easier to train and run inference with the model.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">As one of the few open-source models with 100-billion-plus parameters, BLOOM extends AI\u2019s boundaries to provide accurate and relevant responses with an easy-to-implement framework. And being open-source, users can fine-tune the model through the Hugging Face Transformers library. This expands its applications to various fields, such as education, eCommerce, and research.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">3. 
MPT-30B<\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7032\"><img loading=\"lazy\" decoding=\"async\" width=\"2028\" height=\"766\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.03.53-PM.png\" alt=\"Mosaic ML logo\" class=\"wp-image-7032\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.03.53-PM.png 2028w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.03.53-PM-300x113.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.03.53-PM-1024x387.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.03.53-PM-768x290.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.03.53-PM-1536x580.png 1536w\" sizes=\"auto, (max-width: 2028px) 100vw, 2028px\" \/><figcaption class=\"wp-element-caption\">Image Source: <a href=\"https:\/\/www.mosaicml.com\/blog\/mpt-30b\">MosaicML<\/a><\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">MosaicML recently launched its Mosaic Pretrained Transformer (MPT)-30B language model, which outperforms several other LLMs, such as GPT-3, StableLM-7B, and LLaMA-7B. It\u2019s an open-source decoder-only transformer model that improves upon the previous version &#8211; MPT-7B.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">As the name suggests, the generative AI model consists of 30 billion parameters with a context window of 8,000 tokens. This means it can understand fairly long word sequences and generate appropriate responses.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">It also uses the Attention with Linear Biases (ALiBi) technique, enabling the model to handle sequences longer than the 8,000 tokens it saw during training. This feature makes MPT-30B highly valuable in the legal domain, where experts may use it for analyzing long contracts with complex legal diction.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">In addition, MPT-30B-Instruct is a purpose-built variant of MPT-30B that effectively follows user instructions given as input prompts. This variant is applicable where users want the model to precisely follow a set of instructions.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">In contrast, MPT-30B-Chat is a conversational generative AI model that produces relevant, human-like responses. This version also <\/span><a href=\"https:\/\/www.mosaicml.com\/blog\/mpt-30b#:~:text=Mosaic%20Pretrained%20Transformer%20(MPT)%20models,stability%2C%20and%20longer%20context%20lengths.\"><span style=\"font-weight: 400;\">performs well when generating code<\/span><\/a><span style=\"font-weight: 400;\"> in several programming languages. It also reportedly outperforms other code generators, such as StarCoder-GPTeacher, on HumanEval. However, MPT-30B-Chat is not available for commercial use.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The defining development of MPT-30B is that it\u2019s reportedly the first LLM to be partially trained on NVIDIA\u2019s H100 GPUs, increasing throughput by <\/span><a href=\"https:\/\/www.mosaicml.com\/blog\/mpt-30b#:~:text=Mosaic%20Pretrained%20Transformer%20(MPT)%20models,stability%2C%20and%20longer%20context%20lengths.\"><span style=\"font-weight: 400;\">2.44 times per GPU<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">4. 
Dall-E Mini<\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7034\"><img loading=\"lazy\" decoding=\"async\" width=\"2238\" height=\"770\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM.png\" alt=\"Craiyon (formerly Dall-E mini) logo\" class=\"wp-image-7034\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM.png 2238w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM-300x103.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM-1024x352.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM-768x264.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM-1536x528.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.07.10-PM-2048x705.png 2048w\" sizes=\"auto, (max-width: 2238px) 100vw, 2238px\" \/><figcaption class=\"wp-element-caption\">Image Source: <a href=\"https:\/\/www.craiyon.com\/\">Craiyon<\/a><\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">With the increasing popularity of AI text generators, several text-to-image models are also emerging, with advanced architectures producing realistic visuals.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Dall-E by OpenAI is a text-to-image model that is a 12-billion-parameter offshoot of GPT-3. The company released the original version on January 5, 2021; Dall-E 2, announced in April 2022 and made publicly available on September 28, 2022, claims better speed and image resolution.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">While Dall-E 2 is not open-source, Dall-E mini &#8211; a community-built recreation now known as Craiyon &#8211; is, and it generates simple images from textual prompts read through a bidirectional encoder.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The generative AI model features a transformer neural network that uses the attention mechanism, allowing the network to focus on the most significant aspects of a given sequence.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The attention method also helps the model produce more accurate results and make better connections between abstract elements, yielding unique images.&nbsp;<\/span><\/p>\n\n\n\n<p><a href=\"https:\/\/www.craiyon.com\/\"><span style=\"font-weight: 400;\">Craiyon<\/span><\/a><span style=\"font-weight: 400;\"> and the more advanced <\/span><a href=\"https:\/\/openai.com\/dall-e-2\"><span style=\"font-weight: 400;\">Dall-E versions<\/span><\/a><span style=\"font-weight: 400;\"> are invaluable in the fashion industry, where companies display several outfits and products. With such image-generating technology, they can conveniently generate relevant photos without hiring expensive models and other professional staff.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">With an ability to create entirely new images of animals, humans, nature, and other imaginary creatures, the Dall-E line of image generators can comprehend abstract textual descriptions and produce several variations by combining distinct concepts &#8211; extending human creativity to new levels with generative AI.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">5. 
Stable Diffusion<\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full wp-image-7036\"><img loading=\"lazy\" decoding=\"async\" width=\"2720\" height=\"698\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM.png\" alt=\"Stability AI, the makers of Stable Diffusion and SDXL, logo\" class=\"wp-image-7036\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM.png 2720w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM-300x77.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM-1024x263.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM-768x197.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM-1536x394.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.09.17-PM-2048x526.png 2048w\" sizes=\"auto, (max-width: 2720px) 100vw, 2720px\" \/><figcaption class=\"wp-element-caption\">Image Source: <a href=\"https:\/\/stability.ai\/blog\/stable-diffusion-public-release\">Stability.ai<\/a><\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Boasting faster generation and more realistic images, Stable Diffusion is a sophisticated generative AI model that uses textual prompts to produce high-quality visual art.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">As the name suggests, Stable Diffusion uses a diffusion model to create images. A diffusion model has two elements: <\/span><a href=\"https:\/\/stable-diffusion-art.com\/how-stable-diffusion-work\/\"><span style=\"font-weight: 400;\">forward diffusion and reverse diffusion<\/span><\/a><span style=\"font-weight: 400;\">. In forward diffusion, the model adds random noise to a photo. In reverse diffusion, it subtracts the noise to recover the original image.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The architecture features a noise predictor that takes a text prompt and random noise in latent space. Latent space is a low-dimensional space containing compressed representations of an image.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Next, the model subtracts the predicted noise from the latent image and repeats this step several times. Finally, the decoder of a variational autoencoder (VAE) converts the latent image into an actual photo.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">While DALL-E 2 also uses a diffusion model, it is slower than Stable Diffusion. Also, <\/span><a href=\"https:\/\/stablediffusionweb.com\/#demo\"><span style=\"font-weight: 400;\">Stable Diffusion<\/span><\/a><span style=\"font-weight: 400;\"> is open-source and allows users to tweak several options through Stability AI\u2019s DreamStudio app. This provides more control over how you want to generate an image.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">For example, you can increase the number of denoising steps, provide different seeds, and control the prompt strength.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Stable Diffusion is suitable for users who want images that relate more to the real world. It is perfect for generating photographs, portraits, 3D images, etc.&nbsp;<\/span><\/p>\n\n\n\n<p>Try <a href=\"https:\/\/www.comet.com\/site\/blog\/sam-stable-diffusion-for-text-to-image-inpainting\/\">this full code tutorial using Stable Diffusion<\/a> generative AI from Comet.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">6. 
AudioCraft<\/span><\/h2>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-7029\"><img loading=\"lazy\" decoding=\"async\" width=\"2378\" height=\"732\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM.png\" alt=\"Meta AI (artificial intelligence) logo\" class=\"wp-image-7029\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM.png 2378w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-300x92.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-1024x315.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-768x236.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-1536x473.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-11.57.57-AM-2048x630.png 2048w\" sizes=\"auto, (max-width: 2378px) 100vw, 2378px\" \/><figcaption class=\"wp-element-caption\">Image Source: <a href=\"https:\/\/ai.meta.com\/blog\/audiocraft-musicgen-audiogen-encodec-generative-ai-audio\/\">Meta<\/a><\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Although still in the early stages, generative AI\u2019s capabilities are extending into the audio domain, with technologies including OpenAI\u2019s Jukebox and Stability AI\u2019s Harmonai for music generation. But the most recent advancement is AudioCraft by Meta: an open-source framework for generating music and audio from text.<\/span><\/p>\n\n\n\n<p><a href=\"https:\/\/ai.meta.com\/blog\/audiocraft-musicgen-audiogen-encodec-generative-ai-audio\/\"><span style=\"font-weight: 400;\">AudioCraft<\/span><\/a><span style=\"font-weight: 400;\"> can effectively take textual prompts, such as \u201crock music with electronic sounds,\u201d and generate high-fidelity soundtracks without background noise. This is an impressive leap in generative AI, as previous models required audio inputs to create short clips that were often low quality.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The framework comprises three models: MusicGen, AudioGen, and EnCodec. MusicGen is an autoregressive transformer model that creates music clips from textual prompts. AudioGen, in contrast, generates environmental sounds (such as a dog barking, a child crying, whistling, etc.) from textual prompts.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">But the real game-changer is Meta\u2019s audio compression codec: EnCodec. EnCodec is a neural network that allows the model to learn discrete audio tokens (similar to word tokens in large language models) and create a vocabulary for music.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The audio tokens then feed into autoregressive language models to generate new tokens. Finally, the EnCodec model decodes the tokens and maps them onto the audio space to produce realistic musical clips.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">With EnCodec, generative AI can finally analyze music containing long audio sequences with different frequencies. 
To put this in perspective, a song just a few minutes long contains millions of audio time steps, whereas the text sequences used to train LLMs typically span only thousands of tokens.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">AudioCraft can help artists and other creative professionals conveniently generate unique soundtracks and add them to videos, podcasts, and other forms of media, experimenting with different melodies to speed up the production process.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Conclusion<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">With generative AI entering the scene, the technology landscape is changing rapidly, allowing businesses to find more cost-effective ways to run their operations. This means organizations must adopt AI quickly to remain ahead of the competition.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">And the <\/span><a href=\"https:\/\/www.comet.com\/site\/\"><span style=\"font-weight: 400;\">Comet <\/span><\/a><span style=\"font-weight: 400;\">Machine Learning (ML) platform will help you get up to speed by letting you quickly train, test, and manage ML models in production. 
The tool allows businesses to easily leverage ML\u2019s power and boost productivity through its experiment-tracking features, interactive visualizations, and monitoring capabilities.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">So, <\/span><a href=\"\/signup\"><span style=\"font-weight: 400;\">create a free account now <\/span><\/a><span style=\"font-weight: 400;\">to benefit from Comet\u2019s full feature stack.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction With advanced generative AI models like Generative Pre-trained Transformer 3 (GPT-3), which provides human-like responses to user queries, AI is progressing toward generative tools to create realistic content, including text, videos, images, and audio. And with open-source becoming the norm, most AI models are available for public use for research and experimentation. As such, [&hellip;]<\/p>\n","protected":false},"author":54,"featured_media":7041,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[156],"class_list":["post-7028","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>An Exhaustive List of Open-source Generative AI Models - Comet<\/title>\n<meta name=\"description\" content=\"Generative AI models can provide human-like responses to user queries and create realistic content including text, videos, images, and audio.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An Exhaustive List of Open-source Generative AI Models\" \/>\n<meta property=\"og:description\" content=\"Generative AI models can provide human-like responses to user queries and create realistic content including text, videos, images, and audio.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-05T16:40:31+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:14:58+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.16.08-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"304\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Haziqa Sajid\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Haziqa Sajid\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"An Exhaustive List of Open-source Generative AI Models - Comet","description":"Generative AI models can provide human-like responses to user queries and create realistic content including text, videos, images, and audio.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/","og_locale":"en_US","og_type":"article","og_title":"An Exhaustive List of Open-source Generative AI Models","og_description":"Generative AI models can provide human-like responses to user queries and create realistic content including text, videos, images, and audio.","og_url":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-08-05T16:40:31+00:00","article_modified_time":"2025-04-24T17:14:58+00:00","og_image":[{"width":300,"height":304,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.16.08-PM.png","type":"image\/png"}],"author":"Haziqa Sajid","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Haziqa Sajid","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/"},"author":{"name":"Haziqa Sajid","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/b8e568abee61cd8fd0c5d73b672779da"},"headline":"An Exhaustive List of Open-source Generative AI Models","datePublished":"2023-08-05T16:40:31+00:00","dateModified":"2025-04-24T17:14:58+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/"},"wordCount":1657,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.16.08-PM.png","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/","url":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/","name":"An Exhaustive List of Open-source Generative AI Models - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.16.08-PM.png","datePublished":"2023-08-05T16:40:31+00:00","dateModified":"2025-04-24T17:14:58+00:00","description":"Generative AI models can provide human-like responses to user queries and create realistic content including text, videos, images, and 
audio.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.16.08-PM.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/08\/Screen-Shot-2023-08-05-at-12.16.08-PM.png","width":300,"height":304,"caption":"stylized graphic of a brain with a purple and blue gradient background"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/generative-ai-models-in-2023\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"An Exhaustive List of Open-source Generative AI Models"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/b8e568abee61cd8fd0c5d73b672779da","name":"Haziqa Sajid","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/817879efa0771c090195dd1888fca759","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/07\/cropped-1585931859188-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2023\/07\/cropped-1585931859188-96x96.jpg","caption":"Haziqa 
Sajid"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/haziqa5122gmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7028","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/54"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=7028"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7028\/revisions"}],"predecessor-version":[{"id":15590,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/7028\/revisions\/15590"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/7041"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=7028"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=7028"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=7028"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=7028"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}