May 28, 2023
What are the best open-source large language models that are uncensored, and how do I run them with a test prompt?
During my research, I explored various Reddit threads and discussions related to open-source large language models that can be run with a test prompt. I found a range of opinions and suggestions on this topic. Some of the most frequently mentioned models and tools include BLOOMChat, the Hugging Face Transformers library, GPT-2, GPT-Neo, GPT-NeoX, Bloom, LLaMA, GPT-J, ChatGLM, and Vicuna. The consensus among users varies, but several models stand out as viable options for different purposes. The sources provided insights into the models' capabilities, hardware requirements, and comparisons between their performance and that of commercial models like GPT-3.
BLOOMChat
Hugging Face Transformers
GPT-2, GPT-Neo, and GPT-NeoX
Bloom
ChatGLM
Vicuna
Research
""Meet BLOOMChat: An Open-Source 176-Billion-Parameter Multilingual Chat Large Language Model (LLM) Built on Top of the BLOOM Model""
Here are the relevant bullet points from the webpage to answer your query:
- The BLOOM model is an open-source, 176-billion parameter multilingual chat large language model developed by the BigScience organization, an international collaboration of over 1000 researchers.
- BLOOM can generate text in 46 natural languages and 13 programming languages, and is the first language model ever created with over 100 billion parameters for several languages, including Spanish, French, and Arabic.
- BLOOMChat is an extension of BLOOM capabilities in the chat domain, created by fine-tuning BLOOM on open conversation and alignment datasets from projects like OpenChatKit, Dolly 2.0, and OASST1.
- In human evaluations conducted across six languages, BLOOMChat responses were preferred over GPT-4 responses 45.25% of the time. Compared to four other open-source chat-aligned models in the same six languages, BLOOMChat’s responses ranked as the best 65.92% of the time.
- There are discussions in the article comments about the hardware requirements for running the model, including how much RAM is needed and what sort of computer hardware (for example SSDs) would be required to run it efficiently; a minimal prompting sketch follows this list.
- A variety of open-source chat models are mentioned in the comments, including llama.cpp and kobold.cpp.
- There is some discussion in the comments section about the difficulty of performing math calculations with the model, including limitations on how accurately it can identify prime numbers.
- Overall, the article provides a good summary of the BLOOM model and the BLOOMChat extension, its capabilities, and evaluations.
- The comments section provides additional insights into the hardware and software requirements of running the model, as well as some cross-disciplinary and cross-linguistic applications of the model.
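Putting the hardware discussion above in context, here is a minimal sketch of prompting a BLOOM-family checkpoint with the Hugging Face Transformers library. It uses the small bigscience/bloom-560m model as a stand-in so it runs on ordinary hardware; the full BLOOMChat-176B model uses the same API but needs hundreds of gigabytes of memory. The model id and prompt are illustrative, not taken from the article.

```python
# Minimal test-prompt sketch for a BLOOM-family model (illustrative checkpoint).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloom-560m"  # swap in a larger BLOOM/BLOOMChat checkpoint if you have the hardware
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what an open-source language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```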
"Top Large Language Models (LLMs): GPT-4, LLaMA, FLAN UL2 ... - Vectara"
Not used in article
"[D] Totally Open Alternatives to ChatGPT"
- This webpage lists totally open alternatives to ChatGPT, including bare projects (without data, weights, and chat system) and full projects (with data, weights, and chat system including TUI and GUI).
- Some projects mentioned include OpenChatKit, an open-source chatbot creation base; KoboldAI-Client, a browser-based front-end for AI-assisted writing; and LAION-AI’s OpenAssistant, a chat-based assistant that understands tasks and can interact with third-party systems.
- Alpaca is mentioned; it is not open source, but it has been recreated, with 7B weights and without LoRA, in a diff released by Point Network. Alpaca builds on the non-open-source LLaMA model and on training data generated with ChatGPT, which is also not open.
- BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is also mentioned. It is not fine-tuned for chat (or aligned with RLHF) and is just a model, not a service like ChatGPT. However, it is open source, and anyone could fine-tune it for whatever purpose they wanted based on its different strengths and weaknesses versus ChatGPT.
- LLaMA is included in text-generation-webui and has also been fine-tuned on consumer hardware, since it can be run in 4-bit and 8-bit quantized form, making it a popular choice for running locally (see the 8-bit loading sketch after this list).
- Tsinghua recently released ChatGLM, a 6B model that can run on consumer hardware and handles conversational Chinese text well.
- Other projects mentioned include lucidrains/PaLM-rlhf-pytorch, a PaLM architecture implementation using RLHF (Reinforcement Learning from Human Feedback) for chat; text-generation-webui, a Gradio web UI for running large language models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion; and BlinkDL/ChatRWKV, a model trained on the Pile that can be fine-tuned into an excellent chatbot. ChatRWKV is not included in the original post but is mentioned in the comments section.
- The comments section of the Reddit post includes a lot of helpful info on the models, weights, and data, as well as links to repositories and articles.
- GPTQ is mentioned as a 4-bit quantization method that can be used instead of LoRA. However, one Reddit user notes that it's not as easy as RT
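As referenced above, here is a hedged sketch of 8-bit loading with Transformers and bitsandbytes. It assumes transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available; the GPT-Neo checkpoint is only an illustrative stand-in, not one prescribed by the thread.

```python
# 8-bit loading sketch (illustrative checkpoint; any causal LM on the Hub can be substituted).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-neo-2.7B"  # example model, not from the thread
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on the available devices
    load_in_8bit=True,   # quantize weights to 8-bit via bitsandbytes
)

prompt = "Write a haiku about running language models locally."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```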
"This AI Can Generate Convincing <b>Text</b>—and Anyone Can Use It"
Not used in article
"EleutherAI/gpt-j-6b · Hugging Face"
Not used in article
"Comparing models: GPT4xAlpaca, Vicuna, and OASST"
Not used in article
"Large language models and the rise of the AI code generators"
Not used in article
"The emerging types of language models and why they matter - TechCrunch"
Not used in article
"[R] Hello Dolly: Democratizing the magic of ChatGPT with open models"
Not used in article
"New to NLP. Looking for library recommendations."
Not used in article
"What is Text Generation? - Hugging Face"
Not used in article
"Pretrained Language Models for Text Generation: A Survey"
Not used in article
"[N] Dolly 2.0, an open source, instruction-following LLM for research and commercial use"
Not used in article
"Top 12 natural-language-generation Open-Source Projects - LibHunt"
Not used in article
"Pororo: A Deep Learning based Multilingual Natural Language Processing Library"
Not used in article
"10 Leading Language Models For NLP In 2022 - TOPBOTS"
Not used in article
"Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for ..."
Not used in article
"What is currently the best model for text-generation besides gtp-3?"
Not used in article
"[P] Introducing Vicuna: An open-source language model based on LLaMA 13B"
- Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations gathered from ShareGPT.com with public APIs.
- Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90% quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90% of cases.
- The cost of training Vicuna-13B is around $300.
- The online demo is publicly available for non-commercial use; a minimal local prompting sketch follows this list.
- Vicuna is created by fine-tuning a LLaMA base model on approximately 70K user-shared conversations gathered from ShareGPT.com via public APIs; the conversations are filtered for inappropriate or low-quality examples, and HTML is converted back to markdown.
- To expand the max context length from 512 (as in Alpaca) to 2048, which substantially increases GPU memory requirements, Vicuna uses gradient checkpointing and flash attention.
- The training dataset is 40x larger and the sequence length 4x longer than Alpaca's, so the team uses SkyPilot managed spot instances with auto-recovery for preemptions and automatic zone switching to reduce cost.
- Vicuna has limitations: it is not good at tasks involving reasoning or mathematics, and it may not accurately identify itself or ensure the factual accuracy of its outputs.
- Vicuna has not been sufficiently optimized to guarantee safety or mitigate potential toxicity or bias.
- To tackle the safety concerns, Vicuna uses OpenAI moderation API to filter out inappropriate user inputs in the online demo.
- Link to the online demo: https://chat.lmsys.org/
- All credits go to the creators of this model and usage of this model falls under a non-commercial license.
- Reddit users commented on whether the model is open-source or source-available and addressed potential legal or licensing-related issues.
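A minimal local prompting sketch for Vicuna, assuming you have already obtained merged Vicuna-13B weights on disk (the original release ships delta weights that must first be applied to a LLaMA base model). The path and the conversation template below are illustrative and depend on the checkpoint version you use.

```python
# Vicuna test-prompt sketch; the model path is a hypothetical location of merged weights.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/path/to/vicuna-13b"  # hypothetical local path to merged Vicuna weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Vicuna-style single-turn prompt (v1.1-style template; adjust for your checkpoint version).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers.\n"
    "USER: Name three open-source large language models. ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the model's reply.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The FastChat repository from the Vicuna team also provides a command-line chat interface (python3 -m fastchat.serve.cli --model-path <path>) if you would rather not write the generation loop yourself.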
"[D] GPT-J for text generation: hardware requirements"
- The webpage discusses hardware requirements for GPT-J, an advanced language model for text generation
- The author had added GPT-J to NLPCloud.io for text generation and found that the model needs around 40GB of memory during loading and 20GB during runtime on CPU
- On GPU, the model needs around 40GB of RAM to load, then around 3GB of RAM plus 24GB of GPU memory during runtime, and the latency is around 1.5 seconds.
- The author notes that the high amount of RAM needed for startup and the high amount of GPU memory required during runtime are the two main challenges
- The author mentions that it is not practical as most affordable NVIDIA GPUs dedicated to inference, like Tesla T4, only have 16GB of memory
- The author notes that during their tests, the latency was pretty much the same as GPT-Neo 2.7B on the same hardware, but accuracy seems better
- In the comments section, one user asks whether the author is using the HF port or the original Jax model of GPT-J
- Another user suggests a hack to work around the 2x RAM usage issue when loading models, which is to serialize the model instead of the weights. This gives 1x memory usage when loading but may reduce portability
- Another user asks whether it’s possible to use multiple GPUs in parallel to solve the memory problem
- Another user recommends using GPUs with more VRAM, such as the NVIDIA V100 32GB, or converting the PyTorch tensors to 16-bit floating-point tensors to decrease the GPU memory required by the model (a half-precision loading sketch follows this list).
- One user recommends using DeepSpeed’s recently added inference API as a solution
- Another user suggests designing the model to process and move 1/4 or less of that model at a time in order to alleviate the high VRAM requirement
- They suggest potentially sectioning/modifying the model into a sparse activation map to load only the non-zero pathways into the VRAM for processing
- At the end, the webpage does not provide a list of uncensored large language models or instructions for running them with a test prompt
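A hedged sketch of the 16-bit suggestion from the comments, assuming the Transformers library and a GPU with roughly 16GB of VRAM; the float16 revision and the low_cpu_mem_usage flag address the startup RAM spike and the runtime GPU memory discussed in the thread.

```python
# Load GPT-J in half precision to roughly halve memory requirements.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",
    revision="float16",          # pre-converted 16-bit weights
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,      # avoid the 2x RAM spike while loading
).to("cuda")

inputs = tokenizer("Once upon a time,", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```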
"12 open source tools for natural language processing"
Not used in article
"Pre-Trained Language Models and Their Applications"
Not used in article
"Any real competitor to GPT-3 which is open source and downloadable?"
- The webpage is a discussion thread on the r/OpenAI subreddit titled “Any real competitor to GPT-3 which is open source and downloadable?” posted 2 months ago with 20 upvotes.
- A redditor suggests that Hugging Face Transformers, OpenAI’s GPT-2, EleutherAI’s GPT-Neo and GPT-NeoX, and Bloom are popular open-source language models.
- Hugging Face Transformers is an open-source library that provides pre-trained language models, including smaller versions of GPT-2 and GPT-3-style models. Their website is huggingface.co/transformers/.
- OpenAI’s GPT-2 is available as a pre-trained model, although it is not as advanced as GPT-3 or GPT-4. The code and pre-trained models are available on GitHub.
- EleutherAI also provides open-source language models based on the GPT architecture, including GPT-Neo and GPT-NeoX, and they aim to promote open research in artificial intelligence. Their GitHub repositories are GPT-Neo, available on github.com/EleutherAI/gpt-neo, and GPT-NeoX, available on github.com/EleutherAI/gpt-neox.
- Bloom is an open-source multilingual language model developed by a group of over 1,000 AI researchers. It has 176 billion parameters and is considered the best alternative to GPT-3. The model is available through Hugging Face Transformers and requires significant computational resources.
- A redditor suggests LLaMA 65B as a second-place model; it is smaller than Bloom but can still exceed ChatGPT (GPT-3.5) with fine-tuning or RLHF.
- The available language models usually require a good understanding of programming languages such as Python and of deep learning frameworks like TensorFlow or PyTorch; a minimal test-prompt example follows this list.
- Running large models may require significant computational resources, so be mindful of hardware capabilities when working with them.
- Redditors discuss accessing the Inference API for Bloom via usual HTTP requests or using the huggingface_hub library client wrapper programmatically.
- The GitHub repository transformers-bloom-inference provides demos and packages to perform fast inference solutions for Bloom.
- A tutorial for a local install of Bloom is available on towardsdatascience.com.
- One redditor suggests that open-source alternatives comparable to GPT-3 may be available within five years.
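As mentioned above, here is a minimal test-prompt example using the Transformers pipeline API. The GPT-Neo 1.3B checkpoint is an illustrative choice; "gpt2" or a larger model can be substituted.

```python
# Minimal test prompt with the Transformers text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
result = generator(
    "The three most useful open-source language models are",
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```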
"5 AI Tools That Can Generate Code To Help Programmers - Forbes"
Not used in article
"bigscience/bloom · Hugging Face"
Not used in article
"Which models are best for template based text generation"
Not used in article
"ChatGLM, an open-source, self-hosted dialogue language model and alternative to ChatGPT created by Tsinghua University, can be run with as little as 6GB of GPU memory."
- ChatGLM is an open-source, self-hosted dialogue language model that can be run with as little as 6GB of GPU memory.
- ChatGLM was created by a team from Tsinghua University.
- ChatGLM is an alternative to ChatGPT.
- The webpage includes comments from Reddit users who provide information on other open-source AI models that may be of interest, including Vicuna, huggingface, Facebook LM, alpaca, and llama.
- Vicuna is an open-source chat AI that can be tried out on the web.
- It is possible that Vicuna is trained with ChatGPT, which may have terms of service considerations.
- Huggingface provides an array of pre-trained models, including GPT variants.
- The Reddit thread linked on the page provides additional chatbot-like LM models that are similar to ChatGPT, including text-generation-webui and alpaca.
- Stable Diffusion is another open-source AI model that can be used for AI artwork.
- The Github repository for Stable Diffusion includes tips on how to run the model.
- The Facebook LM is semi-publicly available and can be set up using a guide provided in the comments section.
- There is a discussion among the Reddit users on how many open-source models are available, and how big those models can get with enough VRAM.
- Some Reddit users have used GPT variations with PyTorch pretrained models from Huggingface.
- There is a 4-bit quantized version of the 7B model that can run on GPUs with 6GB of VRAM.
- Alpaca-LoRA is a 7B model.
- It can be run entirely on a CPU and reportedly gives results similar to GPT-3, but in a less memory-intensive way.
- There is a video of someone running the Alpaca model entirely on a Pixel 5.
- The future may include locally hosted ChatGPT models with no restrictions.
- There is a Docker image available for ChatGLM that includes a built-in playground UI and exposes a streaming API compatible with the OpenAI API.
- The webpage includes code snippets and examples for running ChatGLM on both GPUs and CPUs; a GPU example is sketched after this list.
- Running on a CPU may be slow.
- There is a link to a web UI created by an individual that can be used for ChatGLM.
- For CPU-only inference, load the model with `model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()`.
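A fuller GPU example, assuming the THUDM/chatglm-6b weights from Hugging Face and enough VRAM for the half-precision model (roughly 13GB; the quantized variants mentioned in the article fit in about 6GB).

```python
# ChatGLM-6B test prompt on GPU, following the model's standard usage pattern.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Introduce yourself in one sentence.", history=[])
print(response)
```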
"A Strong, Open Source, Alternative to GPT-3"
- Bloom is an open source alternative to GPT-3 with 176 billion parameters and trained on 59 languages including programming languages.
- Bloom is an experiment and is generally considered unimpressive for its size, but it has potential for language generation in multiple languages.
- The Bloom model's performance is deemed worse than GPT-3's, but it still handles typical language-model tasks and has knowledge about the world. Bloom is also known for being multilingual and for generating text in multiple languages.
- Bloom can be downloaded and run locally with enough storage space, 16GB of RAM, and the patience to handle slow text generation; a disk-offloading sketch follows this list.
- There is an attempt to fine tune Bloom into becoming a chatbot by using prompt tuning and training the model with chat data.
- Petals is an attempt to run Bloom in a distributed fashion by sharing GPUs, for example through Google Colab.
- GPT-J-6B is a good alternative to Bloom for language model research, as it is fast and good for story writing.
- NovelAI’s Krake is based on NeoX, a high-performance language model, and it is considered way better than Bloom.
- Bloom’s ability to generate code is notable.
- The webpage provides links to download Bloom and to train a chatbot using prompt tuning with the Bloom model.
- Trained Bloom models can generate text in multiple languages rather than just a single one.
- Chinchilla is an upcoming language model that could make a splash in the future.
- The webpage recommends not doing work while under chemical influence and to drop the stoned talk from posts.
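A disk-offloading sketch for running Bloom locally, as referenced above. This is not the exact recipe from the article; it assumes the Transformers and Accelerate libraries, several hundred gigabytes of free disk for the weights and the offload folder, and a lot of patience, since generation will be very slow.

```python
# Load the full 176B Bloom model by offloading layers that do not fit in memory to disk.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",          # spread layers across GPU, CPU RAM, and disk
    offload_folder="offload",   # directory for layers that do not fit in memory
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```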
"How to Run a ChatGPT Alternative on Your Local PC"
Not used in article
"GitHub - openai/openai-cookbook: Examples and guides for using the ..."
Not used in article