May 28, 2023
What are the best open-source large language models that are uncensored, and how do I run them with a test prompt?
During my research, I explored various Reddit threads and discussions related to open-source large language models that can be run with a test prompt. I found a range of opinions and suggestions on this topic. Some of the most frequently mentioned models and tools include BLOOMChat, the Hugging Face Transformers library, GPT-2, GPT-Neo, GPT-NeoX, Bloom, LLaMA, GPT-J, ChatGLM, and Vicuna. The consensus among users varies, but several models stand out as viable options for different purposes. The sources provided insights into the models' capabilities, hardware requirements, and comparisons between their performance and that of commercial models like GPT-3.
BLOOMChat
Hugging Face Transformers
GPT-2, GPT-Neo, and GPT-NeoX
Bloom
ChatGLM
Vicuna
Research
""Meet BLOOMChat: An Open-Source 176-Billion-Parameter Multilingual Chat Large Language Model (LLM) Built on Top of the BLOOM Model""
Here are the relevant bullet points from the webpage to answer your query:
- The BLOOM model is an open-source, 176-billion parameter multilingual chat large language model developed by the BigScience organization, an international collaboration of over 1000 researchers.
- BLOOM can generate text in 46 natural languages and 13 programming languages, and is the first language model ever created with over 100 billion parameters for several languages, including Spanish, French, and Arabic.
- BLOOMChat is an extension of BLOOM capabilities in the chat domain, created by fine-tuning BLOOM on open conversation and alignment datasets from projects like OpenChatKit, Dolly 2.0, and OASST1.
- In human evaluations conducted across six languages, BLOOMChat responses were preferred over GPT-4 responses 45.25% of the time. Compared to four other open-source chat-aligned models in the same six languages, BLOOMChat’s responses ranked as the best 65.92% of the time.
- There are discussions in the article comments about the hardware requirements for running the model, including how much RAM is needed and what sort of computer hardware (for example SSDs) would be required to run it efficiently; a minimal prompting sketch follows this list.
- A variety of open-source chat models are mentioned in the comments, including llama.cpp and kobold.cpp.
- There is some discussion in the comments section about the difficulty of performing math calculations with the model, including limitations on how accurately it can identify prime numbers.
- Overall, the article provides a good summary of the BLOOM model and the BLOOMChat extension, its capabilities, and evaluations.
- The comments section provides additional insights into the hardware and software requirements of running the model, as well as some cross-disciplinary and cross-linguistic applications of the model.
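Putting the hardware discussion above in context, here is a minimal sketch of prompting a BLOOM-family checkpoint with the Hugging Face Transformers library. It uses the small bigscience/bloom-560m model as a stand-in so it runs on ordinary hardware; the full BLOOMChat-176B model uses the same API but needs hundreds of gigabytes of memory. The model id and prompt are illustrative, not taken from the article.

```python
# Minimal test-prompt sketch for a BLOOM-family model (illustrative checkpoint).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bigscience/bloom-560m"  # swap in a larger BLOOM/BLOOMChat checkpoint if you have the hardware
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what an open-source language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```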
"Top Large Language Models (LLMs): GPT-4, LLaMA, FLAN UL2 ... - Vectara"
Not used in article
"[D] Totally Open Alternatives to ChatGPT"
- This webpage lists totally open alternatives to ChatGPT, including bare projects (without data, weights, and chat system) and full projects (with data, weights, and chat system including TUI and GUI).
- Some projects mentioned include OpenChatKit, an open-source chatbot creation base; KoboldAI-Client, a browser-based front-end for AI-assisted writing; and LAION-AI’s OpenAssistant, a chat-based assistant that understands tasks and can interact with third-party systems.
- Alpaca is mentioned; it is not open source, but it has been recreated, with 7B weights and without LoRA, in a diff released by Point Network. Alpaca builds on the non-open-source LLaMA model and on training data generated with ChatGPT, which is also not open.
- BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is also mentioned. It is not fine-tuned for chat (or aligned with RLHF) and is just a model, not a service like ChatGPT. However, it is open source, and anyone could fine-tune it for whatever purpose they wanted based on its different strengths and weaknesses versus ChatGPT.
- LLaMA is included in text-generation-webui and has also been fine-tuned on consumer hardware, since it can be run in 4-bit and 8-bit quantized form, making it a popular choice for running locally (see the 8-bit loading sketch after this list).
- Tsinghua recently released ChatGLM, a 6B model that can run on consumer hardware and handles conversational Chinese text well.
- Other projects mentioned include lucidrains/PaLM-rlhf-pytorch, a PaLM architecture implementation using RLHF (Reinforcement Learning from Human Feedback) for chat; text-generation-webui, a Gradio web UI for running large language models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion; and BlinkDL/ChatRWKV, a model trained on the Pile that can be fine-tuned into an excellent chatbot. ChatRWKV is not included in the original post but is mentioned in the comments section.
- The comments section of the Reddit post includes a lot of helpful info on the models, weights, and data, as well as links to repositories and articles.
- GPTQ is mentioned as a 4-bit quantization method that can be used instead of LoRA. However, one Reddit user notes that it's not as easy as RT
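As referenced above, here is a hedged sketch of 8-bit loading with Transformers and bitsandbytes. It assumes transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available; the GPT-Neo checkpoint is only an illustrative stand-in, not one prescribed by the thread.

```python
# 8-bit loading sketch (illustrative checkpoint; any causal LM on the Hub can be substituted).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-neo-2.7B"  # example model, not from the thread
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on the available devices
    load_in_8bit=True,   # quantize weights to 8-bit via bitsandbytes
)

prompt = "Write a haiku about running language models locally."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```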
"This AI Can Generate Convincing <b>Text</b>—and Anyone Can Use It"
Not used in article
"EleutherAI/gpt-j-6b · Hugging Face"
Not used in article
"Comparing models: GPT4xAlpaca, Vicuna, and OASST"
Not used in article
"Large language models and the rise of the AI code generators"
Not used in article
"The emerging types of language models and why they matter - TechCrunch"
Not used in article
"[R] Hello Dolly: Democratizing the magic of ChatGPT with open models"
Not used in article
"New to NLP. Looking for library recommendations."
Not used in article
"What is Text Generation? - Hugging Face"
Not used in article
"Pretrained Language Models for Text Generation: A Survey"
Not used in article
"[N] Dolly 2.0, an open source, instruction-following LLM for research and commercial use"
Not used in article
"Top 12 natural-language-generation Open-Source Projects - LibHunt"
Not used in article
"Pororo: A Deep Learning based Multilingual Natural Language Processing Library"
Not used in article
"10 Leading Language Models For NLP In 2022 - TOPBOTS"
Not used in article
"Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for ..."
Not used in article
"What is currently the best model for text-generation besides gtp-3?"
Not used in article
"[P] Introducing Vicuna: An open-source language model based on LLaMA 13B"
- Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations gathered from ShareGPT.com with public APIs.
- Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90% quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90% of cases.
- The cost of training Vicuna-13B is around $300.
- The online demo is publicly available for non-commercial use; a minimal local prompting sketch follows this list.
- Vicuna is created by fine-tuning a LLaMA base model on approximately 70K user-shared conversations gathered from ShareGPT.com via public APIs; the conversations are filtered for inappropriate or low-quality examples, and HTML is converted back to markdown.
- To expand the max context length from 512 (as in Alpaca) to 2048, which substantially increases GPU memory requirements, Vicuna uses gradient checkpointing and flash attention.
- The training dataset is 40x larger and the sequence length 4x longer than Alpaca's, so the team uses SkyPilot managed spot instances with auto-recovery for preemptions and automatic zone switching to reduce cost.
- Vicuna has limitations: it is not good at tasks involving reasoning or mathematics, and it may not accurately identify itself or ensure the factual accuracy of its outputs.
- Vicuna has not been sufficiently optimized to guarantee safety or mitigate potential toxicity or bias.
- To tackle the safety concerns, Vicuna uses OpenAI moderation API to filter out inappropriate user inputs in the online demo.
- Link to the online demo: https://chat.lmsys.org/
- All credits go to the creators of this model and usage of this model falls under a non-commercial license.
- Reddit users commented on whether the model is open-source or source-available and addressed potential legal or licensing-related issues.
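A minimal local prompting sketch for Vicuna, assuming you have already obtained merged Vicuna-13B weights on disk (the original release ships delta weights that must first be applied to a LLaMA base model). The path and the conversation template below are illustrative and depend on the checkpoint version you use.

```python
# Vicuna test-prompt sketch; the model path is a hypothetical location of merged weights.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/path/to/vicuna-13b"  # hypothetical local path to merged Vicuna weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Vicuna-style single-turn prompt (v1.1-style template; adjust for your checkpoint version).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers.\n"
    "USER: Name three open-source large language models. ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the model's reply.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The FastChat repository from the Vicuna team also provides a command-line chat interface (python3 -m fastchat.serve.cli --model-path <path>) if you would rather not write the generation loop yourself.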
"[D] GPT-J for text generation: hardware requirements"
- The webpage discusses hardware requirements for GPT-J, an advanced language model for text generation
- The author had added GPT-J to NLPCloud.io for text generation and found that the model needs around 40GB of memory during loading and 20GB during runtime on CPU
- On GPU, the model needs around 40GB of RAM to load, then around 3GB of RAM plus 24GB of GPU memory during runtime, and the latency is around 1.5 seconds.
- The author notes that the high amount of RAM needed for startup and the high amount of GPU memory required during runtime are the two main challenges
- The author mentions that it is not practical as most affordable NVIDIA GPUs dedicated to inference, like Tesla T4, only have 16GB of memory
- The author notes that during their tests, the latency was pretty much the same as GPT-Neo 2.7B on the same hardware, but accuracy seems better
- In the comments section, one user asks whether the author is using the HF port or the original Jax model of GPT-J
- Another user suggests a hack to work around the 2x RAM usage issue when loading models, which is to serialize the model instead of the weights. This gives 1x memory usage when loading but may reduce portability
- Another user asks whether it’s possible to use multiple GPUs in parallel to solve the memory problem
- Another user recommends using GPUs with more VRAM, such as the NVIDIA V100 32GB, or converting the PyTorch tensors to 16-bit floating-point tensors to decrease the GPU memory required by the model (a half-precision loading sketch follows this list).
- One user recommends using DeepSpeed’s recently added inference API as a solution
- Another user suggests designing the model to process and move 1/4 or less of that model at a time in order to alleviate the high VRAM requirement
- They suggest potentially sectioning/modifying the model into a sparse activation map to load only the non-zero pathways into the VRAM for processing
- At the end, the webpage does not provide a list of uncensored large language models or instructions for running them with a test prompt
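A hedged sketch of the 16-bit suggestion from the comments, assuming the Transformers library and a GPU with roughly 16GB of VRAM; the float16 revision and the low_cpu_mem_usage flag address the startup RAM spike and the runtime GPU memory discussed in the thread.

```python
# Load GPT-J in half precision to roughly halve memory requirements.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b",
    revision="float16",          # pre-converted 16-bit weights
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,      # avoid the 2x RAM spike while loading
).to("cuda")

inputs = tokenizer("Once upon a time,", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```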
"12 open source tools for natural language processing"
Not used in article
"Pre-Trained Language Models and Their Applications"
Not used in article
"Any real competitor to GPT-3 which is open source and downloadable?"
- The webpage is a discussion thread on the r/OpenAI subreddit titled “Any real competitor to GPT-3 which is open source and downloadable?” posted 2 months ago with 20 upvotes.
- A redditor suggests that Hugging Face Transformers, OpenAI’s GPT-2, EleutherAI’s GPT-Neo and GPT-NeoX, and Bloom are popular open-source language models.
- Hugging Face Transformers is an open-source library that provides pre-trained language models, including smaller versions of GPT-2 and GPT-3-style models. Their website is huggingface.co/transformers/.
- OpenAI’s GPT-2 is available as a pre-trained model, although it is not as advanced as GPT-3 or GPT-4. The code and pre-trained models are available on GitHub.
- EleutherAI also provides open-source language models based on the GPT architecture, including GPT-Neo and GPT-NeoX, and they aim to promote open research in artificial intelligence. Their GitHub repositories are GPT-Neo, available on github.com/EleutherAI/gpt-neo, and GPT-NeoX, available on github.com/EleutherAI/gpt-neox.
- Bloom is an open-source multilingual language model developed by a group of over 1,000 AI researchers. It has 176 billion parameters and is considered the best alternative to GPT-3. The model is available through Hugging Face Transformers and requires significant computational resources.
- A redditor suggests LLaMA 65B as a second-place model; it is smaller than Bloom but can still exceed ChatGPT (GPT-3.5) with fine-tuning or RLHF.
- The available language models usually require a good understanding of programming languages such as Python and of deep learning frameworks like TensorFlow or PyTorch; a minimal test-prompt example follows this list.
- Running large models may require significant computational resources, so be mindful of hardware capabilities when working with them.
- Redditors discuss accessing the Inference API for Bloom via usual HTTP requests or using the huggingface_hub library client wrapper programmatically.
- The GitHub repository transformers-bloom-inference provides demos and packages to perform fast inference solutions for Bloom.
- A tutorial for a local install of Bloom is available on towardsdatascience.com.
- One redditor suggests that open-source alternatives comparable to GPT-3 may be available within five years.
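As mentioned above, here is a minimal test-prompt example using the Transformers pipeline API. The GPT-Neo 1.3B checkpoint is an illustrative choice; "gpt2" or a larger model can be substituted.

```python
# Minimal test prompt with the Transformers text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
result = generator(
    "The three most useful open-source language models are",
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```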
"5 AI Tools That Can Generate Code To Help Programmers - Forbes"
Not used in article
"bigscience/bloom · Hugging Face"
Not used in article
"Which models are best for template based text generation"
Not used in article
"ChatGLM, an open-source, self-hosted dialogue language model and alternative to ChatGPT created by Tsinghua University, can be run with as little as 6GB of GPU memory."
- ChatGLM is an open-source, self-hosted dialogue language model that can be run with as little as 6GB of GPU memory.
- ChatGLM was created by a team from Tsinghua University.
- ChatGLM is an alternative to ChatGPT.
- The webpage includes comments from Reddit users who provide information on other open-source AI models that may be of interest, including Vicuna, huggingface, Facebook LM, alpaca, and llama.
- Vicuna is an open-source chat AI that can be tried out on the web.
- It is possible that Vicuna is trained with ChatGPT, which may have terms of service considerations.
- Huggingface provides an array of pre-trained models, including GPT variants.
- The Reddit thread linked on the page provides additional chatbot-like LM models that are similar to ChatGPT, including text-generation-webui and alpaca.
- Stable Diffusion is another open-source AI model that can be used for AI artwork.
- The Github repository for Stable Diffusion includes tips on how to run the model.
- The Facebook LM is semi-publicly available and can be set up using a guide provided in the comments section.
- There is a discussion among the Reddit users on how many open-source models are available, and how big those models can get with enough VRAM.
- Some Reddit users have used GPT variations with PyTorch pretrained models from Huggingface.
- There is a 4-bit quantized version of the 7B model that can run on GPUs with 6GB of VRAM.
- Alpaca-LoRA is a 7B model.
- It can be run entirely on a CPU and reportedly gives results similar to GPT-3, but in a less memory-intensive way.
- There is a video of someone running the Alpaca model entirely on a Pixel 5.
- The future may include locally hosted ChatGPT models with no restrictions.
- There is a Docker image available for ChatGLM that includes a built-in playground UI and exposes a streaming API compatible with the OpenAI API.
- The webpage includes code snippets and examples for running ChatGLM on both GPUs and CPUs; a GPU example is sketched after this list.
- Running on a CPU may be slow.
- There is a link to a web UI created by an individual that can be used for ChatGLM.
- For CPU-only inference, load the model with `model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()`.
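A fuller GPU example, assuming the THUDM/chatglm-6b weights from Hugging Face and enough VRAM for the half-precision model (roughly 13GB; the quantized variants mentioned in the article fit in about 6GB).

```python
# ChatGLM-6B test prompt on GPU, following the model's standard usage pattern.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Introduce yourself in one sentence.", history=[])
print(response)
```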
"A Strong, Open Source, Alternative to GPT-3"
- Bloom is an open source alternative to GPT-3 with 176 billion parameters and trained on 59 languages including programming languages.
- Bloom is an experiment and is generally considered unimpressive for its size, but it has potential for language generation in multiple languages.
- The Bloom model's performance is deemed worse than GPT-3's, but it still handles typical language-model tasks and has knowledge about the world. Bloom is also known for being multilingual and for generating text in multiple languages.
- Bloom can be downloaded and run locally with enough storage space, 16GB of RAM, and the patience to handle slow text generation; a disk-offloading sketch follows this list.
- There is an attempt to fine tune Bloom into becoming a chatbot by using prompt tuning and training the model with chat data.
- Petals is an attempt to run Bloom in a distributed fashion by sharing GPUs, for example through Google Colab.
- GPT-J-6B is a good alternative to Bloom for language model research, as it is fast and good for story writing.
- NovelAI’s Krake is based on NeoX, a high-performance language model, and it is considered way better than Bloom.
- Bloom’s ability to generate code is notable.
- The webpage provides links to download Bloom and to train a chatbot using prompt tuning with the Bloom model.
- Trained Bloom models can generate text in multiple languages rather than just a single one.
- Chinchilla is an upcoming language model that could make a splash in the future.
- The webpage recommends not doing work while under chemical influence and to drop the stoned talk from posts.
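A disk-offloading sketch for running Bloom locally, as referenced above. This is not the exact recipe from the article; it assumes the Transformers and Accelerate libraries, several hundred gigabytes of free disk for the weights and the offload folder, and a lot of patience, since generation will be very slow.

```python
# Load the full 176B Bloom model by offloading layers that do not fit in memory to disk.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",          # spread layers across GPU, CPU RAM, and disk
    offload_folder="offload",   # directory for layers that do not fit in memory
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```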
"How to Run a ChatGPT Alternative on Your Local PC"
Not used in article
"GitHub - openai/openai-cookbook: Examples and guides for using the ..."
Not used in article