Open Source LLMs
Why Open-Source LLMs
Open-source large language models (LLMs) offer key advantages for academic and data-driven research:
Data Security & Privacy: Open-source models can be run locally or on secure institutional infrastructure, making them well-suited for work involving proprietary, sensitive, or confidential datasets. Because no external API calls are required, researchers keep full control over data flow and storage.
Reproducibility: Open models allow researchers to fully document and share their code, data, and exact model version, so that others can replicate and verify results. This supports transparent, trustworthy research practices.
Long-Term Availability: Because the model weights and code are publicly accessible, research does not depend on a vendor continuing to offer its API or a particular proprietary model version.
Customization & Control: Researchers can fine-tune or adapt open models to specific tasks, domains, or datasets (e.g., finance, marketing, consumer behavior), often with full control over input/output and model behavior.
Open Source LLMs the Right Way Workshop
This workshop was held in April 2024. It offered guidance on how to choose the right model and how to deploy an open-source LLM on KLC. The material is documented in this Jupyter Book.
Sample scripts are available at this GitHub repo.
Some commonly used LLMs have been downloaded to KLC at /kellogg/data/llm_models_opensource.
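As a minimal sketch of how the shared directory might be used, the Python below lists the model folders under that path and loads one without contacting any external service. It rests on assumptions the text does not confirm: that each model lives in its own subdirectory, that the files are in a format the Hugging Face `transformers` library can read, and that `transformers` is installed in your environment. The function names `list_local_models` and `load_local_model` are hypothetical helpers introduced only for this illustration.

```python
from pathlib import Path

# Shared model directory on KLC, as given in the text above.
MODEL_ROOT = Path("/kellogg/data/llm_models_opensource")


def list_local_models(root=MODEL_ROOT):
    """Return the sorted names of subdirectories under root.

    Returns an empty list if root does not exist. Assumes (not confirmed
    by the source) that each model is stored in its own subdirectory.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def load_local_model(model_dir):
    """Load a tokenizer and causal language model from a local directory.

    local_files_only=True tells transformers not to reach out to the
    Hugging Face Hub, so loading uses only files already on disk.
    Assumes model_dir holds a transformers-compatible checkpoint.
    """
    # Imported lazily so that listing works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model


if __name__ == "__main__":
    # Print whatever model folders are visible from this machine.
    for name in list_local_models():
        print(name)
```

Running the script on a KLC node prints the available folder names; passing one of the resulting paths (for example `MODEL_ROOT / name`) to `load_local_model` would then load that model from disk, provided the assumptions above hold for that folder.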