Microsoft unveils Phi-3 family of compact language models

Microsoft has announced the Phi-3 family of open small language models (SLMs), touting them as the most capable and cost-effective of their size available. The innovative training approach developed by Microsoft researchers has allowed the Phi-3 models to outperform larger models on language, coding, and math benchmarks.

“What we’re going to start to see is not a shift from large to small, but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario,” said Sonali Yadav, Principal Product Manager for Generative AI at Microsoft.

The first Phi-3 model, Phi-3-mini at 3.8 billion parameters, is now publicly available in Azure AI Model Catalog, Hugging Face, Ollama, and as an NVIDIA NIM microservice. Despite its compact size, Phi-3-mini outperforms models twice its size. Additional Phi-3 models like Phi-3-small (7B parameters) and Phi-3-medium (14B parameters) will follow soon.

“Some customers may only need small models, some will need big models and many are going to want to combine both in a variety of ways,” said Luis Vargas, Microsoft VP of AI.

The key advantage of SLMs is their smaller size, which enables on-device deployment for low-latency AI experiences without network connectivity. Potential use cases include smart sensors, cameras, farming equipment, and more. Keeping data on the device also benefits privacy.

(Figure: quality vs. size of small language models, Microsoft Phi-3. Credit: Microsoft)

Large language models (LLMs) excel at complex reasoning over vast datasets, a strength suited to applications like drug discovery, where a model must understand interactions described across the scientific literature. However, SLMs offer a compelling alternative for simpler tasks such as query answering, summarisation, and content generation.

“Rather than chasing ever-larger models, Microsoft is developing tools with more carefully curated data and specialised training,” commented Victor Botev, CTO and Co-Founder of Iris.ai.

“This allows for improved performance and reasoning abilities without the massive computational costs of models with trillions of parameters. Fulfilling this promise would mean tearing down a huge adoption barrier for businesses looking for AI solutions.”

Breakthrough training technique

What enabled Microsoft’s SLM quality leap was an innovative data filtering and generation approach inspired by bedtime story books.

“Instead of training on just raw web data, why don’t you look for data which is of extremely high quality?” asked Sebastien Bubeck, Microsoft VP leading SLM research.

Ronen Eldan’s nightly reading routine with his daughter sparked the idea to generate a ‘TinyStories’ dataset of millions of simple narratives created by prompting a large model with combinations of words a 4-year-old would know. Remarkably, a 10M parameter model trained on TinyStories could generate fluent stories with perfect grammar.
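The prompting idea described above can be illustrated with a minimal sketch. It assumes a tiny stand-in vocabulary and a made-up prompt template; the word lists, `make_prompt`, and `make_prompts` are hypothetical names for illustration only and are not taken from Microsoft's actual pipeline.

```python
import random

# Hypothetical stand-in vocabulary; the real TinyStories word list is not given here.
VOCAB = {
    "noun": ["dog", "ball", "tree", "cake", "boat"],
    "verb": ["jump", "find", "share", "sleep", "build"],
    "adjective": ["happy", "tiny", "red", "sleepy", "brave"],
}


def make_prompt(rng: random.Random) -> str:
    """Pick one word per category and ask for a story that uses all of them."""
    words = [rng.choice(VOCAB[kind]) for kind in ("noun", "verb", "adjective")]
    return (
        "Write a short story using only words a 4-year-old would know. "
        f"The story must include the words: {', '.join(words)}."
    )


def make_prompts(n: int, seed: int = 0) -> list[str]:
    """Build n prompts; a fixed seed makes the word combinations reproducible."""
    rng = random.Random(seed)
    return [make_prompt(rng) for _ in range(n)]
```

Each prompt would then be sent to a large model to produce one story. Varying the word combination is what pushes the generated stories apart, so the resulting dataset is not millions of near-duplicates.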

Building on that early success, the team procured high-quality web data vetted for educational value to create the ‘CodeTextbook’ dataset. This was synthesised through rounds of prompting, generation, and filtering by both humans and large AI models.

“A lot of care goes into producing these synthetic data,” Bubeck said. “We don’t take everything that we produce.”
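The "generate, then keep only some" step can be sketched roughly as follows. This is an illustrative assumption, not Microsoft's method: `toy_score` is a hypothetical keyword heuristic standing in for the human and model-based quality judgements the article mentions, and the threshold is arbitrary.

```python
from typing import Callable, Iterable

# Hypothetical cue phrases that loosely suggest explanatory, textbook-like text.
CUES = ("because", "for example", "therefore")


def toy_score(text: str) -> float:
    """Crude stand-in for an educational-value rating, in the range 0.0 to 1.0."""
    lowered = text.lower()
    hits = sum(cue in lowered for cue in CUES)
    return min(1.0, hits / 2)


def filter_samples(
    samples: Iterable[str],
    score: Callable[[str], float] = toy_score,
    threshold: float = 0.5,
) -> list[str]:
    """Keep only the samples whose score reaches the threshold."""
    return [s for s in samples if score(s) >= threshold]
```

In a real pipeline the scorer would be a trained classifier, a large model acting as a judge, or a human reviewer; the filtering loop itself stays the same shape.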

The high-quality training data proved transformative. “Because it’s reading from textbook-like material…you make the task of the language model to read and understand this material much easier,” Bubeck explained.

Mitigating AI safety risks

Despite the thoughtful data curation, Microsoft emphasises that it applied additional safety practices to the Phi-3 release, mirroring its standard processes for all generative AI models.

“As with all generative AI model releases, Microsoft’s product and responsible AI teams used a multi-layered approach to manage and mitigate risks in developing Phi-3 models,” a blog post stated.

This included further training examples to reinforce expected behaviours, assessments to identify vulnerabilities through red-teaming, and offering Azure AI tools for customers to build trustworthy applications atop Phi-3.

(Photo by Tadas Sar)

