I’ve always found wine intimidating. Restaurant wine lists read like exam papers in a subject I never studied. Sommeliers speak a language of tannins, terroir, and malolactic fermentation. I’d default to “the second cheapest red” and hope for the best.

The knowledge exists. It’s scattered across wine blogs, Reddit threads, critic reviews, and textbooks that cost more than a decent bottle. What I wanted was simple: a way to ask “what wine goes with this?” and get an answer that didn’t require a sommelier certification to understand.

So I built one. Sommo is a 7B parameter language model fine-tuned to be a wine expert. It knows grape varieties, food pairings, regions, and can explain why a Riesling works with Thai food. This post documents how I trained v1, what worked, what didn’t, and where to find the code.

What Sommo Does

The model handles four core tasks:

  1. Food pairing — Ask what wine pairs with grilled salmon, and it’ll explain why a Burgundy or Oregon Pinot Noir works (lighter red, won’t overpower the fish, complements the char).

  2. Wine knowledge — Explain the difference between Champagne and Prosecco. What makes Barolo special. How climate affects Riesling.

  3. Recommendations — “I like fruity reds under £20” gets you actual suggestions with reasoning, not a generic “try Merlot”.

  4. Tasting notes — Describe what you’ll taste in a wine using vocabulary that’s accessible but accurate.

The goal wasn’t to replace sommeliers. It was to make wine approachable for people who want to enjoy it without pretending they understand what “grippy tannins with notes of pencil shavings” means.

Why Build This?

Foundation models are impressive generalists. Ask GPT-4 about wine, and you’ll get reasonable answers. But there’s a ceiling. General models spread their capacity across everything. A domain-specific model concentrates its knowledge.

I wanted to test a hypothesis: could I take an open-source base model, fine-tune it on ~100K wine conversations, and produce something meaningfully better at wine tasks than the base model alone?

The answer: yes, but with caveats I’ll cover later.

Training Approach

Base Model Selection

I chose Qwen2.5-7B-Instruct for the base. The reasoning:

  • 7B parameters hits a sweet spot: large enough for nuanced responses, small enough to train on a single GPU
  • Qwen2.5 shows strong instruction-following out of the box
  • The instruct variant already handles conversational Q&A, so I’m adapting rather than teaching from scratch

LoRA Configuration

Fully fine-tuning a 7B model requires significant compute. LoRA (Low-Rank Adaptation) lets you train a small set of adapter weights instead: you freeze the base model and only update low-rank matrices injected into the attention and MLP layers.

| Parameter | Value |
| --- | --- |
| Rank (r) | 64 |
| Alpha | 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | 161M (2.08% of total) |

The r=64 configuration provides enough capacity for domain adaptation without overfitting. Targeting all attention projections plus the MLP ensures the model can learn both “what to attend to” and “how to transform” wine-specific information.
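The trainable-parameter figure in the table can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses the layer dimensions from the public Qwen2.5-7B config (hidden size 3584, intermediate size 18944, 28 layers, grouped-query KV projection width 512); each LoRA adapter adds an A matrix of shape r×in and a B matrix of shape out×r per target module:

```python
# Back-of-the-envelope check of the trainable-parameter count.
# Dimensions are from the public Qwen2.5-7B config: hidden 3584,
# intermediate 18944, 28 layers, grouped-query KV width 512.
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM, R = 3584, 18944, 28, 512, 64

# (in_features, out_features) for each LoRA target module
targets = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapter contributes r*(in + out) parameters: A is (r x in), B is (out x r)
per_layer = sum(R * (d_in + d_out) for d_in, d_out in targets.values())
trainable = per_layer * LAYERS
print(f"{trainable / 1e6:.0f}M trainable")  # → 161M trainable
```

Against Qwen2.5-7B's ~7.6B frozen parameters, that 161M works out to the 2.08% in the table.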

Training Parameters

| Parameter | Value |
| --- | --- |
| Epochs | 3 |
| Learning rate | 2e-5 with cosine decay |
| Batch size | 16 (effective) |
| Max sequence length | 4096 tokens |
| Hardware | NVIDIA H100 80GB |
| Training time | ~3–4 hours |
| Final loss | 0.7218 |

Three epochs is standard for domain adaptation. More risks overfitting to the training set’s phrasings. The loss curve showed healthy convergence without signs of memorisation.
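For readers unfamiliar with cosine decay: the learning rate starts at its peak and follows a half-cosine down to zero over the run. A minimal sketch (the step count here is illustrative, derived from ~100K examples × 3 epochs at effective batch 16; the actual trainer may also add warmup, which isn't shown):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-5) -> float:
    """Cosine decay from peak_lr down to zero over the training run."""
    return peak_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

# Illustrative step count: 3 epochs over ~100K examples at effective batch 16
total_steps = 3 * 100_000 // 16  # 18750 optimiser steps
print(cosine_lr(0, total_steps))            # 2e-5 at the start
print(cosine_lr(total_steps, total_steps))  # ≈ 0 at the end
```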

The Data Pipeline

Training data came from seven sources, each serving a different purpose:

| Source | Records | Purpose |
| --- | --- | --- |
| WineEnthusiast Reviews | 130K | Professional tasting vocabulary and critic style |
| Alfredodeza Wine Ratings | 33K | Structured review format |
| X-Wines (Kaggle) | 1K+ | Wine metadata and food pairing data |
| Vivino Ratings & Price | 13.8K | Consumer perspective and pricing context |
| Wine Food Pairing NLP | ~10K | Explicit pairing logic |
| Wikipedia Wine Articles | 50+ | Factual grounding for regions and varietals |
| Synthetic Q&A (Gemini) | 45 | High-quality conversation examples |

Total: approximately 100K conversations after processing.
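Most of the processing is turning flat review records into conversational examples. The sketch below illustrates the idea; the field names are hypothetical and the real pipeline lives in the training notebook:

```python
# Sketch of the review-to-conversation step (field names are hypothetical;
# see the training notebook for the actual pipeline).
def review_to_example(record: dict) -> dict:
    """Turn a flat tasting-review record into a single-turn Q&A example."""
    question = f"What can you tell me about {record['title']}?"
    answer = (
        f"{record['title']} is a {record['variety']} from {record['region']}. "
        f"{record['description']}"
    )
    return {"question": question, "answer": answer}

example = review_to_example({
    "title": "Willamette Valley Pinot Noir 2019",
    "variety": "Pinot Noir",
    "region": "Oregon",
    "description": "Bright cherry and forest-floor notes with soft tannins.",
})
```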

Why Synthetic Data?

The raw data skews heavily toward reviews. Reviews are useful for vocabulary, but they don’t model the conversational Q&A I wanted Sommo to handle.

I used Google Gemini to generate 45 synthetic conversations covering food pairing scenarios, wine knowledge questions, and recommendation requests. Each was manually reviewed for accuracy. 45 sounds small, but high-quality examples anchor the model’s conversational behaviour more effectively than thousands of low-quality ones.

Conversation Format

All data was converted to ChatML format with a system prompt that establishes Sommo’s persona:

SYSTEM = """You are Sommo, an expert sommelier with decades of experience
in wine selection, food pairing, and wine education. You have extensive
knowledge of wine regions worldwide, grape varieties and their characteristics,
winemaking techniques, and food pairing principles. You communicate in a warm,
knowledgeable manner - approachable for beginners yet sophisticated enough
for experts."""

The persona matters. Without it, the model defaults to generic assistant behaviour. With it, responses carry appropriate authority and warmth.
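Serialising a processed Q&A pair into ChatML is mechanical: each turn is wrapped in `<|im_start|>role` / `<|im_end|>` markers, the format Qwen's chat models are trained on. A minimal sketch (the persona string is truncated here):

```python
SYSTEM = "You are Sommo, an expert sommelier..."  # full persona prompt as above

def to_chatml(question: str, answer: str) -> str:
    """Serialise one Q&A pair into ChatML: <|im_start|>role\\n...<|im_end|> per turn."""
    turns = [("system", SYSTEM), ("user", question), ("assistant", answer)]
    return "\n".join(f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in turns)

sample = to_chatml(
    "What pairs with grilled salmon?",
    "A lighter red such as an Oregon Pinot Noir works well...",
)
```

In practice `tokenizer.apply_chat_template` does the same job from a list of role/content messages; writing it out by hand just makes the training format explicit.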

What Works

Sommo v1 handles its core tasks well:

Food pairing — Ask about pairing wine with duck confit, and it’ll recommend a Côtes du Rhône or Crozes-Hermitage, explaining that the Syrah’s peppery notes complement the richness of the duck. This is the kind of contextual reasoning that makes the model useful.

Comparative questions — “What’s the difference between Chablis and Chardonnay?” gets a clear explanation that Chablis is Chardonnay, but grown in a specific Burgundy region, typically unoaked, with mineral characteristics from Kimmeridgian limestone soils.

Budget-aware recommendations — The Vivino pricing data means Sommo understands price tiers. Ask for wines under £15, and you won’t get suggestions for £80 bottles.

Limitations (Honest Assessment)

This is a proof of concept. I’m documenting what doesn’t work because you should know before using it:

Hallucination Risk

The model will confidently invent wine facts. Specific appellations, vintage regulations, producer histories — anything that requires precise factual recall is suspect. I caught it inventing wine laws and attributing quotes to fictional critics.

This is inherent to how language models work. They’re pattern completers, not databases. Without retrieval-augmented generation (RAG) connected to a verified wine database, factual claims need external validation.

Outdated Recommendations

Training data has a time horizon. When Sommo recommends a 2018 vintage, it doesn’t know that vintage is now sold out. Real-time availability and pricing require infrastructure the model doesn’t have.

Missing Context

Sometimes responses describe wine characteristics without naming specific bottles. “You’d want something with bright acidity and stone fruit notes” is helpful but incomplete.

Production Considerations

For production use, I’d recommend:

  1. RAG with a wine database — Ground responses in verified facts
  2. Post-processing validation — Flag specific claims for verification
  3. User disclaimers — Make clear this is an AI assistant, not a certified sommelier
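The second item can start as something very simple. A heuristic sketch of claim-flagging (the patterns and function are my illustration, not part of the released pipeline): pull out vintages and prices so they can be checked against a trusted source before display.

```python
import re

# Heuristic claim extractor: surface vintages and prices in a model response
# so they can be verified against a trusted database before being shown.
CLAIM_PATTERNS = {
    "vintage": re.compile(r"\b(?:19|20)\d{2}\b"),
    "price": re.compile(r"[£$€]\s?\d+(?:\.\d{2})?"),
}

def flag_claims(response: str) -> dict:
    """Return every vintage year and price mentioned in a model response."""
    return {name: pat.findall(response) for name, pat in CLAIM_PATTERNS.items()}

flags = flag_claims("The 2018 Barolo is superb value at £35.")
```

A real validator would also cover appellations, producers, and regulations, which is where the RAG layer earns its keep.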

The Sommo iOS App

v1 is open-source and available for anyone to use. But I also built something more polished.

Sommo is an iOS app powered by v2, an enhanced model with additional proprietary training data as well as MCP connections. The app adds features that make wine accessible in daily life:

  • Wine label scanning — Point your camera at a bottle, get instant tasting notes, food pairings, and serving temperature
  • Personalised recommendations — The app learns your preferences over time
  • Wine journal — Log bottles you’ve tried, add notes, track what you enjoyed
  • Wine education — Interactive lessons from fundamentals to advanced topics
  • Region explorer — Map of 200+ wine regions across 40+ countries

The app has a free tier with label scanning, journal, and basic lessons. Premium unlocks unlimited scans and all educational content.

Download Sommo on the App Store

Try It Yourself

Everything for v1 is open:

Model weights — The fine-tuned model is on HuggingFace:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned weights from HuggingFace (a GPU is recommended for 7B inference)
model = AutoModelForCausalLM.from_pretrained(
    "gokhanarkan/sommo-7b-v1", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("gokhanarkan/sommo-7b-v1")

Training notebook — The complete training pipeline is available as a Colab notebook. You can see exactly how the data was processed, how training was configured, and reproduce the results.

GitHub repository — Contains the notebook and documentation.

What I Learned

Building Sommo reinforced a few principles:

Domain adaptation works. A 7B model fine-tuned on 100K domain-specific conversations produces noticeably better results than the base model on domain tasks. This isn’t surprising, but it’s satisfying to verify.

Data quality beats data quantity. The 45 synthetic conversations had outsized impact on response style. Careful curation of a small dataset often matters more than scraping millions of low-quality examples.

Limitations matter more than capabilities. Users need to know where the model fails, not just where it succeeds. Hallucination risks, outdated information, missing context — documenting these honestly builds trust and prevents misuse.

Open-source enables feedback. Releasing v1 publicly lets others test, critique, and build on the work. The model’s limitations become visible through use in ways that internal testing can’t replicate.

Wine shouldn’t require an expensive education to enjoy. If Sommo helps someone pick a bottle they actually like, or understand why that Burgundy pairs well with mushrooms, it’s done its job.
