Claude API — Developer Reference
Comprehensive developer notes for enterprise use of the Anthropic Claude API: 11 sections and 75 short notes, spanning model selection and API fundamentals through Tool Use, RAG, MCP, and agent architectures. Code examples and technical terms are kept in their original English.
1. Claude Models and Selection
The Claude model family (Opus, Sonnet, Haiku) is optimized for different priorities. Selection framework and shared capabilities.
Claude Models Overview
Claude's three model families are optimized for different priorities:
Opus = highest-intelligence model for complex, multi-step tasks requiring deep reasoning and planning. Trade-off: higher cost and latency.
Sonnet = balanced model with good intelligence, speed, and cost efficiency. Strong coding capabilities and precise code editing. Best for most practical use cases.
Haiku = fastest model, optimized for speed and cost efficiency. Lacks the reasoning capabilities of Opus/Sonnet. Best for real-time user interactions and high-volume processing.
Selection framework: Intelligence priority → Opus. Speed priority → Haiku. Balanced requirements → Sonnet.
Common approach = use multiple models within the same application based on specific task requirements rather than committing to a single model.
All models share core capabilities: text generation, coding, image analysis. The main difference is optimization focus.
2. API Fundamentals
API access flow, request structure, multi-turn conversations, system messages, temperature, streaming, and output control.
Accessing the API
API Access Flow = 5-step process from user input to response display
Step 1: Client sends user text to developer's server (never access Anthropic API directly from client apps to keep API key secret)
Step 2: Server makes request to Anthropic API using SDK (Python, TypeScript, JavaScript, Go, Ruby) or plain HTTP. Required parameters = API key + model name + messages list + max_tokens limit
Step 3: Text generation process has 4 stages:
- Tokenization = breaking input into tokens (words/word parts/symbols/spaces)
- Embedding = converting tokens to number lists representing all possible word meanings
- Contextualization = adjusting embeddings based on neighboring tokens to determine precise meaning
- Generation = output layer produces probabilities for next word, model selects using probability + randomness, adds selected word, repeats process
Step 4: Model stops when max_tokens reached or special end_of_sequence token generated
Step 5: API returns response with generated text + usage counts + stop_reason to server, server sends to client for display
Token = text chunk (word/part/symbol)
Embedding = numerical representation of word meanings
Contextualization = meaning refinement using neighboring words
Max_tokens = generation length limit
Stop_reason = why model stopped generating
Making a Request
Making API Request to Anthropic = Process involving 4 setup steps and understanding message structure
Setup Steps:
1. Install packages = pip install anthropic python-dotenv in Jupyter notebook
2. Store API key = Create .env file with ANTHROPIC_API_KEY="your_key" (ignore in version control)
3. Load environment variable = Use python-dotenv to securely load API key
4. Create client = Initialize anthropic client and define model variable (claude-3-sonnet)
API Request Structure:
- Function = client.messages.create()
- Required arguments = model, max_tokens, messages
- Model = Name of Claude model to use
- Max_tokens = Safety limit for generation length (not target length)
- Messages = List containing conversation exchanges
Message Types:
- User message = "role": "user", "content": "your text" (human-authored content)
- Assistant message = Contains model-generated responses
Response Access:
- Full response = Contains metadata and nested structure
- Text only = message.content[0].text extracts just generated text
Example request structure: client.messages.create(model=model, max_tokens=1000, messages=[{"role": "user", "content": "What is quantum computing?"}])
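A minimal runnable sketch of the setup and request steps above, assuming the anthropic and python-dotenv packages are installed and a .env file holds ANTHROPIC_API_KEY (the model name is a placeholder):
```
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()  # reads ANTHROPIC_API_KEY from the .env file

client = Anthropic()  # picks up the key from the environment
model = "claude-3-5-haiku-latest"  # placeholder; use whichever model you configured

message = client.messages.create(
    model=model,
    max_tokens=1000,  # safety limit, not a target length
    messages=[{"role": "user", "content": "What is quantum computing?"}],
)

print(message.content[0].text)  # text-only access into the nested response
```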
Multi-Turn Conversations
Multi-Turn Conversations = conversations with multiple back-and-forth exchanges that maintain context.
Key limitation: Anthropic API stores no messages. Each request is independent with no memory of previous exchanges.
Solution requires two steps:
1. Manually maintain message list in code
2. Send entire conversation history with every follow-up request
Message structure = list of dictionaries with "role" (user/assistant) and "content" fields.
Conversation flow:
- Send initial user message
- Receive assistant response
- Append assistant response to message history
- Add new user message to history
- Send complete history for context-aware follow-up
Helper functions needed:
- add_user_message(messages, text) = appends user message to history
- add_assistant_message(messages, text) = appends assistant response to history
- chat(messages) = sends message history to API and returns response
Without message history = responses lack context and continuity. With complete history = Claude maintains conversation context and provides relevant follow-ups.
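A sketch of the three helpers, reusing the client and model from the previous note:
```
def add_user_message(messages, text):
    messages.append({"role": "user", "content": text})

def add_assistant_message(messages, text):
    messages.append({"role": "assistant", "content": text})

def chat(messages):
    response = client.messages.create(model=model, max_tokens=1000, messages=messages)
    return response.content[0].text

# The full history is resent on every call because the API stores nothing
messages = []
add_user_message(messages, "Define quantum computing in one sentence")
answer = chat(messages)
add_assistant_message(messages, answer)
add_user_message(messages, "Write another sentence building on that")
follow_up = chat(messages)  # context-aware because history travels with the request
```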
System Prompts
System Prompts = technique for customizing Claude's response style and tone by assigning a specific role or behavior pattern.
Implementation = pass the system prompt as plain text via the system keyword argument in the message-creation call.
Purpose = control how Claude responds, not what it responds to. Example: a math tutor role makes Claude give hints instead of direct answers.
Structure = first line typically assigns a role ("You are a patient math tutor"), followed by specific behavioral instructions.
Key principle = system prompts steer the response approach, not the content. The same question gets handled differently depending on the assigned role.
Technical implementation = build a params dictionary, conditionally add the system key only when a prompt is provided, pass params to the create function using ** unpacking. Handle the None case by omitting the system parameter entirely.
Use case example = math tutor that offers guidance/hints instead of full solutions and encourages student thinking over direct answers.
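A sketch of the conditional params pattern under those assumptions:
```
def chat(messages, system=None):
    params = {"model": model, "max_tokens": 1000, "messages": messages}
    if system:  # only add the key when a prompt is provided; None is excluded entirely
        params["system"] = system
    return client.messages.create(**params).content[0].text

tutor = "You are a patient math tutor. Give hints, never direct answers."
reply = chat([{"role": "user", "content": "What is 12 * 13?"}], system=tutor)
```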
Temperature
Temperature = parameter between 0 and 1 that controls randomness in Claude's text generation by influencing token selection probabilities.
Text generation process: input text → tokenization → probability assignment over possible next tokens → token selection based on those probabilities → repeat.
Temperature effects:
- Temperature 0 = deterministic output, always selects the highest-probability token
- Higher temperature = increases the chance of selecting lower-probability tokens, yielding more creative/unexpected outputs
Usage guidelines:
- Low temperature (near 0) = tasks requiring consistency, such as data extraction
- High temperature (near 1) = creative tasks such as brainstorming, writing, jokes, marketing copy
Implementation: add the temperature parameter to model API calls. Higher values don't guarantee different outputs, they only increase the probability of variation.
Key insight: temperature directly manipulates the probability distribution over next-token selection, making high-probability tokens more or less dominant in the selection process.
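Temperature is one extra keyword argument on the same call; a sketch for both ends of the range:
```
deterministic = client.messages.create(
    model=model, max_tokens=500, temperature=0.0,  # extraction, consistency
    messages=[{"role": "user", "content": "Extract the dates from this text: ..."}],
)
creative = client.messages.create(
    model=model, max_tokens=500, temperature=1.0,  # brainstorming, creative writing
    messages=[{"role": "user", "content": "Brainstorm ten taglines for a coffee brand"}],
)
```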
Response Streaming
Response Streaming = technique for showing AI responses piece by piece as they are generated, instead of waiting for the complete response.
Problem solved: AI responses can take 10-30 seconds. Users expect immediate feedback, not just spinners.
How it works:
1. Server sends the user message to Claude
2. Claude immediately sends an initial response (no text, just acknowledgment)
3. A stream of events follows, each containing text fragments
4. Server forwards the fragments to the frontend for real-time display
Event types:
- message_start = initial acknowledgment
- content_block_start = text generation begins
- content_block_delta = contains the actual text fragments (most important)
- content_block_stop/message_stop = generation complete
Implementation:
Basic: client.messages.create(stream=True) returns an event iterator
Simplified: client.messages.stream() extracts just the text via its text_stream property
Final message: stream.get_final_message() joins all fragments for storage
Key benefits: better user experience through immediate response visibility, complete message capture for database storage.
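A sketch of the simplified streaming path using the SDK's stream() context manager:
```
with client.messages.stream(
    model=model,
    max_tokens=1000,
    messages=[{"role": "user", "content": "Explain streaming in two paragraphs"}],
) as stream:
    for text in stream.text_stream:  # text from content_block_delta events only
        print(text, end="", flush=True)  # in a real app, forward to the frontend
    final = stream.get_final_message()  # stitched-together message for storage
```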
Controlling Model Output
**Controlling Model Output = Two key techniques beyond prompt modification**
**Pre-filling Assistant Messages = Manually adding assistant message at end of conversation to steer response direction**
How it works:
- Assemble messages list with user prompt + manual assistant message
- Claude sees assistant message as already authored content
- Claude continues response from exact end of pre-filled text
- Response gets steered toward pre-filled direction
Key point: Claude continues from exact endpoint of pre-fill, not complete sentences. Must stitch together pre-fill + generated response.
Example: Pre-fill "Coffee is better because" → Claude continues with justification for coffee
**Stop Sequences = Force Claude to halt generation when specific string appears**
How it works:
- Provide stop sequence string in chat function
- When Claude generates that exact string, response immediately stops
- Generated stop sequence text not included in final output
Example: Prompt "count 1 to 10" + stop sequence "five" → Output stops at "four, " (five not included)
Refinement: Stop sequence ", five" → Clean output "one, two, three, four"
Both techniques provide precise control over response direction and length without changing core prompts.
Structured Data
Structured Data Generation = technique using assistant message prefilling + stop sequences to get raw output without Claude's natural explanatory headers/footers.
Problem = Claude automatically adds markdown formatting, headers, commentary when generating JSON/code/structured content. Users often want just the raw data for copy/paste functionality.
Solution Pattern:
1. User message = request for structured data
2. Assistant message prefill = opening delimiter (e.g., "```json")
3. Stop sequence = closing delimiter (e.g., "```")
How it works = Claude sees prefilled message, assumes it already started response, generates only the requested content, stops when hitting delimiter.
Result = Raw structured data output with no extra formatting or commentary.
Application = Works for any structured data type (JSON, Python code, lists, etc.), not just JSON. Use whenever you need clean, parseable output without explanatory text.
Key benefit = Output can be directly used/copied without manual selection or parsing of unwanted text.
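A sketch combining pre-fill and a stop sequence to get raw JSON, per the pattern above:
```
import json

response = client.messages.create(
    model=model,
    max_tokens=1000,
    stop_sequences=["```"],  # halt at the closing delimiter
    messages=[
        {"role": "user", "content": "Generate a JSON object describing a book"},
        {"role": "assistant", "content": "```json"},  # pre-fill the opening delimiter
    ],
)

raw = response.content[0].text  # contains only the JSON body, no commentary
book = json.loads(raw)
```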
3. Prompt Engineering
Principles of effective prompt writing: being clear and direct, being specific, structuring with XML tags, providing examples (few-shot).
Prompt Engineering
Prompt Engineering = improving prompts to get more reliable, higher-quality outputs from language models.
Module Structure: Start with initial poor prompt → Apply prompt engineering techniques step-by-step → Evaluate improvements after each technique → Observe performance gains over time.
Example Goal: Generate one-day meal plan for athletes based on height, weight, physical goal, dietary restrictions.
Technical Setup:
- Updated eval pipeline with flexible prompt evaluator class
- Supports concurrency (adjust max_concurrent_tasks based on rate limits)
- generate_dataset() method creates test cases with specified inputs
- run_prompt() function processes each test case individually
Key Components:
- prompt_input_spec = dictionary defining required prompt inputs
- extra_criteria = additional validation requirements for model grading
- output.html = formatted evaluation report showing test case results and scores
Process: Write initial prompt → Interpolate test case inputs → Run evaluation → Apply engineering techniques → Re-evaluate → Repeat until satisfactory performance.
Initial Results: Expect poor scores (example: 2.32) with basic prompts, especially when using less capable models. Scores improve as techniques are applied.
Being Clear and Direct
Being Clear and Direct = state the exact task using simple, direct language and action verbs in the first line of the prompt.
First line importance = the most critical part of the prompt, setting the foundation for the AI response.
Structure = action verb + clear task description + output specifications.
Examples:
- "Write three paragraphs about how solar panels work"
- "Identify three countries that use geothermal energy and for each include generation stats"
- "Generate a one day meal plan for an athlete that meets their dietary restrictions"
Key components = Action verb at start + direct task statement + expected output details.
Result = improved prompt performance (example showed score increase from 2.32 to 3.92).
Being Specific
Being Specific = adding guidelines or steps to steer model output in a particular direction
Two types of guidelines:
Type A (Attributes) = list qualities/attributes desired in output (length, structure, format)
Type B (Steps) = provide specific steps for model to follow in reasoning process
Type A controls output characteristics. Type B controls how the model arrives at the answer.
Both techniques often combined in professional prompts.
When to use:
- Type A (attributes): recommended for nearly all prompts
- Type B (steps): use for complex problems where you want the model to consider a broader perspective or additional viewpoints it might not naturally consider
Example improvement: adding guidelines jumped the meal-planning prompt's score from 3.92 to 7.86, demonstrating a significant quality improvement through specificity.
Structure with XML Tags
XML Tags for Prompt Structure = using XML tags to organize and delimit distinct content sections within prompts to improve AI comprehension.
Purpose = when interpolating large amounts of content into prompts, XML tags help AI models distinguish different types of information and understand how text is grouped.
Implementation = wrap content sections in descriptive XML tags like <sales_records></sales_records> or <my_code></my_code> rather than dumping unstructured text.
Tag naming = use descriptive, specific tag names (e.g., "sales_records" better than "data") to provide context about the content's nature.
Example use case = Debugging prompt with mixed code and documentation becomes clearer when separated into <my_code> and <docs> tags.
Benefits = makes prompt structure explicit to the AI, reduces confusion about content boundaries, improves output quality even for smaller content blocks.
Application = Can wrap any interpolated content like <athlete_information> even when content is short, to clarify it's external input requiring consideration.
Providing Examples
One-shot/multi-shot prompting = providing examples in prompts to guide model behavior. One-shot = a single example, multi-shot = multiple examples.
Implementation: structure examples with XML tags containing a sample input and its ideal output. Always wrap examples explicitly to distinguish them from the actual prompt content.
Key applications:
- Edge-case handling (sarcasm detection, extreme scenarios)
- Complex output formatting (JSON structures, specific formats)
- Demonstrating expected response quality/style
Best practices:
- Add context for edge cases ("be especially careful with sarcasm")
- Include reasoning that explains why the output is ideal
- Use the highest-scoring examples from prompt evaluations as templates
- Place examples after the main instructions/guidelines
Effectiveness boost: combine examples with explanations of what makes them ideal to reinforce the desired output characteristics.
4. Prompt Evaluation (Eval)
Measuring prompt quality: generating test datasets, running evals, model-based and code-based grading.
Prompt Evaluation
Prompt Engineering = techniques for writing/editing prompts to help Claude understand requests and desired responses.
Prompt Evaluation = automated testing of prompts using objective metrics to measure effectiveness.
Three paths after writing a prompt:
1. Test once or twice, ship to production (pitfall)
2. Test with hand-picked inputs, make small adjustments for edge cases (pitfall)
3. Run it through an evaluation pipeline for objective scoring (recommended)
Key takeaway: engineers generally under-test prompts. Use evaluation pipelines to get objective performance scores before iterating on and deploying prompts.
A Typical Eval Workflow
Typical Eval Workflow = a 6-step iterative process for prompt improvement
Step 1: Write the initial prompt draft - establish the baseline prompt to optimize
Step 2: Build an evaluation dataset - a collection of test inputs (can be 3 examples or thousands, handwritten or LLM-generated)
Step 3: Create prompt variations - interpolate each dataset entry into the prompt template
Step 4: Get LLM responses - feed each prompt variation to Claude, collect the outputs
Step 5: Grade the responses - use a grading system to score each response (e.g. 1-10 scale), average scores for overall prompt performance
Step 6: Iterate - modify the prompt based on the scores, repeat the whole process, compare versions
Key points: there is no standard methodology. Many open-source/paid tools exist. You can start simple with a custom implementation. Grading complexity varies. Objective scoring enables systematic prompt improvement through A/B comparison.
Generating Test Datasets
Custom prompt evaluation workflow = create a prompt + generate a test dataset + evaluate performance
Goal = an AWS code-assistance prompt that outputs only Python, JSON configuration, or regex, with no explanations
Dataset generation approaches = manual assembly or automated with Claude (use faster models like Haiku for generation)
Dataset structure = array of JSON objects with a task property describing user requests
Generation process = prompt Claude to create test cases → pre-fill an assistant message with "```json" → set the stop sequence "```" → parse the response as JSON → save to file
Key implementation = generate_dataset() function that sends the prompt to Claude, receives a structured JSON response of test tasks, and saves it to dataset.json for later evaluation use
The test dataset enables systematic evaluation by running the prompt against multiple input scenarios to measure performance consistency.
Running the Eval
Eval execution process = combining test cases with prompts, running them through the LLM, and grading the outputs.
Test case = a single record (JSON object) from the dataset.
Three core functions:
- run_prompt = merges the test case into the prompt, sends it to Claude, returns the output
- run_test_case = calls run_prompt, grades the result, returns a summary dictionary
- run_eval = loops over the dataset, calls run_test_case for each entry, merges the results
Base prompt structure = "Please solve the following task: [test_case_task]" (v1 starting point).
Current limitations = no output-formatting instructions, hard-coded scoring (score=10), verbose Claude responses.
Runtime = ~31 seconds for full dataset execution with the Haiku model.
Output format = array of objects containing the Claude output, the original test case, and the score.
Next step = implement a proper grading system to replace the hard-coded scores.
Eval pipeline core = dataset + prompt + LLM + grader, with minimal code complexity.
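A condensed sketch of the three functions with the v1 hard-coded score, assuming the chat() helper from earlier and a dataset.json produced by generate_dataset():
```
import json

def run_prompt(test_case):
    messages = [{"role": "user",
                 "content": f"Please solve the following task: {test_case['task']}"}]
    return chat(messages)

def run_test_case(test_case):
    output = run_prompt(test_case)
    score = 10  # v1 placeholder; replaced by a real grader later
    return {"output": output, "test_case": test_case, "score": score}

def run_eval(dataset):
    return [run_test_case(tc) for tc in dataset]

with open("dataset.json") as f:
    results = run_eval(json.load(f))
print(sum(r["score"] for r in results) / len(results))  # average = prompt performance
```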
Model-Based Grading
Model-Based Grading = evaluation system in which a model takes outputs and assigns objective scores (typically a 1-10 scale, 10 = highest quality)
Three grader types:
- Code graders = programmatic checks (length, word presence, syntax validation, readability scores)
- Model graders = an additional API call to evaluate the original model output; highly flexible for assessing quality/instruction-following
- Human graders = a person evaluates the responses; most flexible but time-consuming and tedious
Core requirements: must return an objective signal (usually a numeric score). Define the evaluation criteria up front.
Implementation pattern for model graders:
- build a detailed prompt requesting strengths/weaknesses/reasoning/score (not just a score, to avoid defaulting to middle scores)
- use a JSON response format via a pre-filled assistant message and stop sequences
- parse the returned JSON for the score and reasoning
- average scores across test cases for the final metric
Model graders offer high flexibility but can be inconsistent. They still provide an objective baseline for prompt optimization.
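A sketch of this grader pattern (the rubric wording is illustrative, not the course's exact prompt):
```
import json

def grade_by_model(test_case, output):
    eval_prompt = f"""Evaluate this solution to the task.
Task: {test_case['task']}
Solution: {output}
Respond with JSON containing "strengths", "weaknesses", "reasoning", and "score" (1-10)."""
    response = client.messages.create(
        model=model,
        max_tokens=1000,
        stop_sequences=["```"],
        messages=[
            {"role": "user", "content": eval_prompt},
            {"role": "assistant", "content": "```json"},  # force raw JSON output
        ],
    )
    result = json.loads(response.content[0].text)
    return result["score"], result["reasoning"]
```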
Code-Based Grading
Code-Based Grading = automated validation system for LLM outputs containing code, JSON, or regex
Core implementation:
- validate_json() = attempts JSON parsing, returns 10 if valid, 0 on error
- validate_python() = attempts AST parsing, returns 10 if valid, 0 on error
- validate_regex() = attempts regex compilation, returns 10 if valid, 0 on error
Dataset requirements:
- must include a "format" key specifying the expected output type (JSON/Python/RegEx)
- updated via a prompt-template change for automated dataset generation
Prompt Engineering:
- instruct the model to respond with only the raw code/JSON/regex
- no comments, explanations, or commentary
- use a pre-filled assistant message with ```code``` blocks
- add stop sequences to extract clean output
Scoring system:
- Final score = (model_score + syntax_score) / 2
- combines semantic evaluation with syntax validation
- measures both correctness and technical validity
Key limitation = requires a known expected format to select the right grader
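Minimal sketches of the three validators; the dispatch at the end assumes the test_case and output variables from the eval loop above:
```
import ast
import json
import re

def validate_json(text):
    try:
        json.loads(text)
        return 10
    except json.JSONDecodeError:
        return 0

def validate_python(text):
    try:
        ast.parse(text)
        return 10
    except SyntaxError:
        return 0

def validate_regex(text):
    try:
        re.compile(text)
        return 10
    except re.error:
        return 0

VALIDATORS = {"json": validate_json, "python": validate_python, "regex": validate_regex}
syntax_score = VALIDATORS[test_case["format"].lower()](output)
# final score per test case = (model_score + syntax_score) / 2
```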
5. Tool Use
Claude's ability to call external functions: tool schemas, message blocks, sending tool results, multiple tools.
Introduction to Tool Use
Tool use = method for Claude to access external information beyond its training data.
Default limitation: Claude only knows information from its training data and lacks current/real-time information.
Tool use flow:
1. Send an initial prompt to Claude + instructions for accessing external data
2. Claude evaluates whether external data is needed and requests specific information
3. The server runs code to fetch the requested data from external sources
4. Send a follow-up prompt to Claude with the retrieved data
5. Claude generates the final response using the original prompt + the external data
Weather example: user asks about current weather → Claude requests weather data → server calls a weather API → Claude receives the weather data → Claude provides an informed weather response.
Key concept: Tools enable Claude to augment responses with live/current information by orchestrating external data retrieval between Claude's requests.
Project Overview
**Project Overview**
Goal = Teach Claude to set time-based reminders through tool implementation in Jupyter notebook
Target interaction = User: "Set reminder for doctor's appointment, week from Thursday" → Claude: "I will remind you at that point in time"
**Three core problems requiring tools:**
1. Time knowledge gap = Claude knows current date but not exact time
2. Time calculation errors = Claude sometimes miscalculates time-based addition (e.g., 379 days from January 13th, 1973)
3. No reminder mechanism = Claude understands reminder concept but lacks implementation capability
**Three corresponding tools to build:**
1. Current datetime tool = Gets current date + time
2. Duration addition tool = Adds time duration to datetime (e.g., current date + 20 days)
3. Reminder setting tool = Actually sets the reminder
Implementation approach = One tool at a time, building toward multi-tool coordination
Tool Functions
Tool Functions = Python functions executed automatically when Claude needs extra information to help users.
Key characteristics:
- Plain Python functions called by Claude when it determines additional data is needed
- Must use descriptive function names and argument names
- Should validate inputs and raise errors with meaningful messages
- Error messages are visible to Claude, allowing it to retry with corrected parameters
Best practices:
1. Well-named functions and arguments
2. Input validation with immediate error raising for invalid inputs
3. Meaningful error messages that guide correction
Example implementation pattern:
```
from datetime import datetime

def get_current_datetime(date_format="%Y%m%d %H:%M:%S"):
    if not date_format:
        raise ValueError("date format cannot be empty")
    return datetime.now().strftime(date_format)
```
Tool function workflow: Claude identifies need for information → calls tool function → receives result or error → may retry with corrections if error occurred.
Purpose: Extend Claude's capabilities beyond its training data by providing access to real-time information like current datetime, weather, etc.
Tool Schemas
Tool Schemas = JSON schema specifications that describe tool functions and their parameters for language models
JSON Schema = data validation specification (not ML-specific) used to validate JSON data, adopted by ML community for tool calling
Tool Schema Structure:
- name: tool identifier
- description: 3-4 sentences explaining what tool does, when to use, what data it returns
- input_schema: actual JSON schema describing function arguments with types and descriptions
Schema Generation Trick:
1. Take tool function to Claude.ai
2. Prompt: "write valid JSON schema spec for tool calling for this function, follow best practices in attached documentation"
3. Attach Anthropic API documentation tool use page
4. Copy generated schema
Implementation Pattern:
- Name functions descriptively
- Name schemas as [function_name]_schema
- Import ToolParam from anthropic.types
- Wrap schema dictionary with ToolParam() to prevent type errors
Purpose = inform Claude about available tools, required arguments, and usage context through standardized JSON validation format
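A sketch of what such a schema might look like for the get_current_datetime function from earlier, wrapped in ToolParam (the description wording is illustrative):
```
from anthropic.types import ToolParam

get_current_datetime_schema = ToolParam(
    name="get_current_datetime",
    description=(
        "Returns the current date and time formatted with the given format string. "
        "Use it whenever the user asks about the current date or time, or when a "
        "calculation needs to start from now. Returns a formatted datetime string."
    ),
    input_schema={
        "type": "object",
        "properties": {
            "date_format": {
                "type": "string",
                "description": "strftime format string, e.g. '%Y-%m-%d %H:%M:%S'",
            }
        },
        "required": [],
    },
)
```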
Handling Message Blocks
**Tool-Enabled Claude Requests**
Step 3: Making requests to Claude with tools = include tool schema in request alongside user message using `tools` keyword argument containing JSON schema specs.
**Multi-Block Messages**
Content structure change = messages now contain multiple blocks instead of just text blocks.
Tool response format = assistant message with:
- Text block = user-facing explanation
- Tool use block = contains function name + arguments for tool execution
**Message History Management**
Critical requirement = manually maintain conversation history since Claude stores nothing.
Multi-block handling = append entire response.content (all blocks) to messages list, not just text.
Helper function updates needed = add_user_message and add_assistant_message functions must support multiple blocks instead of single text blocks only.
Conversation flow = user message → assistant response with tool use block → execute tool → respond back to Claude with full history.
Sending Tool Results
Tool Results = Results from executed tool functions sent back to Claude in follow-up requests.
Process: Execute tool function requested by Claude → Create tool result block → Send follow-up request with full conversation history.
Tool Result Block Structure:
- tool_use_id = Matches ID from original tool use block to pair requests with results
- content = Tool function output converted to string (usually JSON)
- is_error = Boolean flag for function execution errors (default false)
Tool Use ID Purpose = Links multiple tool requests to correct results when Claude makes simultaneous tool calls. Each tool use gets unique ID, tool results must reference matching IDs.
Follow-up Request Requirements:
- Include complete message history (original user message + assistant tool use message + new user message with tool result)
- Must include original tool schemas even if not using tools again
- Tool result block goes in user message, not assistant message
Conversation Flow: User request → Claude assistant response (text + tool use blocks) → Server executes tool → User message with tool result block → Claude final response with integrated results.
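A sketch of the result-and-follow-up step; response, tool_output, messages, and tools are assumed from the earlier helpers:
```
import json

# pull the tool use block out of Claude's assistant message
tool_use_block = next(b for b in response.content if b.type == "tool_use")

tool_result_block = {
    "type": "tool_result",
    "tool_use_id": tool_use_block.id,    # pairs this result with the original request
    "content": json.dumps(tool_output),  # stringified function output
    "is_error": False,
}
messages.append({"role": "user", "content": [tool_result_block]})  # user msg, not assistant

follow_up = client.messages.create(
    model=model,
    max_tokens=1000,
    messages=messages,  # full history: user msg + assistant tool-use msg + tool result
    tools=tools,        # original schemas still required
)
```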
Multi-Turn Conversations with Tools
Multi-Turn Tool Conversations = conversations where Claude uses multiple tools sequentially to answer a single user query.
Tool Chaining Process = user asks question → Claude requests first tool → tool executed → result returned → Claude requests second tool → tool executed → result returned → Claude provides final answer.
Example Flow = user asks "what day is 103 days from today" → Claude calls get_current_datetime → Claude calls add_duration_to_datetime → Claude provides answer.
Implementation Pattern = while loop that continues calling Claude until no more tool requests, checking each response for tool_use blocks.
run_conversation Function = takes initial messages, loops through Claude calls, executes requested tools, adds results to conversation, continues until final response.
Required Refactors:
- add_user_message/add_assistant_message = updated to handle multiple message blocks instead of just plain text
- chat function = accepts tools parameter, returns entire message instead of just first text block
- text_from_message helper = extracts all text blocks from a message with multiple content blocks
Key Insight = can't predict how many tools user queries will require, so system must handle arbitrary chains of tool calls automatically.
Implementing Multiple Turns
**Multiple Turns Implementation = continuously calling Claude until it stops requesting tools**
**Stop Reason Field = indicates why Claude stopped generating text**
- stop_reason = "tool_use" means Claude wants to call a tool
- Other values exist but tool_use is most commonly checked
**run_conversation Function = main loop that:**
1. Calls Claude with messages + available tools
2. Adds assistant response to conversation history
3. Checks stop_reason - if not "tool_use", breaks loop
4. If tool_use, calls run_tools function
5. Adds tool results as user message
6. Repeats until no more tool requests
**run_tools Function = processes multiple tool use blocks:**
1. Filters message.content for blocks with type="tool_use"
2. Iterates through each tool request
3. Runs appropriate tool function via run_tool helper
4. Creates tool_result blocks with: type="tool_result", tool_use_id=original_id, content=JSON_encoded_output, is_error=boolean
5. Returns list of all tool result blocks
**run_tool Function = dispatcher that:**
- Takes tool_name and tool_input
- Uses if statements to match tool names to functions
- Executes appropriate tool function
- Scalable for adding multiple tools
**Error Handling = try/except blocks around tool execution:**
- Success: is_error=false, content=tool_output
- Failure: is_error=true, content=error_message
**Key Architecture Points:**
- Assistant messages can contain multiple blocks (text + multiple tool_use)
- Each tool_use block gets separate tool_result response
- Tool results sent back as user message containing all results
- Process repeats until Claude provides final text-only response
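A condensed sketch of the dispatcher and loop described above, using this project's tool names:
```
import json

def run_tool(tool_name, tool_input):
    if tool_name == "get_current_datetime":
        return get_current_datetime(**tool_input)
    if tool_name == "add_duration_to_datetime":
        return add_duration_to_datetime(**tool_input)
    raise ValueError(f"unknown tool: {tool_name}")

def run_tools(message):
    results = []
    for block in message.content:
        if block.type != "tool_use":
            continue
        try:
            output = run_tool(block.name, block.input)
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": json.dumps(output), "is_error": False})
        except Exception as e:
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": str(e), "is_error": True})
    return results

def run_conversation(messages, tools):
    while True:
        response = client.messages.create(model=model, max_tokens=1000,
                                          messages=messages, tools=tools)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # final text-only answer
        messages.append({"role": "user", "content": run_tools(response)})
```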
Using Multiple Tools
Multiple Tools Implementation = Adding additional tools to an existing tool system after initial framework setup.
Process = 3 steps: (1) Add tool schemas to RunConversation function's tools list, (2) Add conditional cases in RunTool function to handle new tool names, (3) Implement actual tool functions.
Key Components:
- RunConversation function = Contains tools list that makes Claude aware of available tools
- RunTool function = Routes tool calls to appropriate functions based on tool name
- Tool schemas = Define tool structure for the AI model
- Tool functions = Actual implementation code
Example Tools Added:
- AddDurationToDateTime = Calculates date/time with duration offset
- SetReminder = Creates reminder (mock implementation that prints confirmation)
Tool Chaining = AI can use multiple tools sequentially in single conversation (e.g., calculate date first, then set reminder with result).
Message Structure = Assistant responses can contain multiple blocks: text blocks + tool use blocks in same message.
Scalability = After initial framework setup, adding new tools becomes simple pattern of schema + routing + implementation.
The Batch Tool
Batch Tool = tool that enables Claude to run multiple tools in parallel within a single Assistant message instead of making separate sequential requests.
Problem: Claude can technically send multiple tool use blocks in one message but rarely does so in practice, leading to unnecessary sequential tool calls.
Solution: Create batch tool schema that takes list of invocations (each containing tool name + arguments). Instead of calling tools directly, Claude calls batch tool with array of desired tool executions.
Implementation:
- Add batch tool to schema with invocations parameter
- Create run_batch function that iterates through invocations list
- Extract tool name and JSON-parsed arguments from each invocation
- Call run_tool function for each requested tool
- Return batch_output list containing results from all tool executions
Mechanism: Tricks Claude into parallel tool execution by providing higher-level abstraction that manually handles what multiple tool use blocks would accomplish automatically.
Result: Single request-response cycle instead of multiple sequential rounds for parallel-executable tasks.
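A sketch of run_batch under the assumption that each invocation carries a tool name plus a JSON-encoded arguments string, as described:
```
import json

def run_batch(invocations):
    batch_output = []
    for invocation in invocations:
        name = invocation["name"]
        args = json.loads(invocation["arguments"])  # arguments arrive JSON-encoded
        batch_output.append({"tool_name": name, "output": run_tool(name, args)})
    return batch_output
```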
Tools for Structured Data
Tools for Structured Data = alternative method to extract structured JSON from data sources using Claude's tool system instead of message pre-fill and stop sequences.
Key differences from prompt-based extraction:
- More reliable output
- More complex setup
- Requires JSON schema specification
Core Process:
1. Define JSON schema for tool where inputs = desired data structure
2. Send prompt + schema to Claude
3. Claude calls tool with structured arguments matching schema
4. Extract JSON from tool use block (no tool result needed)
Critical requirement = Force tool calling using tool_choice parameter:
- tool_choice = {"type": "tool", "name": "your_tool_name"}
- Ensures Claude always calls specified tool
Implementation steps:
1. Create schema definition for extraction tool
2. Update chat function to accept tool_choice parameter
3. Pass tool_choice to client.messages.create()
4. Access structured data from response.content[0].input
Use cases = When reliability more important than simplicity. Prompt-based methods better for quick/simple extractions, tools better for complex/reliable extractions.
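A sketch with a hypothetical extraction tool:
```
extract_book_schema = {
    "name": "extract_book_info",  # hypothetical extraction tool
    "description": "Record structured information about a book.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
            "year": {"type": "integer"},
        },
        "required": ["title", "author"],
    },
}

response = client.messages.create(
    model=model,
    max_tokens=1000,
    messages=[{"role": "user",
               "content": "Extract book info: 'Dune' by Frank Herbert, 1965."}],
    tools=[extract_book_schema],
    tool_choice={"type": "tool", "name": "extract_book_info"},  # always call this tool
)
data = response.content[0].input  # structured dict matching the schema
```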
The Text Edit Tool
Text Editor Tool = built-in Claude tool for file/text operations (read, write, create, replace, undo files/directories)
Key characteristics:
- Only JSON schema built into Claude, implementation must be custom-coded
- Schema stub sent to Claude gets auto-expanded to full schema
- Schema type string varies by Claude model version (3.5 vs 3.7 have different dates)
- Enables Claude to act as software engineer out-of-the-box
Required implementation:
- Custom class/functions to handle Claude's tool use requests
- Functions for: view files, string replace, create files, etc.
- Actual file system operations not provided by Claude
Workflow:
1. Send minimal schema stub to Claude (name + type with version-specific date)
2. Claude expands to full schema internally
3. Claude sends tool use requests
4. Custom implementation executes actual file operations
5. Results sent back to Claude
Use cases:
- Replicate AI code editor functionality
- File system operations where native editors unavailable
- Automated code generation/refactoring
- Multi-file project manipulation
Benefits = approximates fancy code editor capabilities through API calls rather than GUI interaction.
The Web Search Tool
Web Search Tool = built-in Claude tool for searching web to find up-to-date/specialized information for user questions
Implementation = no custom code needed, Claude handles search execution automatically
Schema Requirements:
- type: "web_search_20250305"
- name: "web_search"
- max_uses: number (limits total searches, default 5)
- allowed_domains: optional list to restrict search to specific domains
Response Structure:
- Text blocks = Claude's explanatory text
- Tool use blocks = search queries Claude executed
- Web search result blocks = found pages (title, URL)
- Citation blocks = specific text supporting Claude's statements
Key Features:
- Multiple searches possible per request (up to max_uses limit)
- Domain restriction available for quality control
- Citation system links statements to source material
UI Rendering Pattern:
- Display text blocks as normal text
- Show search results as reference list
- Highlight citations with source attribution (domain, title, URL, quoted text)
Use Case Example: Restricting to NIH.gov for medical/exercise advice ensures scientifically-backed information vs generic web content.
6. Retrieval-Augmented Generation (RAG)
Working with external knowledge bases: chunking strategies, embeddings, BM25, reranking, and contextual retrieval.
Introducing Retrieval-Augmented Generation
RAG = Retrieval-Augmented Generation, a technique for querying large documents using language models.
Problem: How to extract specific information from large documents (100-1000+ pages) using Claude without hitting context limits.
Option 1 (Direct approach): Place entire document text directly into prompt.
- Limitations: Hard token limits, decreased effectiveness with longer prompts, higher costs, slower processing
Option 2 (RAG approach): Two-step process
- Step 1: Break document into small chunks
- Step 2: For user questions, find most relevant chunks and include only those in prompt
RAG benefits: Model focuses on relevant content, scales to large/multiple documents, smaller prompts, lower costs, faster processing
RAG downsides: More complexity, requires preprocessing, needs search mechanism to find relevant chunks, no guarantee chunks contain complete context, multiple chunking strategies possible (equal portions vs header-based)
Key challenge: Defining relevance and optimal chunking strategy for specific use cases.
RAG trades simplicity for scalability and efficiency but requires careful implementation and evaluation.
Text Chunking Strategies
Text Chunking Strategies = process of dividing documents into smaller pieces for RAG pipelines
Core Problem: Chunking quality directly impacts RAG performance. Poor chunking leads to irrelevant context retrieval (e.g., medical "bug" text retrieved for software engineering query about bugs).
Three Main Strategies:
1. Size-Based Chunking = dividing text into equal-length strings
- Pros: Easy to implement, most common in production
- Cons: Cut-off words, lacks context
- Solution: Overlap strategy = include characters from neighboring chunks to preserve context
- Trade-off: Creates text duplication but improves chunk meaning
2. Structure-Based Chunking = dividing based on document structure (headers, paragraphs, sections)
- Best for structured documents (markdown, HTML)
- Limitation: Requires guaranteed document formatting
- Example: Split on markdown headers (##) to create section-based chunks
3. Semantic-Based Chunking = using NLP to group related sentences/sections
- Most advanced technique
- Groups consecutive sentences based on semantic similarity
- Complex implementation
Key Implementation Notes:
- Chunk by character = most reliable fallback, works with any document type
- Chunk by sentence = good middle ground if sentence detection works reliably
- Chunk by section = optimal results but requires structured input
- Strategy choice depends on document type guarantees and use case requirements
Rule: No universal best chunking method - depends on document structure guarantees and specific use case.
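Sketches of size-based chunking with overlap and a markdown header split (sizes are illustrative):
```
def chunk_by_size(text, chunk_size=1000, overlap=100):
    chunks = []
    step = chunk_size - overlap  # neighbors share `overlap` characters of context
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

def chunk_by_section(markdown_text):
    # structure-based: split on markdown H2 headers, keeping each header with its section
    sections = markdown_text.split("\n## ")
    return [sections[0]] + ["## " + s for s in sections[1:]]
```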
Text Embeddings
Text Embeddings = numerical representation of text meaning generated by embedding models
Embedding Model = takes text input, outputs long list of numbers (range -1 to +1)
Embedding Numbers = scores representing unknown qualities/features of input text. Each number theoretically scores different aspects (happiness, topic relevance, etc.) but actual meaning is unknown to users.
Semantic Search = uses text embeddings to find text chunks related to user questions in RAG pipelines. Solves the search problem of matching user queries to relevant document chunks.
RAG Pipeline Process = extract text chunks → user submits query → find related chunks using semantic search → add relevant chunks as context to prompt
Implementation = Anthropic recommends Voyage AI for embedding generation. Requires separate account/API key. Free to start, easy integration via SDK.
Key Insight = Embeddings enable semantic similarity matching rather than keyword matching, allowing better understanding of text relationships for retrieval tasks.
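A sketch using the voyageai SDK; the model name is a placeholder and a separate VOYAGE_API_KEY is assumed:
```
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

def generate_embedding(texts, model="voyage-3"):  # model name is a placeholder
    result = vo.embed(texts, model=model)
    return result.embeddings  # one list of floats per input text

chunk_embeddings = generate_embedding(["chunk one text", "chunk two text"])
```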
The Full RAG Flow
RAG Flow = 7-step process combining text chunking, embeddings, and vector search to retrieve relevant context for LLM queries.
Step 1: Text Chunking = Split source documents into separate text pieces
Step 2: Generate Embeddings = Convert text chunks into numerical vectors using embedding models
Step 3: Normalization = Scale vector magnitudes to 1.0 (handled automatically by embedding APIs)
Step 4: Vector Database Storage = Store embeddings in specialized database optimized for numerical vector operations
Step 5: Query Processing = Convert user question into embedding using same model
Step 6: Similarity Search = Find most similar stored embeddings using cosine similarity calculation
Step 7: Prompt Assembly = Combine user question with retrieved relevant text chunks, send to LLM
Key Math Concepts:
- Cosine Similarity = cosine of angle between vectors, returns values -1 to 1, closer to 1 means more similar
- Cosine Distance = 1 minus cosine similarity, values closer to 0 mean higher similarity
- Vector Database = performs similarity calculations to find closest matching embeddings
Process Flow: Pre-processing (steps 1-4) → User Query → Real-time retrieval (steps 5-7) → LLM Response
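The two similarity measures as a minimal sketch:
```
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)  # 1.0 = identical direction

def cosine_distance(a, b):
    return 1 - cosine_similarity(a, b)  # closer to 0 = more similar
```
For normalized vectors (step 3), the norms are 1.0 and cosine similarity reduces to the plain dot product, which is why vector databases normalize on ingest.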
Implementing the Rag Flow
RAG Flow Implementation = practical walkthrough of 5-step retrieval-augmented generation process
Step 1: Text Chunking = split document into sections using chunk_by_section function on report.MD file
Step 2: Embedding Generation = create vector representations for each chunk using generate_embedding function (supports single string or list of strings input)
Step 3: Vector Store Population = create vector index instance, loop through chunk-embedding pairs using zip(), store each pair with store.add_vector(embedding, {"content": chunk}). Store original text with embeddings for meaningful retrieval results.
Step 4: Query Processing = user asks question "what did software engineering department do last year", generate embedding for user query
Step 5: Similarity Search = use store.search(user_embedding, 2) to find 2 most relevant chunks, returns results with cosine distances (0.71 for section two, 0.72 for methodology section)
Key Components:
- Vector Index Class = custom vector database implementation
- Cosine Distance = similarity metric between query and stored embeddings
- Metadata Storage = storing original text content alongside embeddings enables meaningful retrieval
Workflow complete but has limitations requiring further improvements.
BM25 Lexical Search
BM25 = Best Match 25, a lexical search algorithm commonly used in RAG pipelines to complement semantic search.
Problem with semantic search alone = Can miss exact term matches, returning irrelevant results even when specific terms appear frequently in certain documents.
Hybrid search approach = Combines semantic search (embeddings/vector database) with lexical search (BM25) in parallel, then merges results for better balance.
BM25 algorithm steps:
1. Tokenize user query into separate terms (remove punctuation, split on spaces)
2. Count frequency of each term across all text chunks/documents
3. Assign relative importance to terms based on usage frequency (rare terms = higher importance, common terms like "a" = lower importance)
4. Rank text chunks by how often they contain higher-weighted terms
Key insight = Frequently used terms across corpus are less important for search relevance than rare, specific terms.
BM25 advantages = Better at finding exact term matches, prioritizes documents containing rare/specific search terms, complements semantic search weaknesses.
Implementation = Both semantic and lexical search systems use similar APIs (add_document, search functions) making them easy to combine.
Next step = merge results from both search systems to get the benefits of semantic understanding plus exact term matching.
A Multi-Index Rag Pipeline
Multi-Index RAG Pipeline = system combining semantic search (vector index) and lexical search (BM25 index) for improved retrieval accuracy.
Key Components:
- Vector Index = semantic similarity search using embeddings
- BM25 Index = lexical/keyword-based search
- Retriever Class = wrapper that forwards queries to both indexes and merges results
Reciprocal Rank Fusion = technique for merging search results from different indexes. Formula: RRF_score = sum of (1/(rank + 1)) across all search methods for each document. Documents ranked by highest combined score.
Example: Vector search returns [doc2, doc7, doc6], BM25 returns [doc6, doc2, doc7]. After RRF calculation, final ranking becomes [doc2, doc6, doc7] because doc2 ranked high in both methods.
Benefits:
- Improved search accuracy by combining different search paradigms
- Modular design with standardized API (search() and add_document() methods)
- Easy to extend with additional search indexes
- Better handling of edge cases where single method fails
Implementation pattern allows multiple search methodologies to work together while maintaining separate, isolated index classes.
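A sketch of RRF with 0-based ranks, which reproduces the example ordering above:
```
from collections import defaultdict

def reciprocal_rank_fusion(*rankings):
    scores = defaultdict(float)
    for ranking in rankings:  # each ranking is a list of doc IDs, best first
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1 / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = reciprocal_rank_fusion(["doc2", "doc7", "doc6"], ["doc6", "doc2", "doc7"])
# -> ["doc2", "doc6", "doc7"]: doc2 scores 1/1 + 1/2 = 1.5, doc6 1/3 + 1/1 ≈ 1.33
```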
Reranking Results
Reranking = post-processing step that uses LLM to reorder search results by relevance after initial retrieval.
Process: Run vector + BM25 search → merge results → pass to LLM with prompt asking to rank documents by relevance → get reordered results.
Implementation details: Use document IDs instead of full text for efficiency. LLM receives user query + candidate documents + instruction to return most relevant docs in decreasing order. Assistant message pre-fill + stop sequence ensures structured JSON output.
Tradeoffs: Increases search accuracy by leveraging LLM's understanding of semantic relevance. Increases latency due to additional LLM call. Particularly effective when initial retrieval methods miss nuanced query intent (e.g., "ENG team" vs "engineering team").
Example improvement: Query "What did engineering team do with incident 2023?" correctly prioritized software engineering section over cybersecurity section after reranking, despite hybrid search initially ranking it lower.
Contextual Retrieval
Contextual Retrieval = technique to improve RAG pipeline accuracy by adding context to document chunks before embedding.
Problem: When documents are split into chunks, individual chunks lose context from the original document, reducing retrieval accuracy.
Solution: Pre-processing step that adds contextual information to each chunk before inserting into retriever database.
Process:
1. Take individual chunk + original source document
2. Send to LLM (Claude) with prompt asking to generate situating context
3. LLM generates brief context explaining chunk's relationship to larger document
4. Join generated context with original chunk = "contextualized chunk"
5. Use contextualized chunk as input to vector/BM25 indexes
Large Document Handling: If source document too large for single prompt, use selective context strategy:
- Include starter chunks (1-3) from document beginning for summary/abstract
- Include chunks immediately before target chunk for local context
- Skip middle chunks that provide less relevant context
Implementation: add_context function takes text chunk + source text, generates context via LLM, concatenates context with original chunk, returns contextualized version.
Benefit: Chunks retain ties to larger document structure and cross-references, improving retrieval accuracy for complex documents with interconnected sections.
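A sketch of add_context; the situating-prompt wording is illustrative:
```
def add_context(chunk, source_text):
    prompt = f"""<document>
{source_text}
</document>
Here is a chunk from the document:
<chunk>
{chunk}
</chunk>
Write one or two sentences situating this chunk within the overall document."""
    context = chat([{"role": "user", "content": prompt}])  # chat() helper from earlier
    return context + "\n\n" + chunk  # contextualized chunk feeds the vector/BM25 indexes
```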
7. Advanced Features
Extended thinking, image and PDF support, citations, prompt caching, and Code Execution + Files API.
Extended Thinking
Extended Thinking = Claude feature that allows reasoning time before generating final response
Key mechanics:
- Displays separate thinking process visible to users
- Increases accuracy for complex tasks but adds cost (charged for thinking tokens) and latency
- Thinking budget = minimum 1024 tokens allocated for thinking phase
- Max tokens must exceed thinking budget (e.g., budget 1024 requires max_tokens ≥ 1025)
When to use:
- Enable after prompt optimization fails to achieve desired accuracy
- Use prompt evals to determine necessity
Response structure:
- Thinking block = contains reasoning text + cryptographic signature
- Text block = final response
- Signature = prevents tampering with thinking text (safety measure)
Special cases:
- Redacted thinking blocks = encrypted thinking text flagged by safety systems
- Provided for conversation continuity without losing context
- Can force redacted blocks using Anthropic's documented magic test string (begins "ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING" followed by a fixed character sequence)
Implementation:
- Pass the thinking parameter with a token budget, e.g. thinking={"type": "enabled", "budget_tokens": 1024}
- Ensure max_tokens > budget_tokens so adequate capacity remains for the final response
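A sketch of an extended-thinking request and response handling (budget values illustrative):
```
response = client.messages.create(
    model=model,
    max_tokens=4000,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},  # minimum allowed budget
    messages=[{"role": "user", "content": "Solve this multi-step scheduling puzzle..."}],
)
for block in response.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking)  # reasoning text; signature travels with it
    elif block.type == "text":
        print("ANSWER:", block.text)
```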
Image Support
Claude Image Analysis Capabilities = ability to process images within user messages for analysis, comparison, counting, and description tasks.
Image Limitations:
- Max 100 images per request
- Size/dimension restrictions apply
- Images consume tokens (charged based on pixel height/width calculation)
Image Block Structure = special block type within user messages that holds either raw image data (base64) or URL reference to online image. Multiple image blocks allowed per message.
Critical Success Factor = strong prompting techniques required for accurate results. Simple prompts often fail.
Prompting Techniques for Images:
- Step-by-step analysis instructions
- One-shot/multi-shot examples (alternating image and text pairs)
- Clear guidelines and verification steps
- Structured analysis frameworks
Example Use Case = automated fire risk assessment from satellite imagery analyzing tree density, property access, roof overhang, and assigning numerical risk scores.
Implementation = base64 encode image data, create message with image block (type: image, source: base64, media_type, data) followed by text block containing detailed prompt instructions.
Key Takeaway = image accuracy depends entirely on prompt sophistication, not just image quality.
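A sketch of the message shape, with a hypothetical local image file:
```
import base64

with open("satellite.png", "rb") as f:  # hypothetical input image
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model=model,
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
            {"type": "text",
             "text": "Assess fire risk step by step: tree density, property access, "
                     "roof overhang. Then assign a 1-10 risk score."},
        ],
    }],
)
```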
PDF Support
PDF Support in Claude:
Claude can read PDF files directly using similar code to image processing.
Key implementation changes:
- File type = "document" instead of "image"
- Media type = "application/pdf" instead of "image/png"
- Variable naming = file_bytes instead of image_bytes
Claude PDF capabilities = read text + images + charts + tables + mixed content extraction
PDF processing = one-stop solution for comprehensive document analysis
Usage pattern = same as image input but with document-specific parameters
Citations
Citations = feature allowing Claude to reference source documents and show where information comes from
Citation types:
- citation_page_location = for PDF documents, shows document index/title/start page/end page/cited text
- citation_char_location = for plain text, shows character position in text block
Implementation:
- Add "citations": "enabled": true to request
- Add "title" field to identify source document
- Works with both PDF files and plain text sources
Response structure = content becomes list of text blocks, some containing citations arrays with location data
Purpose = transparency for users to verify Claude's information sources and check accuracy of interpretations
UI benefit = enables citation popups/overlays showing source document, page numbers, and exact cited text when users hover over referenced content
Key use case = ensuring users can investigate how Claude builds responses from source materials rather than appearing to speak from memory alone
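A sketch for a plain-text source with citations enabled; source_text is assumed to hold the document contents:
```
response = client.messages.create(
    model=model,
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": [
            {"type": "document",
             "source": {"type": "text", "media_type": "text/plain", "data": source_text},
             "title": "Annual Report",          # identifies the source in citations
             "citations": {"enabled": True}},
            {"type": "text", "text": "What were the key findings?"},
        ],
    }],
)
for block in response.content:
    for citation in getattr(block, "citations", None) or []:
        print(citation.cited_text)  # char-location citations for plain text sources
```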
Prompt Caching
Prompt Caching = feature that speeds up Claude's responses and reduces text generation costs by reusing computational work from previous requests.
Normal request flow: User sends message → Claude processes input (creates internal data structures, performs calculations) → Claude generates output → Claude discards all processing work → Ready for next request.
Problem: When follow-up requests contain identical input messages, Claude must repeat all the same computational work it just threw away, creating inefficiency.
Solution: Prompt caching stores the results of input message processing in temporary cache instead of discarding. When identical input appears in subsequent requests, Claude retrieves cached work rather than reprocessing, dramatically speeding response generation.
Key benefit: Reuses previous computational work to avoid redundant processing of repeated content.
Rules of Prompt Caching
Prompt Caching = system that saves processing work from the initial request for reuse in follow-up requests with identical content
Core mechanism: Initial request → Claude processes + saves work to cache → Follow-up requests with identical content → Claude retrieves cached work instead of reprocessing
Cache duration = 1 hour maximum
Cache activation requires manual cache breakpoint addition to message blocks
Text block formats:
- Shorthand: content = "text string" (cannot add cache control)
- Longhand: content = [{"type": "text", "text": "content", "cache_control": {...}}] (required for caching)
Cache scope = all content up to and including breakpoint gets cached
Cache invalidation = any change in content before breakpoint invalidates entire cache
Content processing order = tools → system prompt → messages (joined together)
Cache breakpoint placement options:
- Tool schemas
- System prompts
- Message blocks (text, image, tool use, tool result)
Maximum breakpoints = 4 per request
Multiple breakpoints = create multiple cache layers, partial cache hits possible if only later content changes
Minimum cache threshold = 1024 tokens required for content to be cached
Best use cases = repeated identical content (system prompts, tool definitions, static message prefixes)
Prompt Caching in Action
Prompt Caching Implementation = automatically caches tool schemas and system prompts to reduce token usage
Setup = modify chat function to enable caching by default for tools and system prompts
Tool Schema Caching = add cache_control field with type "ephemeral" to last tool in list. Best practice: create copy of tools list, clone last tool schema, add cache control, then overwrite to avoid modifying original schemas
System Prompt Caching = wrap the system prompt in a text block dictionary with cache_control type "ephemeral"
Multiple Cache Breakpoints = can set cache points for both tools and system prompt in single request
Cache Order = tools → system prompt → messages
Token Usage Patterns:
- cache_creation_input_tokens = tokens written to cache on first use
- cache_read_input_tokens = tokens retrieved from cache on subsequent identical requests
- Partial cache reads possible when some content matches cached data
Cache Invalidation = any change to cached content (tools or system prompt) invalidates cache, forces new cache creation
Use Cases = identical content across requests - same tool schemas, system prompts, or message sequences
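A sketch of a chat function that sets both breakpoints, per the practices above:
```
import copy

def chat_with_caching(messages, tools, system_text):
    cached_tools = copy.deepcopy(tools)  # avoid mutating the original schemas
    cached_tools[-1]["cache_control"] = {"type": "ephemeral"}  # breakpoint after tools

    response = client.messages.create(
        model=model,
        max_tokens=1000,
        tools=cached_tools,
        system=[{"type": "text", "text": system_text,
                 "cache_control": {"type": "ephemeral"}}],  # breakpoint after system
        messages=messages,
    )
    print(response.usage.cache_creation_input_tokens,  # written on first call
          response.usage.cache_read_input_tokens)      # read on identical follow-ups
    return response
```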
Code Execution and the Files API
Files API = allows uploading files ahead of time and referencing them later via file ID instead of including raw file data in each request. Upload file → get file metadata object with ID → use ID in future requests.
Code Execution = server-based tool where Claude executes Python code in isolated Docker containers. No implementation needed, just include predefined tool schema. Claude can run code multiple times, interpret results, generate final response.
Key constraints: Docker containers have no network access. Data input/output relies on Files API integration.
Combined workflow: Upload file via Files API → get file ID → include ID in container upload block → ask Claude to analyze → Claude writes/executes code with access to uploaded file → returns analysis and results.
Claude can generate files (plots, reports) inside container that can be downloaded using file IDs returned in response.
Use cases: Data analysis, file processing, automated code generation for complex tasks. Response contains code blocks, execution results, and final analysis.
Implementation: Use container upload block with file ID, include analysis prompt, Claude handles code execution automatically.
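A hedged sketch of the combined workflow; the beta flags and tool/type strings reflect Anthropic's docs at the time of writing and may have changed since:

```python
import anthropic

client = anthropic.Anthropic()

# Upload once via the Files API, reference by ID afterwards
uploaded = client.beta.files.upload(
    file=("sales.csv", open("sales.csv", "rb"), "text/csv"),
)

response = client.beta.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=4096,
    betas=["code-execution-2025-05-22", "files-api-2025-04-14"],  # assumed beta flags
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": [
            # Make the uploaded file available inside the execution container
            {"type": "container_upload", "file_id": uploaded.id},
            {"type": "text", "text": "Analyze this CSV and summarize the trends."},
        ],
    }],
)
```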
8. MCP (Model Context Protocol)
MCP client/server architecture, tool/resource/prompt definitions, and development with the server inspector.
Introducing MCP
MCP = Model Context Protocol, communication layer providing Claude with context and tools without requiring developers to write tedious code.
Architecture: MCP client connects to MCP server. Server contains tools, resources, and prompts as internal components.
Problem solved: Eliminates burden of authoring/maintaining numerous tool schemas and functions for service integrations. Example: GitHub chatbot would require implementing tools for repositories, pull requests, issues, projects - significant developer effort.
Solution: MCP server handles tool definition and execution instead of your application server. MCP servers = interfaces to outside services, wrapping functionality into ready-to-use tools.
Key benefits: Developers avoid writing tool schemas and function implementations themselves.
Common questions:
- Who creates MCP servers? Anyone, often service providers make official implementations (AWS, etc.)
- vs direct API calls? MCP eliminates need to author tool schemas/functions yourself
- vs tool use? MCP and tool use are complementary - MCP handles WHO does the work (server vs developer), both still involve tools
Core value: Shifts integration burden from application developers to MCP server maintainers.
MCP Clients
MCP Client = communication interface between your server and MCP server, provides access to server's tools
Transport agnostic = client/server can communicate via multiple protocols (stdio, HTTP, WebSockets)
Common setup = client and server on same machine using standard input/output
Communication = message exchange defined by MCP spec
Key message types:
- list tools request = client asks server for available tools
- list tools result = server responds with tool list
- call tool request = client asks server to run tool with arguments
- call tool result = server responds with tool execution result
Typical flow:
1. User queries server
2. Server requests tool list from MCP client
3. MCP client sends list tools request to MCP server
4. MCP server responds with list tools result
5. Server sends query + tools to Claude
6. Claude requests tool execution
7. Server asks MCP client to run tool
8. MCP client sends call tool request to MCP server
9. MCP server executes tool (e.g. GitHub API call)
10. Results flow back through chain: MCP server → MCP client → server → Claude → user
Purpose = enables servers to delegate tool execution to specialized MCP servers while maintaining Claude integration
Project Setup
CLI-based chatbot project = teaches MCP client-server interaction through hands-on implementation
Project components:
- MCP client = connects to custom MCP server
- MCP server = provides 2 tools (read document, update document)
- Document collection = fake documents stored in memory only
Key distinction: Normal projects implement either client OR server, not both. This project implements both for educational purposes.
Setup process:
1. Download CLI_project.zip starter code
2. Extract and open in code editor
3. Follow readme.md setup directions
4. Add API key to .env file
5. Install dependencies (with/without UV)
6. Run project: "uv run main.py" or "python main.py"
7. Test with chat prompt
Expected outcome = working chat interface that responds to basic queries, ready for MCP feature additions.
Defining Tools with MCP
MCP server implementation using Python SDK creates tools through decorators rather than manual JSON schemas.
MCP Python SDK = Official package that auto-generates tool JSON schemas from Python function definitions using @mcp.tool decorator.
Tool definition syntax = @mcp.tool(name="tool_name", description="description") + function with typed parameters using Field() for argument descriptions.
Two tools implemented:
1. read_doc_contents = Takes doc_id string, returns document content from in-memory docs dictionary
2. edit_document = Takes doc_id, old_string, new_string parameters, performs find/replace on document content
Error handling = Check if doc_id exists in docs dictionary, raise ValueError if not found.
Key advantage = SDK eliminates manual JSON schema writing, generates schemas automatically from Python function signatures and decorators.
Required imports = Field from pydantic for parameter descriptions, mcp package for server and tool decorators.
Implementation pattern = Decorator defines tool metadata, function parameters define tool arguments with types and descriptions, function body contains tool logic.
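A sketch of both tools with the SDK's FastMCP server; document ids and contents are placeholder data:

```python
from mcp.server.fastmcp import FastMCP
from pydantic import Field

mcp = FastMCP("docs")

# Fake in-memory document collection
docs = {"plan.md": "Project plan...", "spec.txt": "Spec contents..."}

@mcp.tool(
    name="read_doc_contents",
    description="Read the contents of a document and return it as a string.",
)
def read_document(doc_id: str = Field(description="Id of the document to read")) -> str:
    if doc_id not in docs:
        raise ValueError(f"Doc with id {doc_id} not found")
    return docs[doc_id]

@mcp.tool(
    name="edit_document",
    description="Edit a document by replacing an existing string with new text.",
)
def edit_document(
    doc_id: str = Field(description="Id of the document to edit"),
    old_str: str = Field(description="Text to replace; must match exactly"),
    new_str: str = Field(description="Replacement text"),
):
    if doc_id not in docs:
        raise ValueError(f"Doc with id {doc_id} not found")
    docs[doc_id] = docs[doc_id].replace(old_str, new_str)

if __name__ == "__main__":
    mcp.run(transport="stdio")
```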
The Server Inspector
MCP Inspector = in-browser debugger for testing MCP servers without connecting to applications
Access: Run `mcp dev [server_file.py]` in terminal → opens server on port → navigate to provided URL in browser
Interface: Left sidebar has connect button → top menu shows resources/prompts/tools sections → tools section lists available tools → click tool to open right panel for manual testing
Testing workflow: Connect to server → navigate to tools → select specific tool → input required parameters → click run tool → verify output
Key features: Live development testing, manual tool invocation, parameter input forms, success/failure feedback, no need for full application integration
Note: the Inspector UI is still actively changing between releases, but the core functionality remains the same
Example usage: Test document tools by inputting document IDs, verify read operations, test edit operations, chain operations to verify changes
Primary benefit: Debug MCP server implementations efficiently during development phase
Implementing a Client
MCP Client Implementation:
MCP Client = wrapper class around client session for resource cleanup and connection management to MCP server
Client Session = actual connection to MCP server from MCP Python SDK, requires resource cleanup on close
Client Purpose = exposes MCP server functionality to rest of codebase, enables reaching out to server for tool lists and tool execution
Key Functions:
- list_tools() = await self.session.list_tools(), return result.tools
- call_tool() = await self.session.call_tool(tool_name, tool_input)
Usage Flow = client gets tool definitions to send to Claude, then executes tools when Claude requests them
Common Pattern = wrap client session in larger class for resource management rather than use session directly
Testing = can run client file directly with testing harness to verify server connection and tool retrieval
Integration = other code in project calls client functions to interact with MCP server, enabling Claude to inspect/edit documents through defined tools
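A sketch of the wrapper class, assuming a stdio transport; `AsyncExitStack` handles the resource cleanup mentioned above:

```python
from contextlib import AsyncExitStack
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class MCPClient:
    def __init__(self, command: str, args: list[str]):
        self._params = StdioServerParameters(command=command, args=args)
        self._stack = AsyncExitStack()
        self.session: ClientSession | None = None

    async def connect(self):
        # stdio_client yields read/write streams for the spawned server process
        read, write = await self._stack.enter_async_context(
            stdio_client(self._params)
        )
        self.session = await self._stack.enter_async_context(
            ClientSession(read, write)
        )
        await self.session.initialize()

    async def list_tools(self):
        result = await self.session.list_tools()
        return result.tools

    async def call_tool(self, tool_name: str, tool_input: dict):
        return await self.session.call_tool(tool_name, tool_input)

    async def cleanup(self):
        await self._stack.aclose()
```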
Defining Resources
MCP Resources = mechanism allowing MCP servers to expose data to clients for read operations
Resource Types = 2 types: direct (static URI like "docs://documents") and templated (parameterized URI like "docs://documents/{doc_id}")
URI = address/identifier for accessing specific resource, defined when creating resource
Resource Flow = client sends read resource request with URI → server matches URI to function → server executes function → returns data in read resource result
Implementation = use @mcp.resource decorator with URI and MIME type parameters
MIME Types = hint to client about returned data format (application/json for structured data, text/plain for plain text)
Templated Resources = URI parameters automatically parsed by SDK and passed as keyword arguments to handler function
Resource vs Tools = resources provide data proactively (fetch document contents when @ mentioned), tools perform actions reactively (when Claude decides to call them)
Data Return = SDK automatically serializes returned data to strings, client responsible for deserialization
Testing = MCP inspector can list direct resources separately from templated resources, allows testing individual resource calls
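A sketch of one direct and one templated resource, reusing the `mcp` server and `docs` dict from the tool sketch above:

```python
@mcp.resource("docs://documents", mime_type="application/json")
def list_docs() -> list[str]:
    # Direct resource: static URI, returns all document ids
    return list(docs.keys())

@mcp.resource("docs://documents/{doc_id}", mime_type="text/plain")
def fetch_doc(doc_id: str) -> str:
    # Templated resource: {doc_id} is parsed by the SDK and passed as a kwarg
    if doc_id not in docs:
        raise ValueError(f"Doc with id {doc_id} not found")
    return docs[doc_id]
```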
Accessing Resources
MCP Resource Access Implementation:
Resource Reading Function = client-side function to request and parse resources from MCP server
Function Parameters = URI (resource identifier)
Implementation Steps:
- Import json module + AnyUrl from pydantic
- Call await self.session.read_resource(AnyUrl(uri))
- Extract first element from result.contents[0]
- Check resource.mimeType for parsing strategy
Content Parsing Logic:
- If mimeType == "application/json" → return json.loads(resource.text)
- Otherwise → return resource.text (plain text)
Server Response Structure = result.contents list with first element containing type/mime_type metadata
Resource Integration = MCP client functions called by other application components to fetch document contents for prompts
End Result = Document contents automatically included in Claude prompts without requiring tool calls
Key Point = Resources expose server information directly to clients through structured request/response pattern
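A sketch of the reading function as a method on the client wrapper above:

```python
import json
from pydantic import AnyUrl

# Method on the MCPClient wrapper sketched earlier
async def read_resource(self, uri: str):
    result = await self.session.read_resource(AnyUrl(uri))
    resource = result.contents[0]
    if resource.mimeType == "application/json":
        return json.loads(resource.text)  # structured data
    return resource.text                  # plain text
```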
Defining Prompts
MCP Prompts = Pre-defined, tested prompt templates that MCP servers expose to client applications for specialized tasks.
Purpose = Instead of users writing ad-hoc prompts, server authors create high-quality, evaluated prompts tailored to their server's domain.
Implementation = Use @mcp.prompt decorator with name/description, define function that returns list of messages (user/assistant messages that can be sent directly to Claude).
Example Use Case = Document formatting prompt that takes document ID, instructs Claude to read document using tools, reformat to markdown, and save changes.
Key Benefits = Server-specific expertise, pre-tested quality, reusable across client applications, better results than user-generated prompts.
Message Structure = Returns base.UserMessage objects containing the formatted prompt text with interpolated parameters.
Client Integration = Prompts appear as autocomplete options (slash commands) in client applications, prompt user for required parameters, then execute the pre-built prompt workflow.
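A sketch of such a prompt on the FastMCP server above; the prompt name and wording are illustrative:

```python
from pydantic import Field
from mcp.server.fastmcp.prompts import base

@mcp.prompt(
    name="format",
    description="Rewrite a document's contents in markdown format.",
)
def format_document(
    doc_id: str = Field(description="Id of the document to format"),
) -> list[base.Message]:
    prompt = (
        f"Reformat the document <document_id>{doc_id}</document_id> into markdown. "
        "Use the read_doc_contents tool to fetch it, then edit_document to save."
    )
    return [base.UserMessage(prompt)]
```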
Prompts in the Client
MCP Client Prompt Implementation:
List prompts = await self.session.list_prompts(), return result.prompts
Get prompt = await self.session.get_prompt(prompt_name, arguments), return result.messages
Prompt workflow:
1. Define prompt in MCP server with expected arguments (e.g., document_id)
2. Client calls get_prompt with prompt name + arguments dictionary
3. Arguments passed as keyword arguments to prompt function
4. Function interpolates arguments into prompt text
5. Returns messages array for direct feeding to LLM
Key concept: Prompts are server-defined templates that clients can invoke with specific arguments to generate contextualized instructions for LLMs. Arguments flow from client call → prompt function → interpolated prompt text → LLM consumption.
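The corresponding client-side methods, sketched against the wrapper above:

```python
# Methods on the MCPClient wrapper sketched earlier
async def list_prompts(self):
    result = await self.session.list_prompts()
    return result.prompts

async def get_prompt(self, prompt_name: str, args: dict[str, str]):
    result = await self.session.get_prompt(prompt_name, arguments=args)
    return result.messages  # ready to feed directly to Claude
```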
9. Claude Code & Anthropic Applications
Claude Code setup and usage, extending it with MCP servers, parallelization, and automated debugging workflows.
Anthropic Apps
Anthropic Apps = two deployed applications by Anthropic: Claude Code and Computer Use.
Claude Code = terminal-based coding assistant that serves as example of agent architecture.
Computer Use = toolset that expands Claude's capabilities beyond text generation.
Key purpose = these apps demonstrate agent concepts and provide practical examples for understanding agent design and implementation.
Setup process = involves terminal configuration for Claude Code usage on sample projects.
Agent connection = both applications exemplify how agents work, serving as learning models for building effective agents.
Claude Code Setup
Claude Code = terminal-based coding assistant program that helps with code-related tasks
Core capabilities = search/read/edit files + advanced tools (web fetching, terminal access) + MCP client support for expanded functionality via MCP servers
Setup process:
1. Install Node.js (verify with `node --version` or `npm help`)
2. Install Claude Code: `npm install -g @anthropic-ai/claude-code`
3. Execute the `claude` command in the terminal and log in to your Anthropic account
Full setup guide = docs.anthropic.com
MCP client functionality = can consume tools from MCP servers to extend capabilities beyond basic file operations
Claude Code in Action
Claude Code = AI coding assistant that functions as a collaborative engineer on projects, not just a code generator.
Key capabilities: project setup, feature design, code writing, testing, deployment, error fixing in production.
Setup workflow:
- Download project, open in editor
- Run `claude` command to launch
- Ask Claude to read README and execute setup directions
- Run the `/init` command = Claude scans codebase for architecture/coding style, creates a CLAUDE.md file
- CLAUDE.md = automatically included context for future requests
Memory types: Project (shared), Local, User memory files.
Context management:
- Use # symbol to add specific notes to memory
- Can manually edit CLAUDE.md or rerun `/init` to update
- Claude can handle Git operations (staging, committing)
Effective prompting strategies:
Method 1 - Three-step workflow:
1. Identify relevant files, ask Claude to analyze them
2. Describe feature, ask Claude to plan solution (no code yet)
3. Ask Claude to implement the plan
Method 2 - Test-driven development:
1. Provide relevant context
2. Ask Claude to suggest tests for the feature
3. Select and implement chosen tests
4. Ask Claude to write code until tests pass
Core principle: Claude Code = effort multiplier. More detailed instructions = significantly better results. Treat as collaborative engineer, not just code generator.
Enhancements with MCP Servers
Claude Code = AI assistant with embedded MCP (Model Context Protocol) client that can connect to MCP servers to expand functionality.
MCP Server Integration = Connect external tools/services to Claude Code via command: `claude mcp add [server-name] [startup-command]`
Example Implementation = Document processing server exposing a "Document Path to Markdown" tool, started with `uv run main.py`, which lets Claude Code read PDF/Word documents.
Dynamic Capability Expansion = MCP servers add new functions to Claude Code in real-time without core modifications.
Common Use Cases = Production monitoring (Sentry), project management (Jira), communication (Slack), custom development workflow tools.
Key Benefit = Significant flexibility increase for development workflows through modular server connections.
Setup Process = 1) Create MCP server with tools, 2) Add server to Claude Code with name and startup command, 3) Restart Claude Code to access new capabilities.
Parallelizing Claude Code
Parallelizing Claude Code = running multiple Claude instances simultaneously to complete different tasks in parallel
Core Problem = multiple Claude instances modifying same files simultaneously creates conflicts and invalid code
Solution = Git work trees providing isolated workspaces per Claude instance
Git Work Trees = Git feature that checks out the same repository into separate working directories, each tied to its own branch (no separate clone needed)
Workflow = create work tree → assign task to Claude instance → work in isolation → commit changes → merge back to main branch
Custom Commands = automating work tree creation/management through .claude/commands directory with markdown files containing prompts
Command Structure = .claude/commands/filename.md with $ARGUMENTS placeholder for dynamic values
Parallel Execution Benefits = single developer commanding virtual team of software engineers, major productivity scaling limited only by engineer's management capacity
Merge Conflicts = Claude automatically resolves conflicts during branch merging process
Cleanup = Claude handles work tree removal after feature completion
Key Advantage = scales to unlimited parallel instances based on developer's capacity to manage simultaneous tasks
Automated Debugging
Automated Debugging = using AI (Claude) to automatically detect, analyze, and fix production errors without manual intervention.
Core Workflow:
1. GitHub Action runs daily to check production environment
2. Fetches CloudWatch logs from last 24 hours
3. Claude identifies errors, deduplicates them
4. Claude analyzes each error and generates fixes
5. Creates pull request with proposed solutions
Key Components:
- GitHub Actions for scheduling/automation
- AWS CLI for log retrieval
- Claude Code for error analysis and code fixes
- CloudWatch for production error monitoring
Benefits:
- Catches production-only errors (issues not present in development)
- Reduces manual log hunting and debugging time
- Provides context-aware fixes with explanations
- Creates reviewable pull requests for changes
Common Use Case: Configuration errors between environments (invalid model IDs, API keys, etc. that work locally but fail in production)
Implementation Requirements: Repository access, cloud logging service, AI coding assistant, CI/CD pipeline integration.
10. Computer Use
Claude's ability to operate a computer by taking screenshots and controlling the mouse and keyboard.
Computer Use
Computer Use = Claude's ability to interact with computer interfaces through visual observation and control actions.
Key capabilities:
- Takes screenshots of applications/browsers
- Clicks buttons, types text, navigates interfaces
- Follows multi-step instructions autonomously
- Performs QA testing and automation tasks
How it works:
- Runs in isolated Docker container environment
- User provides instructions via chat interface
- Claude observes screen visually and executes actions
- Generates reports on task completion/results
Primary use cases:
- Automated QA testing of web applications
- UI interaction testing across different scenarios
- Time-saving for repetitive computer tasks
- Bug identification through systematic testing
Setup requirement = Reference implementation available for local testing
Example workflow: User describes testing requirements → Claude navigates to application → Executes test cases → Reports pass/fail results with detailed findings
How Computer Use Works
Computer use = tool system implementation allowing Claude to interact with computing environments
Tool use flow: User sends message + tool schema → Claude responds with tool use request (ID, name, input) → Server executes code → Result sent back to Claude as tool result
Computer use follows identical flow:
- Special tool schema sent to Claude (small schema expands to larger structure behind scenes)
- Expanded schema includes action function with arguments: mouse move, left click, screenshot, etc.
- Claude sends tool use request
- Developers must fulfill request via computing environment (typically Docker container)
- Container executes programmatic key presses/mouse movements
- Response sent back to Claude
Key points:
- Claude doesn't directly manipulate computers
- Computer use = tool system + developer-provided computing environment
- Anthropic provides reference implementation (Docker container with pre-built mouse/keyboard execution code)
- Setup requires Docker + simple command execution
- Enables direct chat interface for testing Claude's computer use functionality
Computer use = abstraction layer where tool system handles Claude communication while Docker container handles actual computer interactions.
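A hedged request sketch; the tool type and beta flag follow Anthropic's published computer-use beta at the time of writing and may have changed:

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],  # assumed beta flag
    tools=[{
        "type": "computer_20241022",    # small schema; expands server-side to the full action set
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Take a screenshot of the desktop."}],
)
# The developer's environment (e.g. a Docker container) must then fulfill
# each returned tool use request: screenshots, clicks, key presses.
```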
11. Agents & Workflows
The agent vs. workflow distinction, parallelization, chaining and routing patterns, and how agents use tools.
Agents and Workflows
Workflows and agents = strategies for handling user tasks that can't be completed by Claude in a single request.
Decision rule: Use workflows when you have precise task understanding and know exact steps sequence. Use agents when task details are unclear.
Workflow = series of calls to Claude for specific problems where steps are predetermined.
Example workflow: Image to 3D model converter
- Step 1: Claude describes uploaded image in detail
- Step 2: Claude uses CADQuery Python library to model object from description
- Step 3: Create rendering of model
- Step 4: Claude compares rendering to original image
- Step 5: If inaccurate, repeat from step 2 with feedback
This follows evaluator-optimizer pattern:
- Producer = generates output (Claude + CADQuery modeling)
- Evaluator = assesses output quality (comparison step)
- Loop continues until evaluator accepts output
Key point: Workflows are implementation patterns that other engineers have successfully used. Identifying workflow patterns doesn't automatically implement them - you still need to write the actual code.
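A minimal sketch of the evaluator-optimizer loop in plain Python; `chat()` is a hypothetical single-turn helper, and the ACCEPT protocol is illustrative:

```python
import anthropic

client = anthropic.Anthropic()

def chat(prompt: str) -> str:
    """Hypothetical single-turn helper around the Messages API."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def evaluator_optimizer(task: str, max_rounds: int = 5) -> str:
    feedback = ""
    output = ""
    for _ in range(max_rounds):
        # Producer: generate (or regenerate) output, folding in prior feedback
        output = chat(f"Task: {task}\nPrior feedback: {feedback or 'none'}")
        # Evaluator: accept, or return concrete feedback for the next round
        verdict = chat(
            f"Evaluate this output against the task.\nTask: {task}\n"
            f"Output: {output}\nReply ACCEPT if acceptable, otherwise give feedback."
        )
        if verdict.strip().upper().startswith("ACCEPT"):
            break
        feedback = verdict
    return output
```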
Parallelization Workflows
Parallelization Workflows = breaking one complex task into multiple simultaneous subtasks, then aggregating results.
Example: Material selection for parts
- Instead of: One large prompt asking Claude to choose between metal/polymer/ceramic/composite with all criteria
- Use: Separate parallel requests, each evaluating one material's suitability, then final aggregation step to compare results
Structure: Input → Multiple parallel subtasks → Aggregator → Final output
Benefits:
- Focus = Each subtask handles one specific analysis instead of juggling multiple considerations
- Modularity = Individual prompts can be improved/evaluated separately
- Scalability = Easy to add new subtasks without affecting existing ones
- Quality = Reduces confusion from overly complex single prompts
Key principle: Decompose complex decisions into specialized parallel analyses, then synthesize results.
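A sketch of the material-selection example, reusing the hypothetical `chat()` helper from the evaluator-optimizer sketch; `asyncio.to_thread` keeps the fan-out simple without an async client:

```python
import asyncio

MATERIALS = ["metal", "polymer", "ceramic", "composite"]

async def evaluate_materials(part_description: str) -> str:
    # One focused subtask per material, run concurrently
    subtasks = [
        asyncio.to_thread(chat, f"Assess {m} for this part: {part_description}")
        for m in MATERIALS
    ]
    analyses = await asyncio.gather(*subtasks)
    # Aggregator: synthesize the parallel analyses into one recommendation
    combined = "\n\n".join(analyses)
    return await asyncio.to_thread(
        chat, f"Compare these analyses and recommend one material:\n{combined}"
    )
```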
Chaining Workflows
Chaining Workflows = breaking large tasks into series of distinct sequential steps rather than single complex prompt
Core concept: Instead of one massive prompt with multiple requirements, split into separate calls where each focuses on one specific subtask.
Example workflow: User enters topic → search trending topics → Claude selects most interesting → Claude researches topic → Claude writes script → generate video → post to social media
Key benefit: Allows AI to focus on individual tasks rather than juggling multiple constraints simultaneously
Primary use case: When Claude consistently ignores constraints in complex prompts despite repetition. Common with long prompts containing many "don't do X" requirements.
Problem scenario: Long prompt with constraints (don't mention AI, no emojis, professional tone) → Claude violates some constraints regardless of repetition
Solution: Step 1 - Send initial prompt, accept imperfect output. Step 2 - Follow-up prompt asking Claude to rewrite based on specific violations found.
Critical insight: Even simple-seeming workflow becomes essential when dealing with constraint-heavy prompts that AI struggles to follow completely in single pass.
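A sketch of the two-step fix, again using the hypothetical `chat()` helper; the constraints are illustrative:

```python
CONSTRAINTS = "Don't mention AI. No emojis. Professional tone."

def draft_then_fix(topic: str) -> str:
    # Step 1: accept an imperfect first draft
    draft = chat(f"Write a short script about {topic}. {CONSTRAINTS}")
    # Step 2: a focused rewrite pass dedicated solely to the constraints
    return chat(
        f"Rewrite the script below so it fully satisfies these rules: "
        f"{CONSTRAINTS}\n\nScript:\n{draft}"
    )
```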
Routing Workflows
Routing Workflows = workflow pattern that categorizes user input to determine appropriate processing pipeline
Key mechanism: Initial request to Claude categorizes user input into predefined genres/categories. Based on categorization response, system routes to specialized processing pipeline with customized prompts/tools.
Example flow:
1. User enters topic (e.g., "Python functions")
2. Claude categorizes topic (e.g., "educational")
3. System uses educational-specific prompt template
4. Claude generates script with educational tone/structure
Benefits: Ensures output matches topic nature. Programming topics get educational treatment with definitions/explanations. Entertainment topics get trendy language/engaging hooks.
Structure: One routing step → Multiple specialized processing pipelines → Each pipeline has customized prompts/tools for specific category
Use case: Social media video script generation where different topics require different tones and approaches.
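A sketch of the routing step with two illustrative category templates, using the same hypothetical `chat()` helper:

```python
TEMPLATES = {
    "educational": "Write a script that teaches {topic} with clear definitions.",
    "entertainment": "Write a punchy, trend-aware script about {topic}.",
}

def routed_script(topic: str) -> str:
    # Routing call: classify the input into one predefined category
    category = chat(
        f"Categorize this topic as exactly one of {list(TEMPLATES)}: {topic}"
    ).strip().lower()
    # Route to the specialized pipeline, with a fallback if classification is off
    template = TEMPLATES.get(category, TEMPLATES["educational"])
    return chat(template.format(topic=topic))
```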
Agents and Tools
Agents = AI systems that create plans to complete tasks using provided tools, effective when exact steps are unknown. Workflows = better when precise steps are known.
Key differences: Workflows require predetermined steps, agents dynamically plan using available tools.
Agent advantages: Flexibility to solve variety of tasks with same toolset, can combine tools in unexpected ways.
Tool abstraction principle: Provide generic/abstract tools rather than hyper-specialized ones. Example - Claude Code uses bash, web_fetch, file_write (abstract) rather than refactor_tool, install_dependencies (specialized).
Tool combination examples: get_current_datetime + add_duration + set_reminder can solve various time-related tasks through different combinations.
Agent behavior: Can request additional information when needed, combines tools creatively to achieve goals, works best with small set of flexible tools.
Design approach: Give agent abstract tools that can be pieced together rather than single-purpose specialized tools. This enables dynamic problem-solving and unexpected use cases.
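A toy illustration of the principle using the three tools named above; `set_reminder` is a hypothetical stub:

```python
from datetime import datetime, timedelta

# Three small, generic tools an agent can compose in unplanned ways
def get_current_datetime() -> str:
    return datetime.now().isoformat()

def add_duration(start_iso: str, minutes: int) -> str:
    start = datetime.fromisoformat(start_iso)
    return (start + timedelta(minutes=minutes)).isoformat()

def set_reminder(time_iso: str, message: str) -> str:
    # Hypothetical side effect; a real tool would call a scheduler API
    return f"Reminder set for {time_iso}: {message}"

# One chain an agent might plan: now -> now + 45 minutes -> reminder
print(set_reminder(add_duration(get_current_datetime(), 45), "Stand-up meeting"))
```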
Environment Inspection
Environment Inspection = agents evaluating their environment and action results to understand progress and handle errors.
Core concept: After each action, agents need feedback mechanisms beyond basic tool returns to understand new environment state.
Computer use example: Claude takes screenshot after every action (typing, clicking) to see how environment changed, since it cannot predict exact results of actions like button clicks.
Code editing example: Before modifying files, agents must read current file contents to understand existing state.
Social media video agent applications:
- Use Whisper CPP via bash to generate timestamped captions, verify dialogue placement
- Use FFmpeg to extract video screenshots at intervals, inspect visual results
- Validate video creation meets expectations before posting
Key benefit: Environment inspection enables agents to gauge task progress, detect errors, and adapt to unexpected results rather than operating blindly.
Workflows vs Agents
Workflows = pre-defined series of calls to Claude with known exact steps. Agents = flexible approach using basic tools that Claude combines to complete unknown tasks.
Key differences:
Task division: Workflows break big tasks into smaller, specific subtasks enabling higher focus and accuracy. Agents handle varied challenges creatively without predetermined steps.
Testing/evaluation: Workflows easier to test due to known execution sequence. Agents harder to test since execution path unpredictable.
User experience: Workflows require specific inputs. Agents create own inputs from user queries and can request additional input when needed.
Success rates: Workflows = higher task completion rates due to structured approach. Agents = lower completion rates due to delegated complexity.
Recommendation: Prioritize workflows for reliability. Use agents only when flexibility truly required. Users want 100% working products over fancy agents.
Core principle: Solve problems reliably first, innovation second.