Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications

From the Tech Today series

Fr. 68.90

Incl. statutory VAT, free shipping


Product details

Binding

Paperback

Publication date

29 April 2024

Publisher

Wiley

Number of pages

224

Dimensions (L/W/H)

23.5/18.9/1.4 cm

Weight

318 g

Language

English

ISBN

978-1-394-24072-2

Manufacturer address

Libri GmbH
Europaallee 1
36244 Bad Hersfeld
DE

Email: gpsr@libri.de

  • Introduction

    Chapter 1: Introduction

    Overview of GenAI Applications and Large Language Models

    The Rise of Large Language Models

    Neural Networks, Transformers, and Beyond

    GenAI vs. LLMs: What's the Difference?

    The Three-Layer GenAI Application Stack

    The Infrastructure Layer

    The Model Layer

    The Application Layer

    Paths to Productionizing GenAI Applications

    Sample LLM-Powered Chat Application

    The Importance of Cost Optimization

    Cost Assessment of the Model Inference Component

    Cost Assessment of the Vector Database Component

    Benchmarking Setup and Results

    Other Factors to Consider

    Cost Assessment of the Large Language Model Component

    Summary

    Chapter 2: Tuning Techniques for Cost Optimization

    Fine-Tuning and Customizability

    Basic Scaling Laws You Should Know

    Parameter-Efficient Fine-Tuning Methods

    Adapters Under the Hood

    Prompt Tuning

    Prefix Tuning

    P-tuning

    IA3

    Low-Rank Adaptation

    Cost and Performance Implications of PEFT Methods

    Summary

    Chapter 3: Inference Techniques for Cost Optimization

    Introduction to Inference Techniques

    Prompt Engineering

    Impact of Prompt Engineering on Cost

    Estimating Costs for Other Models

    Clear and Direct Prompts

    Adding Qualifying Words for Brief Responses

    Breaking Down the Request

    Example of Using Claude for PII Removal

    Conclusion

    Providing Context

    Examples of Providing Context

    RAG and Long Context Models

    Recent Work Comparing RAG with Long Content Models

    Conclusion

    Context and Model Limitations

    Indicating a Desired Format

    Example of Formatted Extraction with Claude

    Trade-Off Between Verbosity and Clarity

    Caching with Vector Stores

    What Is a Vector Store?

    How to Implement Caching Using Vector Stores

    Conclusion

    Chains for Long Documents

    What Is Chaining?

    Implementing Chains

    Example Use Case

    Common Components

    Tools That Implement Chains

    Comparing Results

    Conclusion

    Summarization

    Summarization in the Context of Cost and Performance

    Efficiency in Data Processing

    Cost-Effective Storage

    Enhanced Downstream Applications

    Improved Cache Utilization

    Summarization as a Preprocessing Step

    Enhanced User Experience

    Conclusion

    Batch Prompting for Efficient Inference

    Batch Inference

    Experimental Results

    Using the accelerate Library

    Using the DeepSpeed Library

    Batch Prompting

    Example of Using Batch Prompting

    Model Optimization Methods

    Quantization

    Code Example

    Recent Advancements: GPTQ

    Parameter-Efficient Fine-Tuning Methods

    Recap of PEFT Methods

    Code Example

    Cost and Performance Implications

    Summary

    References

    Chapter 4: Model Selection and Alternatives

    Introduction to Model Selection

    Motivating Example: The Tale of Two Models

    The Role of Compact and Nimble Models

    Examples of Successful Smaller Models

    Quantization for Powerful but Smaller Models

    Text Generation with Mistral 7B

    Zephyr 7B and Aligned Smaller Models

    CogVLM for Language-Vision Multimodality

    Prometheus for Fine-Grained Text Evaluation

    Orca 2 and Teaching Smaller Models to Reason

    Breaking Traditional Scaling Laws with Gemini and Phi

    Phi 1, 1.5, and 2 B Models

    Gemini Models

    Domain-Specific Models

    Step 1 - Training Your Own Tokenizer

    Step 2 - Training Your Own Domain-Specific Model

    More References for Fine-Tuning

    Evaluating Domain-Specific Models vs. Generic Models

    The Power of Prompting with General-Purpose Models

    Summary

    Chapter 5: Infrastructure and Deployment Tuning Strategies

    Introduction to Tuning Strategies

    Hardware Utilization and Batch Tuning

    Memory Occupancy

    Strategies to Fit Larger Models in Memory

    KV Caching

    PagedAttention

    How Does PagedAttention Work?

    Comparisons, Limitations, and Cost Considerations

    AlphaServe

    How Does AlphaServe Work?

    Impact of Batching

    Cost and Performance Considerations

    S3: Scheduling Sequences with Speculation

    How Does S3 Work?

    Performance and Cost

    Streaming LLMs with Attention Sinks

    Fixed to Sliding Window Attention

    Extending the Context Length

    Working with Infinite Length Context

    How Does StreamingLLM Work?

    Performance and Results

    Cost Considerations

    Batch Size Tuning

    Frameworks for Deployment Configuration Testing

    Cloud-Native Inference Frameworks

    Deep Dive into Serving Stack Choices

    Batching Options

    Options in DJL Serving

    High-Level Guidance for Selecting Serving Parameters

    Automatically Finding Good Inference Configurations

    Creating a Generic Template

    Defining a HPO Space

    Searching the Space for Optimal Configurations

    Results of Inference HPO

    Inference Acceleration Tools

    TensorRT and GPU Acceleration Tools

    CPU Acceleration Tools

    Monitoring and Observability

    LLMOps and Monitoring

    Why Is Monitoring Important for LLMs?

    Monitoring and Updating Guardrails

    Summary

    Conclusion

    Index