Langfuse Launch Week #2


Jannik Maierhoefer

Posted on November 19, 2024


Langfuse, the open-source LLM engineering platform, is excited to announce its second Launch Week, starting on Monday, November 18, 2024. The week-long event features daily platform updates and a Virtual Town Hall on Wednesday, culminating in a Product Hunt launch on Friday.


Focus of Launch Week

Langfuse's second Launch Week is all about supporting the next generation of AI models and integrating the platform more deeply into developer workflows. The updates aim to deliver end-to-end prompt engineering tools specifically designed for product teams, enhancing the robustness and versatility of AI applications.


🔻 Day 0: Prompt Management for Vercel AI SDK

On the first day, Langfuse introduced native integration of its Prompt Management with the Vercel AI SDK. This integration enables developers to:

  • Version and release prompts directly in Langfuse.
  • Utilize prompts via the Vercel AI SDK.
  • Seamlessly monitor metrics like latency, costs, and usage.

This update answers critical questions for developers:

  • Which prompt version caused a specific bug?
  • What’s the cost and latency impact of each prompt version?
  • Which prompt versions are most used?

🆚 Day 1: Dataset Experiment Run Comparison View

The second day brought a new comparison view for dataset experiment runs within Langfuse Datasets. This powerful feature allows teams to:

  • Analyze multiple experiment runs side-by-side.
  • Compare application performance across test dataset experiments.
  • Explore metrics like latency and costs.
  • Drill down into individual dataset items.

This enhancement is particularly valuable for testing different prompts, models, or application configurations, making it a must-have tool for teams working on AI-powered products.
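Under the hood, a side-by-side comparison boils down to grouping per-item metrics by run and aggregating them. The sketch below shows that idea with a hypothetical record shape (the field names `run`, `item`, `latency_s`, and `cost_usd` are invented for illustration, not the Langfuse data model):

```python
from statistics import mean

# Hypothetical experiment-run records: one row per dataset item per run.
runs = [
    {"run": "prompt-v1", "item": "q1", "latency_s": 1.2, "cost_usd": 0.004},
    {"run": "prompt-v1", "item": "q2", "latency_s": 0.9, "cost_usd": 0.003},
    {"run": "prompt-v2", "item": "q1", "latency_s": 1.0, "cost_usd": 0.002},
    {"run": "prompt-v2", "item": "q2", "latency_s": 1.0, "cost_usd": 0.002},
]


def summarize(rows: list[dict]) -> dict:
    """Aggregate latency and cost per run, as a comparison view would."""
    by_run: dict[str, list[dict]] = {}
    for row in rows:
        by_run.setdefault(row["run"], []).append(row)
    return {
        name: {
            "avg_latency_s": round(mean(r["latency_s"] for r in items), 3),
            "total_cost_usd": round(sum(r["cost_usd"] for r in items), 6),
        }
        for name, items in by_run.items()
    }


summary = summarize(runs)
```

The comparison view adds the crucial extra step of lining these aggregates up per dataset item, so you can drill into exactly where two runs diverge.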


⚖️ Day 2: LLM-as-a-Judge Evaluations for Datasets

Day 2 of Launch Week 2 brings managed LLM-as-a-judge evaluators to dataset experiments. Assign evaluators to your datasets and they will automatically run on new experiment runs, scoring your outputs based on your evaluation criteria.

You can run any LLM-as-a-judge prompt, and Langfuse ships with templates for common evaluation criteria: Hallucination, Helpfulness, Relevance, Toxicity, Correctness, Context Relevance, Context Correctness, and Conciseness.

Langfuse LLM-as-a-judge works with any LLM that supports tool/function calling and is accessible via one of the following APIs: OpenAI, Azure OpenAI, Anthropic, or AWS Bedrock. Through LLM gateways such as LiteLLM, virtually any popular LLM can additionally be used via the OpenAI connector.
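The core of an LLM-as-a-judge evaluator is simple: format an evaluation prompt with the item's context and the application's output, have a model return a structured verdict, and attach the resulting score to the run. The sketch below shows that loop with a stubbed model call, since a real evaluator would call one of the APIs listed above; the prompt wording and the `call_judge_llm` stub are illustrative, not Langfuse's internal templates.

```python
import json

JUDGE_PROMPT = (
    "Rate the following answer for hallucination, from 0 (fully grounded) "
    "to 1 (fully hallucinated).\nContext: {context}\nAnswer: {answer}\n"
    'Reply as JSON: {{"score": <float>, "reasoning": "<text>"}}'
)


def call_judge_llm(prompt: str) -> str:
    # Stub standing in for a tool-calling LLM (OpenAI, Anthropic, Bedrock, ...).
    return json.dumps(
        {"score": 0.0, "reasoning": "Answer is supported by the context."}
    )


def evaluate(context: str, answer: str) -> dict:
    """Run one judge evaluation and shape the result as a score record."""
    raw = call_judge_llm(JUDGE_PROMPT.format(context=context, answer=answer))
    verdict = json.loads(raw)
    # A record like this would be attached to the experiment run as a score.
    return {
        "name": "hallucination",
        "value": verdict["score"],
        "comment": verdict["reasoning"],
    }


score = evaluate(
    "Langfuse is open source.",
    "Langfuse is an open-source platform.",
)
```

Requiring tool/function calling (as noted above) is what makes the structured JSON verdict reliable enough to store as a numeric score.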


🎨 Day 3: Full multi-modal support, including audio, images, and attachments

We're excited that Langfuse now offers full multi-modal support, including images, audio files, and attachments! This highly requested feature allows you to integrate media such as images (PNG, JPG, WEBP), audio files (MPEG, MP3, WAV), and documents (PDF, plain text) directly into your traces, enhancing your development and monitoring workflow in Langfuse.

Getting started is easy: simply upgrade to the latest version of the Langfuse SDK. Our SDKs now automatically handle base64-encoded media, extracting and uploading files independently while referencing them in your traces. For more control or other media types, you can use the new LangfuseMedia class to wrap your media before inclusion.
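Conceptually, the automatic handling works by spotting base64 data URIs inside trace payloads, uploading the decoded bytes separately, and leaving a small reference token behind. The sketch below illustrates that idea; the regex, the in-memory `store`, and the `@@@langfuseMedia:...@@@` token format are simplifications invented for this example, not the SDK's actual internals.

```python
import base64
import hashlib
import re

# Matches inline media like "data:image/png;base64,iVBOR..."
DATA_URI = re.compile(r"data:(?P<mime>[\w/+.-]+);base64,(?P<payload>[A-Za-z0-9+/=]+)")


def extract_media(value: str, store: dict) -> str:
    """Replace inline base64 media with a content-addressed reference,
    roughly what happens before a trace is sent (conceptual sketch)."""

    def _upload(match: re.Match) -> str:
        payload = base64.b64decode(match.group("payload"))
        media_id = hashlib.sha256(payload).hexdigest()[:16]
        store[media_id] = (match.group("mime"), payload)  # stand-in for upload
        return f"@@@langfuseMedia:{media_id}@@@"  # reference left in the trace

    return DATA_URI.sub(_upload, value)


store: dict = {}
png_b64 = base64.b64encode(b"\x89PNG...").decode()
trace_input = f"Describe this image: data:image/png;base64,{png_b64}"
cleaned = extract_media(trace_input, store)
```

Keeping traces small while storing media out-of-band is what makes it practical to log images, audio, and documents without bloating every event payload.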


📚 Day 4: All new Datasets and Evaluations documentation

Today we're highlighting documentation - an often overlooked but critical element of great Developer Experience. Alongside major updates to our Datasets and Evaluations features, we've completely rebuilt their documentation to be more thorough and user-friendly than ever before. The new docs better explain how and when to use these features, introduce core data models, and provide end-to-end examples as Jupyter Notebooks. We've also revamped the /docs start page to reflect Langfuse's comprehensive platform scope, and added llms.txt for better LLM tool integration. Documentation is product at Langfuse - we take it seriously and have built many features to help users get the most value from it.

See the changelog for more details. It also includes a summary of all the features we added to the documentation over the last year to make it truly awesome.


🧪 Day 5: Prompt Experiments

Prompt Experiments are the final piece of the launch week theme of "closing the development loop". They allow you to test prompt versions from Langfuse Prompt Management on datasets of test inputs and expected outputs. You can optionally use LLM-as-a-Judge evaluators to automatically evaluate responses against the expected outputs, and compare results in the new side-by-side experiment comparison view. This powerful combination speeds up the feedback loop when working on prompts and prevents regressions when making rapid prompt changes.
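The loop-closing idea can be sketched as: run every prompt version over every dataset item, score each output against the expected output, and compare accuracy per version. Everything below is a toy, with a stubbed `run_app` in place of a real LLM call and exact-match in place of an LLM judge; the dataset and prompt templates are invented for illustration.

```python
# Hypothetical dataset of inputs and expected outputs.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "3 * 3", "expected": "9"},
]

# Two hypothetical prompt versions under test.
prompt_versions = {
    1: "Answer tersely: {q}",
    2: "You are a calculator. Reply with only the number. {q}",
}


def run_app(prompt: str, question: str) -> str:
    # Stub standing in for the actual LLM call.
    return str(eval(question))  # toy "model" that computes the arithmetic


def exact_match(output: str, expected: str) -> float:
    """Simple evaluator; an LLM-as-a-Judge would go here instead."""
    return 1.0 if output.strip() == expected else 0.0


results = {
    version: [
        exact_match(
            run_app(template.format(q=item["input"]), item["input"]),
            item["expected"],
        )
        for item in dataset
    ]
    for version, template in prompt_versions.items()
}
accuracy = {v: sum(scores) / len(scores) for v, scores in results.items()}
```

Each run of this loop produces one experiment per prompt version, which is exactly what the comparison view from Day 1 then lays out side by side.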

See the changelog for more details or watch the video above for a walkthrough.


🍒 Extra Goodies

Additional features released this week:

  • llms.txt: Easily use the Langfuse documentation in Cursor and other LLM editors via the new llms.txt file.
  • /docs: New documentation start page with a simplified overview of all Langfuse features.
  • Self-hosted Pro Plan: Get access to additional features without the need for a sales call or enterprise pricing. All core Langfuse features are OSS without limitations, see comparison for more details.
  • Developer Preview of v3 (self-hosted): v3 is the biggest release in Langfuse history. After running large parts of it on Langfuse Cloud for a while, an initial developer preview for self-hosted users is now available.

Stay Updated

Stay connected with Langfuse during Launch Week:

  • 🌟 Star the project on GitHub to show your support.
  • Follow Langfuse on Twitter and LinkedIn for updates.
  • Subscribe to the Langfuse mailing list to receive daily updates throughout the week.

Learn more: Langfuse Blog
