Is OpenAI's new GPT-4.5 just vibes?

Is OpenAI's new GPT-4.5 just vibes? How to Use LLMs; Building AI Interfaces for the Future; First LLM for Text-To-Speech

What we have for you today

  • Is OpenAI's new GPT-4.5 just vibes?

  • How to Use LLMs

  • Building AI Interfaces for the Future

  • First LLM for Text-To-Speech

Is OpenAI's new GPT-4.5 just vibes?

OpenAI has launched GPT-4.5, its latest and largest AI language model, initially available as a research preview for ChatGPT Pro users. While OpenAI describes it as its “most knowledgeable model yet,” the company also notes that GPT-4.5 is not a frontier model and may not outperform o1 or o3-mini.

GPT-4.5 offers better writing, improved world knowledge, and a refined personality, making interactions feel more natural. It enhances pattern recognition and problem-solving but does not introduce enough new capabilities to qualify as a frontier model. Leaked documents suggest it improves computational efficiency by 10x over GPT-4 but underperforms on some preparedness evaluations.

OpenAI reportedly trained GPT-4.5 with synthetic data and new supervision techniques, reducing hallucinations compared to GPT-4o and o1. Human testers rated it superior to GPT-4o in multiple areas. CEO Sam Altman acknowledged it as a “giant, expensive model” that won’t dominate benchmarks.
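
For developers who want to try the model programmatically, a call through the official OpenAI Python SDK might look like the sketch below. The article only mentions the ChatGPT research preview, so API availability and the model identifier "gpt-4.5-preview" used here are assumptions that may change.

```python
# Hedged sketch: querying GPT-4.5 through the official OpenAI Python SDK.
# The model id "gpt-4.5-preview" is an assumption; check OpenAI's model list first.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed preview identifier
    messages=[
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": "In two sentences, what distinguishes a frontier model?"},
    ],
)
print(response.choices[0].message.content)
```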

How to Use LLMs

This video provides a comprehensive overview of Large Language Models (LLMs), their features, and practical applications.

Key Takeaways:

  • LLM Landscape: Covers popular models like ChatGPT, Gemini, Claude, and Grok, along with performance tracking via leaderboards.

  • Interacting with LLMs: Explains text-based interactions, token limits, and context windows (a token-counting sketch follows this list).

  • How LLMs Work: Details pre-training, post-training, and model persona shaping.

  • Practical Applications: Demonstrates LLM use for research, travel advice, and ideation.

  • Optimizing Performance: Tips on choosing the right model, when to start a new chat, and using reinforcement-tuned "thinking models" for complex tasks.

  • Enhanced Capabilities: Shows how tools like internet search, deep research, and file uploads improve accuracy and usability.

  • Code & AI Collaboration: Discusses coding with LLMs, including vibe coding and collaborative tools like Cursor and Composer.

  • Multimodal Interactions: Explores LLMs' capabilities with speech, images, video, and advanced voice modes.

  • Personalization & Customization: Highlights memory features, custom GPTs, and language learning tools.
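
The "Interacting with LLMs" point above mentions token limits and context windows; as a concrete illustration, here is a minimal token-counting sketch using the tiktoken library. The 128,000-token budget and the 1,000-token reply reserve are illustrative assumptions, not any particular model's limits.

```python
# Minimal sketch: counting tokens with tiktoken to stay inside a context window.
# The 128k budget and 1k reply reserve are illustrative assumptions.
import tiktoken

CONTEXT_BUDGET = 128_000  # assumed context window size, in tokens

enc = tiktoken.get_encoding("cl100k_base")  # a common OpenAI tokenizer

def fits_in_context(prompt: str, reserved_for_reply: int = 1_000) -> bool:
    """Return True if the prompt plus a reply reserve fits in the assumed window."""
    n_tokens = len(enc.encode(prompt))
    print(f"Prompt uses {n_tokens} tokens")
    return n_tokens + reserved_for_reply <= CONTEXT_BUDGET

print(fits_in_context("Explain context windows in one paragraph."))
```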

The video equips viewers with the knowledge to effectively integrate LLMs into their workflows, optimizing both personal and professional use.

Building AI Interfaces for the Future

This video explores the evolving landscape of AI user interfaces, moving beyond traditional chat-based UIs to cutting-edge designs. Raphael Schaad, the creator of Notion Calendar, joins the discussion to analyze AI interfaces submitted by the Y Combinator community.

Key Highlights:

  • Voice AI: A deep dive into Vapi (voice AI for developers) and Retell AI (voice AI for call operations), examining latency challenges and the importance of real-time adaptability.

  • AI Agents: A look at Gumloop (AI automation) and AnswerGrid (scalable AI-driven answers), showcasing the role of visual workflows and data validation.

  • AI Product Design: Insights into Polymet, an AI product designer, and the complexities of prompt-based design, including generating sophisticated outputs and improving AI feedback mechanisms.

  • Adaptive AI Interfaces: Discussion of Zi, a smarter email app, and how interfaces can dynamically adjust based on content.

  • AI Video Studio: An exploration of Argil, an AI-powered video studio, covering script customization, deepfake technology, and the balance between fidelity and speed.

The video underscores the limitless potential of AI-native design, emphasizing how these innovations are shaping the future of software. It raises a crucial question: How can we keep users in control while AI seamlessly enhances their experience? This discussion offers valuable insights into the next generation of AI-driven interfaces poised to evolve over the coming decade.

First LLM for Text-To-Speech

Octave is a new large language model (LLM) for text-to-speech (TTS) that generates expressive, context-aware speech. Unlike conventional TTS, it understands the meaning of words in context, adjusting tone, rhythm, and emotion to sound more natural and lifelike.

Key Features of Octave:

  • Context-Aware Speech: Adapts tone based on emotional cues, plot twists, and character traits, similar to a human actor.

  • Voice Design: Creates AI voices from prompts or scripts, allowing customization based on accents, demographics, and character traits.

  • Emotion & Style Control: Can modify speech to express emotions like anger, sarcasm, or calmness.

  • Voice Cloning (Upcoming): Will allow instant voice cloning from short audio samples.

  • Tools & Accessibility: Available on platform.hume.ai and via API, with Python and TypeScript SDKs for developers. Includes a voice library and a long-form content generation interface.
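
Since this summary doesn't show the actual request format, the sketch below calls the REST API directly with Python's requests library rather than the official SDKs; the endpoint path, header name, payload fields, and audio response handling are all assumptions for illustration only. Consult platform.hume.ai or the official Python/TypeScript SDKs for the real interface.

```python
# Hedged sketch: generating speech with Octave over HTTP.
# The endpoint path, header name, payload fields, and response handling below
# are assumptions for illustration; see platform.hume.ai for the real API.
import os
import requests

API_KEY = os.environ["HUME_API_KEY"]   # assumed environment variable name
URL = "https://api.hume.ai/v0/tts"     # assumed endpoint path

payload = {
    "text": "It was a dark and stormy night...",                             # text to speak
    "voice_description": "a weary detective, gravelly, slightly sarcastic",  # assumed field
}

resp = requests.post(URL, json=payload, headers={"X-Hume-Api-Key": API_KEY})
resp.raise_for_status()

# Assumes the response body is raw audio; the real API may return JSON instead.
with open("octave_output.wav", "wb") as f:
    f.write(resp.content)
```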

Performance & Evaluation:

In a blind study comparing Octave to ElevenLabs Voice Design, Octave was preferred in:

  • Audio quality (71.6%)

  • Naturalness (51.7%)

  • Matching voice descriptions (57.7%)

A public initiative, Expressive TTS Arena, is also being launched to evaluate expressive speech synthesis.

Future Development:

Hume AI aims to enhance Octave's multilingual capabilities, improve expressive speech generation, and support multi-speaker conversations.

Link to article