Did Meta lie about Llama 4? OpenAI's new models

Today on Don’t Fear AI

  • OpenAI's new models: GPT-4.1, 4.1 Mini, and 4.1 Nano

  • Is Agent2Agent Protocol better than MCP?

  • Llama 4 Maverick ranking dropped from #2 to #32 

  • Shopify CEO Manifesto Says AI Now Mandatory For All Employees

OpenAI's new models: GPT-4.1, 4.1 Mini, and 4.1 Nano

OpenAI just released three new models: GPT‑4.1, 4.1 Mini, and 4.1 Nano. All three support a 1-million-token context window via the API.

🔹 GPT-4.1 (Full Model)

  • Performance:

    • Coding: 54.6% on SWE-bench Verified, a 21.4-point gain over GPT-4o and 26.6 points over GPT-4.5.

    • Instruction Following: 38.3% on Scale's MultiChallenge, 10.5 points above GPT-4o.

    • Multimodal Long-Context: 72.0% on Video-MME (long, no subtitles) — new state-of-the-art.

  • Context Window: Up to 1 million tokens.

  • Output Token Limit: Increased to 32,768 tokens (vs 16,384 in GPT-4o).

  • Strengths:

    • Better long-context comprehension.

    • Greatly improved frontend coding (80% human preference over GPT-4o).

    • Strong diff handling across formats for code editing.

    • Supports complex software engineering tasks.

  • Cost: $2 per million input tokens / $8 per million output tokens.

  • Limitations:

    • Accuracy decreases with longer input (84% at 8k → 50% at 1M tokens).

    • Can be overly literal; benefits from more specific prompts.

🔹 GPT-4.1 Mini

  • Performance:

    • Matches or exceeds GPT-4o on many intelligence benchmarks.

    • Outperforms GPT-4o in several instruction-following and coding tasks.

  • Efficiency:

    • Latency: Nearly 50% lower than GPT-4o.

    • Cost: 83% cheaper than GPT-4o, at $0.40 per million input tokens / $1.60 per million output tokens.

  • Ideal For:

    • Developers who need strong performance at lower latency and cost.

    • Lightweight agents with fast response needs.

🔹 GPT-4.1 Nano

  • Performance:

    • MMLU: 80.1%

    • GPQA: 50.3%

    • Aider polyglot coding: 9.8% — higher than GPT-4o mini.

  • Speed & Cost:

    • Fastest and cheapest model in OpenAI’s lineup.

    • Cost: $0.10 per million input tokens / $0.40 per million output tokens.

  • Context Window: Also supports 1 million tokens.

  • Use Cases:

    • Ideal for low-latency tasks: classification, autocompletion, fast agents.

    • Balanced performance at minimal cost.
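At the per-million-token rates listed above, the cost trade-off across the three models is easy to compute. A minimal sketch (the model names match OpenAI's published names; the request sizes in the example are illustrative assumptions):

```python
# Per-million-token prices in USD, as reported in this issue.
PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 100k-token prompt with a 2k-token answer on each model.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 2_000):.4f}")
```

At that request size, Nano comes out 20x cheaper than the full model, which is why it targets high-volume, low-latency workloads like classification and autocompletion.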

🧠 Overall Enhancements Across GPT-4.1 Models

  • Significant gains in:

    • Real-world coding

    • Instruction-following

    • Long-context and multimodal comprehension

  • Tuned for agentic workflows: code editing, document insights, customer service automation.

  • Stronger adherence to format, tool use, and diff-based code generation.

  • All models trained with real-world developer feedback.
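As a rough sketch of what the agentic tuning means in practice, here is the shape of a Chat Completions request body with a tool definition for a diff-applying workflow. The `model` value and the body's general structure follow OpenAI's Chat Completions API; the `apply_diff` tool and its schema are invented for illustration:

```python
import json

payload = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a code-editing assistant."},
        {"role": "user", "content": "Apply a fix to utils.py for the failing test."},
    ],
    # Hypothetical tool: the model can emit a tool call with these arguments
    # instead of free-form text, which the caller then executes.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "apply_diff",
                "description": "Apply a unified diff to a file in the repo.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "diff": {"type": "string"},
                    },
                    "required": ["path", "diff"],
                },
            },
        }
    ],
    "max_tokens": 32_768,  # GPT-4.1's raised output cap
}

# The body must round-trip as JSON before being sent over HTTP.
print(json.dumps(payload)[:60], "...")
```

The stronger format adherence claimed above is what makes this pattern reliable: the model's tool calls are expected to validate against the declared parameter schema.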


Is Agent2Agent Protocol better than MCP?

Agent2Agent (A2A) is a new open protocol developed by Google in collaboration with over 50 technology and service partners to enable AI agents to communicate, collaborate, and interoperate across different platforms, vendors, and ecosystems.

Key Goals and Benefits

  • Boost enterprise productivity by automating and coordinating complex tasks (e.g., supply chain, hiring).

  • Enable multi-agent systems where agents can work together securely, regardless of their origin.

  • Standardize communication among agents to reduce cost and increase efficiency.

Core Features of A2A

  • Agent Cards: Metadata in JSON describing an agent’s capabilities and endpoint.

  • Task-based Communication: Agents manage tasks through a defined lifecycle (submitted → working → completed).

  • Collaboration Tools: Support for messages, artifacts (outputs), real-time updates, and push notifications.

  • Modality Agnostic: Supports text, audio, video, and structured data.

  • Secure by Default: Uses enterprise-grade authentication.

  • Built on Standards: HTTP, SSE, JSON-RPC for easy IT integration.

  • Long-running Task Support: Real-time progress updates via streaming or notifications.
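A minimal sketch of the two building blocks described above: an Agent Card and a JSON-RPC task-submission message. The field names follow the general shape of Google's A2A announcement; the agent name, endpoint URL, and skill are invented examples:

```python
import json

# Agent Card: JSON metadata advertising an agent's endpoint and capabilities.
agent_card = {
    "name": "TravelPlanner",                     # invented example agent
    "url": "https://agents.example.com/travel",  # hypothetical endpoint
    "capabilities": {"streaming": True, "pushNotifications": True},
    "skills": [
        {"id": "book-flight", "description": "Search and book flights."}
    ],
}

# Task submission over JSON-RPC 2.0. The task then moves through the
# lifecycle: submitted -> working -> completed.
send_task = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": "task-123",
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Book a flight to Tokyo."}],
        },
    },
}

print(json.dumps(send_task, indent=2))
```

Because both artifacts are plain JSON over HTTP, an enterprise can expose or consume agents with the same infrastructure it already uses for REST APIs, which is the interoperability argument behind A2A.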

Llama 4 Maverick ranking dropped from #2 to #32 

Meta's new open-source language model, Llama-4-Maverick, has sparked backlash after it dropped from 2nd to 32nd place on the LMArena leaderboard. Developers suspect Meta manipulated the rankings by initially submitting a specially optimized version (Llama-4-Maverick-03-26-Experimental) rather than the general release model.

The public version (Llama-4-Maverick-17B-128E-Instruct), despite its size and specs, performed significantly worse — even trailing older or less powerful models like Llama-3.3-Nemotron-Super-49B-v1. Meta claimed the high-ranking version was “specifically optimized for dialogue,” but this has only fueled skepticism.

The release has been labeled disappointing and misleading, with some in the AI community expressing frustration and disillusionment. Critics argue that while Llama-4-Maverick might have been competitive months ago, it now lags behind open-source peers like Qwen and DeepSeek, and can’t compete on cost-efficiency with models like Gemini Flash.

Shopify CEO Manifesto Says AI Now Mandatory For All Employees

A leaked internal memo from Shopify CEO Tobias Lütke has gone viral for its bold and clear stance on AI adoption. Lütke declares that AI proficiency is now a mandatory expectation for all employees, not just technical staff. This shift is not seen as a temporary initiative but a core part of Shopify’s culture, operations, and performance evaluations.

Key takeaways from the memo:

  • AI use is mandatory, and employees must justify non-use.

  • AI must be part of all early-stage product development.

  • Performance reviews now include AI usage as a metric.

  • Stagnation is failure: Continuous AI learning is expected.

  • Leadership must model AI use and set the tone culturally.

This reflects a broader trend among CEOs like Jon Moeller (Procter & Gamble), Jane Fraser (Citigroup), and Chip Bergh (Levi’s), who are embedding AI deeply into operations to drive productivity, customer experience, and competitiveness. These leaders view AI not as a tool but as a strategic mindset and business imperative.