Rami's Readings #94 - 🤖 5 AI Predictions for 2025 ✨

5 AI predictions for 2025, the latest on AI, LLMs, DeepSeek, New Tools, Papers, VC, Hardware, and more.

Jan 26, 2025

Welcome to Rami’s Readings #94 - a weekly digest of interesting articles, papers, videos, and X threads from my various sources across the Internet. Expect a list of reads covering AI, technology, business, culture, fashion, travel, and more. Learn about what I do at ramisayar.com/about.

Happy New Year! I hope y’all had a fantastic start to the year! My year started with crazy travel, even by my standards. For the curious, I traveled to Boston, NYC, Montréal, Milan, Paris, Milan, NYC, Boston, Seattle, Boston and now back home in Redmond (Seattle).

AI started off the year with a ‼️ banger ‼️ thanks to DeepSeek. Happy Lunar New Year! Stay tuned as we dive into DeepSeek’s R1, and much more from the past few weeks, later in the newsletter.

First, grab a mug ☕ I’d like to share 5 AI predictions I collected from builders who are shipping AI systems at scale in 2025.

🤖 5 AI Predictions for 2025 ✨

Priya Sundararaman

Priya is a Digital & AI leader with 20+ years of experience driving enterprise-scale innovation, including AI solutions for Amazon, Walmart, and State Farm. She holds 13 AI patents.

In the last hundred years, humanity mastered the art of communicating with machines through code. Today, we're witnessing a remarkable reversal: machines are speaking the language of humans. This shift, driven by the rapid advancement of Generative AI, is transforming our relationship with AI systems.
The challenge now lies not in instructing AI, but in forging a symbiotic partnership with it. We must learn to collaborate with these intelligent systems, leveraging their strengths while preserving our unique human qualities and values.
I anticipate that 2025 will be a pivotal year in this journey. We will likely establish a comprehensive blueprint for what we call “human-AI interaction” (HAII). This blueprint will address ethical considerations, define boundaries, and create guidelines that ensure humans not only partner with AI but also take the driver's seat. This approach will be crucial not just for technological advancement, but for the sustainable progress of humanity.

James Villarrubia

James is a White House Presidential Innovation Fellow at NASA, renowned public speaker and CTO.

2025 will be the year of Agents taking center stage as they empower non-technical users to build systems through meta-logic business layers. These new businesses will create opportunities but also new security challenges, as CISOs must account for additional middlemen in their backend processes. On the consumer side, smaller, on-device models will gain traction, reducing dependency on network traffic. Meanwhile, I'm hoping that API specs will start to be rewritten to be, by default, consumed by both developers and AI agents.

Nidhi Verma

Nidhi Verma, Senior Director of Engineering at JPMorganChase, specializes in system design, architecture, and product development.

Edge AI and Real Time AI Processing - Edge AI will revolutionize real-time decision-making across industries, driving advancements in trading and market analysis within the financial sector, grid simulation and monitoring in the energy industry, and smart cities and healthcare. Powered by advanced AI chips and 5G, it enables faster, localized data processing, reducing reliance on cloud infrastructure.

Andrei Oprisan

Andrei Oprisan is a technology leader at agent.ai with 15+ years of experience delivering scalable eCommerce, marketing, and AI-driven products, a Columbia alumnus, published author, speaker, and mentor in machine learning and tech innovation.

In 2025, businesses will deploy AI employees: AI-powered synthetic professionals with specialized domain expertise, capable of devising strategic roadmaps, engaging in vendor negotiations, and even mentoring junior staff. These autonomous agents will transcend routine task automation to shape boardroom decisions and influence corporate culture. As AI shifts from a reactive tool to a proactive collaborator, organizations may see a tension between machine-driven insights and established executive intuition. Companies that embrace this friction and incorporate AI’s emergent creativity will outpace competitors. But the real differentiator will be rigorous oversight - building governance frameworks that balance high-impact AI contributions with accountability for potential ethical and legal missteps.

Yours Truly, Rami Sayar

Rami Sayar is an accomplished engineering leader delivering generative AI-powered experiences at Microsoft AI from zero to one to worldwide scale.

Throughout 2024, I cataloged in my newsletter, Rami’s Readings, a series of increasingly powerful open source LLMs optimized to run on devices with consumer-grade hardware. Thanks to engineering prowess, my own desktop equipped with an Nvidia RTX 4090 is now overkill for running most cutting-edge models. At CES 2025, Edge AI was all the buzz, with nearly every OEM marketing local AI solutions designed to run on their hardware. Let’s not forget that Apple’s M4 chips are equally capable of running state-of-the-art LLMs at incredible speed and energy efficiency thanks to MLX. In 2025, Edge AI isn’t just a trend; it will go mainstream.

🤖 AI Reads

This week’s newsletter is exceptionally long, so I clustered the links and limited comments to the most important reads. My comments and notes are in parentheses and italicized.

DeepSeek V3

Notes: DeepSeek V3 paper and model on HuggingFace. V3 is a 671B-parameter MoE that outperforms GPT-4o and Claude-Sonnet-3.5. It is incredibly fast, cost very little to train (relatively speaking), is fairly open-source, and demonstrates what I keep repeating in this newsletter: Chinese AI talent is underappreciated. I have highlighted DeepSeek’s first LLMs over several of my newsletters, and DeepSeek-Coder-v2 was my go-to for a while. Now, this model played a huge role in the next release: R1.

DeepSeek R1 🔥🔥🔥

Notes: DeepSeek R1 paper and models on HuggingFace. This release from the DeepSeek team is huge and rightly has everyone in industry and government ablaze. (I read so many jokes - my favorite was “DeepSeek released open AI”). 🤣 This release is actually multiple models: R1-Zero, R1, and then a series of distilled models. The reasoning performance matched OpenAI’s o1 model for a fraction of the training cost. Why do you need billions in GPUs and funding anymore?

How did they get there? DeepSeek applied RL to the DeepSeek V3 base model to get R1-Zero. R1-Zero showed that you can develop reasoning performance through reinforcement learning (RL) alone. No need for human feedback or supervised fine-tuning at first… shocking! But R1-Zero suffered from issues such as poor readability and language mixing, so DeepSeek did it again, this time starting with high-quality chain-of-thought (CoT) data to get reasoning capabilities faster, before applying SFT and more RL to get R1. R1 is the game changer, but it is again very large (671B). So… they distilled it: DeepSeek-R1-Distill-Llama-70B outperforms OpenAI’s o1-mini. AND it is all MIT-licensed.
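To make "RL without human feedback" a bit more concrete: the reward can be computed by simple rules instead of by human raters. Below is a minimal sketch, assuming a completion format with `<think>` and `<answer>` tags and an exact-match answer check. The function name and the reward weights (0.5 and 1.0) are hypothetical and chosen for illustration only; this is not DeepSeek's actual implementation.

```python
import re

# Assumed output format: reasoning inside <think>, final answer inside <answer>.
COMPLETION_PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with rules only (no human rater, no learned reward model).

    Hypothetical weights: 0.5 for following the format, plus 1.0 if the
    extracted answer exactly matches the reference answer.
    """
    match = COMPLETION_PATTERN.match(completion.strip())
    if match is None:
        # The format was not followed, so nothing can be checked.
        return 0.0

    format_reward = 0.5
    answer = match.group(2).strip()
    accuracy_reward = 1.0 if answer == reference_answer.strip() else 0.0
    return format_reward + accuracy_reward


if __name__ == "__main__":
    good = "<think>2 + 2 is 4.</think><answer>4</answer>"
    wrong = "<think>2 + 2 is 5.</think><answer>5</answer>"
    print(rule_based_reward(good, "4"))
    print(rule_based_reward(wrong, "4"))
    print(rule_based_reward("The answer is 4.", "4"))
```

An RL loop would sample many completions per prompt, score each with a function like this, and push the model toward the higher-scoring ones. The point of the sketch is only that the scoring step needs a verifiable answer, not a human in the loop.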

There is true engineering prowess on display, showcasing that mathematics, computer engineering fundamentals, and computer science still reign supreme. Case in point: llama.cpp, which is driving AI to the edge without requiring data centers of GPUs. Just a reminder, DeepSeek’s engineers mostly come from a quant hedge fund.

China’s cheap, open AI model DeepSeek thrills scientists. (Nature)

China’s DeepSeek Shows Why Trade Wars Will Be Hard to Win (From Tyler Cowen)

So… I have DeepSeek-R1-Distill-Qwen-32B running on my 4090 giving me quasi-o1 performance as I write! 😁
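For the curious, here is a back-of-the-envelope sketch of why a 32B model can fit on a single RTX 4090 (24 GB of VRAM). It assumes the weights are quantized to roughly 4 bits each, which is an assumption on my part for illustration and not a statement about the exact setup above. It also counts weights only, ignoring the KV cache and runtime overhead. The function name is hypothetical.

```python
def weight_memory_gb(num_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    vram_gb = 24  # Nvidia RTX 4090
    for bits in (16, 8, 4):
        needed = weight_memory_gb(32, bits)
        verdict = "fits" if needed < vram_gb else "does not fit"
        print(f"32B params at {bits}-bit: ~{needed:.0f} GB of weights, {verdict} in {vram_gb} GB")
```

At 16-bit the weights alone come to about 64 GB and at 8-bit about 32 GB, so neither fits. At around 4 bits per weight they come to about 16 GB, which leaves some headroom for context on a 24 GB card.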

New Models:

Modern BERT. (Replacement for old BERT.)

SmallThinker-3B-preview. (Fine-tuned Qwen 3B model.)

New Infrastructure Startups / Products / Platforms:

Helicone (YC-backed.)

Unsloth.ai (Open source fine-tuning platform, picking up steam.)

opik (Open source LLM evaluation framework.)

Kiln AI (Another open source fine-tuning platform.)

New Tools / Libraries / Apps:

OpenAI Introduced Operator (An important release, but not as ground-breaking as anticipated.)

MarkItDown (Microsoft-backed Document to Markdown library for RAG.)

Project Mariner (Google’s agent in the browser similar to OpenAI’s Operator.)

Aider (AI pair programming in terminal. I will give it a try with my local LLM setup.)

New Papers / Articles to Read:

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

New LLM optimization technique slashes memory costs up to 75%. (From Tokyo, Sakana is doing something interesting.)

Fast LLM Inference From Scratch. (Building an inference engine using C++ and CUDA from scratch, no libraries.)

Monolith: Real Time Recommendation System With Collisionless Embedding Table. (Oldie, relevant for those interested in TikTok.)

Byte Latent Transformer: Patches Scale Better Than Tokens. (Meta showing that byte transformers are feasible.)

New Hardware:

Nvidia Project Digits: The World’s Smallest AI Supercomputer. (I signed up and will try to get one the minute it comes out.)

OpenAI, Oracle and SoftBank’s Stargate. Mukesh Ambani Plans World’s Biggest Data Center. Meta Will Spend Up to $65 Billion.

Notes: Sigh…