Rami's Readings #95 - Welcome 👋🏼 and More on DeepSeek 🔥
The latest on AI, LLMs, Supervised Fine-tuning vs. Reinforcement Learning, DeepSeek R1, Singapore, The Office (Show), and more.
Welcome to Rami’s Readings #95 - a weekly digest of interesting articles, papers, videos, and X threads from my various sources across the Internet. Expect a list of reads covering AI, technology, business, culture, fashion, travel, and more. Learn about what I do at ramisayar.com/about.
👋🏼 Welcome New Subscribers
Hello! A hearty thank you for subscribing to Rami's Readings! There are quite a few new subscribers this week, and I am thrilled to have you on board. If you’re new to this newsletter, I do my best to curate the best papers, tweets, and articles I have read during the week, focusing on LLMs, AI, economics, business, and technology news. Fewer widely shared articles, more great reads missed among the noise.
📈 Top 3 Newsletters According to Substack:
🤖 AI Reads
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Notes: A great paper from Google DeepMind, NYU, and HKU, worth reading given all the drama about #DeepSeekR1 and SFT vs. RL. Read my comments from last week.
Reinforcement Learning from Human Feedback Book
Notes: This book from Nathan Lambert is still a work in progress, and it is especially relevant considering #DeepSeekR1.
If you are looking for well-researched perspectives, I highly recommend ChinaTalk’s podcast. Just a reminder: intra-China AI competition is just as fierce as the global AI competition. I pointed to this competition in #64, when it became clear that DeepSeek was driving a 99% drop in inference costs with DeepSeekV2. This translated interview is a valuable way to catch up on the history of DeepSeek in light of R1.
DeepSeek Released Janus Pro
Notes: Another great release from DeepSeek that improves their Janus series of unified multimodal understanding and generation models. Model on HuggingFace.
Unsloth Quantized DeepSeek R1 to 1.58-bit
Notes: They quantized the full 671B-parameter model down to fit on 2xH100. It is a great article for understanding DeepSeek’s architecture, and instructions are provided so you can do it yourself. With this quantized version, you can run a “full” R1 locally with an RTX 4090 and stacks of RAM.
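If you want to try it, here is a minimal sketch using huggingface_hub and llama-cpp-python. The repo ID, file pattern, and shard path are assumptions based on Unsloth’s write-up, so check their instructions for the exact names and recommended settings.

```python
# Minimal sketch: download Unsloth's 1.58-bit dynamic quant of DeepSeek R1
# and load it with llama-cpp-python. The repo ID, file pattern, and shard
# path below are assumptions; verify them against Unsloth's instructions.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

local_dir = snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",   # assumed repo name
    allow_patterns=["*UD-IQ1_S*"],        # assumed 1.58-bit quant file pattern
    local_dir="DeepSeek-R1-GGUF",
)

# The GGUF is split into shards; llama.cpp picks up the rest automatically
# when pointed at the first shard.
llm = Llama(
    model_path=f"{local_dir}/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    n_ctx=4096,
    n_gpu_layers=7,  # offload a handful of layers to a 24 GB RTX 4090; tune for your VRAM
)

print(llm("Why is the sky blue?", max_tokens=128)["choices"][0]["text"])
```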
TinyZero: Reproduction of DeepSeek R1 Zero
Notes: From the Berkeley AI Research group.
Chain-of-Retrieval Augmented Generation
Notes: This comes from the group at Microsoft (not my org) that brought us 1-bit LLMs; the technique will sound familiar to a few folks on this list.
Conventional RAG methods usually perform a single retrieval step before the generation process, which limits their effectiveness in addressing complex queries due to imperfect retrieval results. Our proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the model to dynamically reformulate the query based on the evolving state.
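If the abstract reads too dense, the gist is a retrieval loop: instead of retrieving once up front, the model rewrites its query after each retrieval step based on what it has seen so far. Here is a rough sketch of that inference-time loop; the retrieve, reformulate_query, and generate_answer helpers are hypothetical placeholders for your retriever and LLM calls, not the paper’s code, and the paper’s actual contribution is training the model to produce these retrieval chains.

```python
# Rough sketch of chain-of-retrieval: retrieve, let the model reformulate the
# query based on the evolving evidence, and repeat before answering.
# retrieve(), reformulate_query(), and generate_answer() are hypothetical
# placeholders, not CoRAG's implementation.

def chain_of_retrieval(question: str, retrieve, reformulate_query, generate_answer,
                       max_steps: int = 4) -> str:
    query = question
    evidence = []                      # evolving state: everything retrieved so far
    for _ in range(max_steps):
        passages = retrieve(query)     # one retrieval step
        evidence.extend(passages)
        # Ask the model for the next sub-query given the question and evidence;
        # an empty reformulation signals that retrieval can stop.
        query = reformulate_query(question, evidence)
        if not query:
            break
    return generate_answer(question, evidence)
```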
Mistral Small 3 Released
Notes: Optimized for latency and performance. Did I miss the precursor magnet link this time? I’m not used to Mistral launching without a surprise magnet link drop. 😂
💼 Business Reads
I will be avoiding the obvious news from this weekend.
Singapore’s Cautious Wealth Fund Takes More Private Markets Risk
Notes: GIC is a lesser-known but highly important Singaporean fund. I admit that I know less about them than about Temasek.
Complete Guide of Government Support for Foreign-Owned Startups in Singapore
Notes: For startup friends interested in Southeast Asia.
‘The Office’ Producer Is Seeking to Raise Up to $250 Million
Notes: I can’t help but share this news! I wonder what they will be making!
🔀 Other Reads
How We Run a 5 GB/s Kafka Workload for Just $50 per Hour
Notes: I am curious whether anyone can comment on, or has reproduced, the cost savings with their production workloads.
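For a sense of scale, the headline numbers alone work out to roughly $2.78 per TB of sustained throughput. This is a back-of-the-envelope figure from the title, not from their benchmarks:

```python
# Back-of-the-envelope cost per TB implied by the headline figures only.
throughput_gb_per_s = 5
cost_per_hour_usd = 50

tb_per_hour = throughput_gb_per_s * 3600 / 1000  # 18 TB moved per hour
cost_per_tb = cost_per_hour_usd / tb_per_hour    # ~ $2.78 per TB
print(f"{tb_per_hour:.0f} TB/hour -> ${cost_per_tb:.2f} per TB")
```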
Signing off from one of my favorites in Redmond, 5 Stones Coffee Shop.