Rami's Readings #116 - Data & Pipelines
The latest on AI, LLMs, Common Pile, FineWeb2, MCP, Deep Research Agents, Australian Startups, and more.
Welcome to Rami’s Readings #116 - a weekly digest of interesting articles, papers, videos, and X threads from my various sources across the Internet. Expect a list of reads covering AI, technology, business, culture, fashion, travel, and more. Learn about what I do at ramisayar.com/about.
👋🏼 Welcome New Subscribers
Hello! A hearty thank you for subscribing to Rami's Readings! There are quite a few new subscribers this week, thanks to recommendations from The AI Ethics Brief, Global Fintech Insider, and The VC Corner. I am thrilled to have you on board! In this newsletter, I curate the best papers, tweets, and articles I have read during the week focusing on LLMs, AI, economics, business, and technology news. You can learn more about me on my website.
🤖 AI Reads
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Notes: With contributors including my friends in Canada, locally in Seattle, at MIT, etc.
FineWeb2: One Pipeline to Scale Them All
Notes: Incredible paper and project from the team at HuggingFace. Deduplication is incredibly difficult at scale.
Google Released Gemma 3n
Notes: Small multimodal models designed to run on the edge. Available for llama.cpp, MLX, Ollama, etc.
MCP in LM Studio
Notes: LM Studio supports both local and remote MCP servers. I particularly like that LM Studio lets you review a tool call's arguments before executing it. You can even edit the arguments to correct the LLM.
Deep Research Agents: A Systematic Examination And Roadmap
Notes: The GitHub repository is great too!
💼 Business Reads
Australia Startups Starved of Cash Despite Big Potential Returns
Notes: Canadian founders face a similar gap in domestic funding sources, but they can and often do make quick trips to Sand Hill Road.
What Traders Have Gotten Wrong in 2025
Notes: A fun read! 😂
Signing off from Redmond, WA.