Resources That Actually Made Me Better
Not a generic list. These are the books I have dog-eared, the tools I use daily, and the databases I trust in production. Every recommendation here comes from hands-on experience -- I have used each one on real projects, hit their limits, and still come back to them.
If you are getting started in data engineering or looking to level up, this is where I would point you. I have organized them by category with honest notes on why each one matters and where it falls short.
Books Worth Your Time
These four books cover the core knowledge every data engineer needs. I have read each one cover to cover, and I still revisit them before system design interviews and architecture reviews.
Designing Data-Intensive Applications
The one book every data engineer should read. Martin Kleppmann explains distributed systems, replication, partitioning, and consistency models in a way that finally makes sense. I re-read chapters before every system design interview. The chapter on stream processing alone changed how I think about real-time pipelines.
Best for: Anyone working with distributed systems, databases, or data-intensive applications at scale.
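To make one of the book's core ideas concrete, here is a minimal sketch of hash partitioning: a key is hashed and the result picks a partition. It is a toy illustration, not code from the book. `partition_for` and `moved_fraction` are hypothetical helpers, and MD5 is chosen only because it is stable across processes, unlike Python's built-in `hash()` for strings.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Assign a key to a partition by hashing it (hash partitioning)."""
    # MD5 is used only for an even, stable spread, not for security.
    # Python's built-in hash() is salted per process for strings, so it
    # would send the same key to different partitions on different runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    """Fraction of keys that land on a different partition when N changes."""
    moved = sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    counts = [0] * 8
    for k in keys:
        counts[partition_for(k, 8)] += 1
    print("keys per partition:", counts)  # each should be close to 10_000 / 8
    print("moved when going 8 -> 9 partitions:", moved_fraction(keys, 8, 9))
```

With a uniform hash, going from 8 to 9 partitions with plain `hash mod N` reassigns about 8 in 9 keys. That is the rebalancing problem the book uses to explain why naive `mod N` partitioning scales poorly when nodes are added.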
Fundamentals of Data Engineering
If Designing Data-Intensive Applications (DDIA) is the deep dive, this book by Joe Reis and Matt Housley is the complete map. It covers the entire data engineering lifecycle, from ingestion to serving. It is ideal if you are transitioning into data engineering or want to fill gaps in your knowledge. The sections on data architecture and orchestration are particularly strong.
Best for: New data engineers, career transitioners, and anyone who wants a structured understanding of the full data lifecycle.
The Data Warehouse Toolkit
Ralph Kimball's dimensional modeling is still relevant decades after the first edition. Star schemas, slowly changing dimensions, conformed dimensions -- this book is why your data warehouse actually makes sense to analysts. Even with modern tools like dbt, the modeling principles here are foundational.
Best for: Anyone building analytical data models, designing warehouse schemas, or working with BI teams.
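Slowly changing dimensions are easier to grasp with a concrete case. Below is a minimal pure-Python sketch of a Type 2 SCD, where a change expires the current row and appends a new version instead of overwriting history. The `CustomerDim` record, its columns, and the `apply_scd2` helper are hypothetical names for illustration; in a real warehouse this logic would usually live in a SQL `MERGE` or a dbt snapshot.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerDim:
    surrogate_key: int          # warehouse-generated key, one per version
    customer_id: str            # natural (business) key from the source system
    city: str                   # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date]    # None means "still current"
    is_current: bool


def apply_scd2(rows: list[CustomerDim], customer_id: str, new_city: str,
               as_of: date) -> list[CustomerDim]:
    """Return the dimension table after applying one Type 2 change."""
    current = next((r for r in rows if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == new_city:
        return rows  # attribute unchanged: no new version needed
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    out = []
    for r in rows:
        if r is current:
            # Expire the old version rather than overwrite it (overwriting is Type 1).
            out.append(replace(r, valid_to=as_of, is_current=False))
        else:
            out.append(r)
    out.append(CustomerDim(next_key, customer_id, new_city, as_of, None, True))
    return out


if __name__ == "__main__":
    dim = apply_scd2([], "C001", "Boston", date(2023, 1, 1))
    dim = apply_scd2(dim, "C001", "Denver", date(2024, 6, 1))
    for row in dim:
        print(row)
```

Fact rows loaded before June 2024 keep pointing at the Boston version's surrogate key, so historical reports still attribute those sales to Boston.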
Spark: The Definitive Guide
Co-authored by Bill Chambers and Matei Zaharia, who created Spark. It goes from the basics to advanced optimizations. The chapters on partitioning and shuffle operations alone saved me hours of debugging slow jobs. If you work with Spark in production, this book pays for itself the first time you fix a data skew issue.
Best for: Data engineers working with Spark, Databricks, or large-scale batch processing.
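A common fix for skew is key salting: split a hot key into several sub-keys, aggregate those in parallel, then merge the partial results. The pure-Python simulation below only illustrates the idea and involves no Spark; `NUM_SALTS`, `salted`, and `two_stage_count` are hypothetical names. In PySpark the same pattern would be a `groupBy` on a salted key followed by a second `groupBy` on the original key.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical: number of sub-keys each key is split into


def salted(key: str, rng: random.Random) -> str:
    """Append a random salt so one hot key spreads across NUM_SALTS groups."""
    return f"{key}#{rng.randrange(NUM_SALTS)}"


def two_stage_count(records: list[str], seed: int = 0) -> tuple[Counter, Counter]:
    """Count records per key in two stages, the way a salted aggregation would."""
    rng = random.Random(seed)
    # Stage 1: partial counts grouped by salted key. In a cluster, each salted
    # key could go to a different task, so no single task sees every hot row.
    partial = Counter(salted(k, rng) for k in records)
    # Stage 2: strip the salt and merge the much smaller partial results.
    final: Counter = Counter()
    for salted_key, n in partial.items():
        final[salted_key.rsplit("#", 1)[0]] += n
    return partial, final


if __name__ == "__main__":
    # A skewed dataset: one key accounts for 90% of the rows.
    records = ["hot"] * 9_000 + [f"key-{i}" for i in range(1_000)]
    partial, final = two_stage_count(records)
    print("largest group without salting:", max(Counter(records).values()))
    print("largest group with salting:", max(partial.values()))
    assert final == Counter(records)  # salting does not change the answer
```

For skewed joins, the other side must also be replicated once per salt value so every salted key still finds its match. Spark 3's adaptive query execution can also split skewed join partitions automatically.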
Databases I Trust
I have tried dozens of database services over the years. These two stand out for developer experience, reliability, and value. Both have generous free tiers that work for side projects and prototyping.
Neon
Serverless Postgres that scales to zero. This blog runs on Neon. Branch your database like Git, and pay only for what you use. The free tier is generous enough for most side projects. The branching feature is genuinely useful for testing schema migrations safely before applying them to production.
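Branching can be cheap because a new branch does not copy the data up front. The toy model below shows the general copy-on-write idea: branches share frozen layers of history, and each branch keeps its own writes on top. It is not Neon's storage engine or API; the `Branch` class, its methods, and the branch name are hypothetical, and deletes are left out to keep it short.

```python
from types import MappingProxyType
from typing import Any, Mapping, Optional


class Branch:
    """Toy copy-on-write branch over a key-value store (illustration only)."""

    def __init__(self, name: str, layers: tuple[Mapping[str, Any], ...] = ()):
        self.name = name
        self._layers = layers             # frozen history, shared with other branches
        self._delta: dict[str, Any] = {}  # writes made on this branch since it forked

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Newest data wins: own writes first, then shared layers newest to oldest.
        if key in self._delta:
            return self._delta[key]
        for layer in reversed(self._layers):
            if key in layer:
                return layer[key]
        return default

    def put(self, key: str, value: Any) -> None:
        self._delta[key] = value

    def branch(self, name: str) -> "Branch":
        # Freeze pending writes into a read-only layer that parent and child
        # both reference. No existing data is copied.
        if self._delta:
            self._layers = self._layers + (MappingProxyType(self._delta),)
            self._delta = {}
        return Branch(name, self._layers)


if __name__ == "__main__":
    main = Branch("main")
    main.put("schema_version", 1)
    dev = main.branch("migration-test")   # hypothetical branch name
    dev.put("schema_version", 2)          # try the migration on the branch
    main.put("orders", 10)                # production keeps taking writes
    print(main.get("schema_version"), dev.get("schema_version"))  # 1 2
    print(dev.get("orders"))  # None: the branch is a point-in-time snapshot
```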
PlanetScale
MySQL with the best developer experience (DX) I have seen. Deploy requests work like pull requests for your schema. Non-blocking schema changes mean no more 3 a.m. maintenance windows. The dashboard gives you query insights that actually help you optimize slow queries without digging through slow query logs.
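Non-blocking schema changes generally rely on a shadow-table pattern: build a copy of the table with the new schema, backfill it in small chunks while traffic continues, replay the writes that happened in the meantime, then swap. PlanetScale is built on Vitess, whose online DDL works along these lines, but the sketch below is a single-threaded toy, not PlanetScale's implementation. `Table`, `online_migrate`, and the `status` column are hypothetical, and real systems also handle deletes, locking during the cut-over, and failure recovery.

```python
from typing import Callable, Optional

Row = dict  # a row is a column -> value dict in this toy


class Table:
    """Toy table keyed by primary key that can record writes during a migration."""

    def __init__(self, rows: dict[int, Row]):
        self.rows = dict(rows)
        self.changelog: Optional[list[int]] = None  # primary keys written mid-migration

    def write(self, pk: int, row: Row) -> None:
        self.rows[pk] = row
        if self.changelog is not None:
            self.changelog.append(pk)


def online_migrate(table: Table, transform: Callable[[Row], Row], chunk_size: int = 2,
                   between_chunks: Optional[Callable[[], None]] = None) -> None:
    """Shadow-table migration: backfill, replay concurrent writes, then cut over."""
    table.changelog = []  # start capturing writes before copying anything
    shadow: dict[int, Row] = {}
    # 1. Backfill existing rows into the shadow table in small chunks.
    for i, pk in enumerate(sorted(table.rows)):
        shadow[pk] = transform(table.rows[pk])
        if between_chunks is not None and (i + 1) % chunk_size == 0:
            between_chunks()  # live traffic keeps hitting the original table
    # 2. Catch up by re-applying every row that changed during the backfill.
    for pk in table.changelog:
        shadow[pk] = transform(table.rows[pk])
    # 3. Cut over: the migrated copy becomes the live table.
    table.changelog = None
    table.rows = shadow


if __name__ == "__main__":
    users = Table({1: {"name": "ada"}, 2: {"name": "bo"}, 3: {"name": "cy"}})

    def traffic() -> None:
        users.write(2, {"name": "bob"})  # update during the backfill
        users.write(4, {"name": "di"})   # insert during the backfill

    # Hypothetical migration: add a "status" column with a default value.
    online_migrate(users, lambda r: {**r, "status": "active"}, between_chunks=traffic)
    print(users.rows)
```

The original table stays writable for the whole backfill; only the final swap needs coordination, which is why no maintenance window is required.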
Tools I Use Daily
My daily workflow relies on AI-assisted development tools. These two have fundamentally changed how fast I ship code and write content. I have tried alternatives and keep coming back to these.
Cursor
VS Code fork with AI that actually understands your codebase. Tab-complete that works, chat that can edit multiple files. I write code noticeably faster now. The multi-file editing through chat is the killer feature -- describe what you want across three files and it just does it.
Claude
My go-to for complex reasoning, code review, and writing. In my experience, it handles long documents better than the alternatives I have tried. Most of my blog drafts start as Claude conversations where I work through the technical concepts before structuring them into articles. Claude Code in the terminal is excellent for codebase-wide refactoring.
How I Choose What to Recommend
Every resource on this page meets three criteria. First, I have used it personally on real projects -- not just tried it for a review. Second, it has to provide value that I could not easily get elsewhere. Third, I would recommend it to a colleague without hesitation. If something stops meeting these criteria, I remove it from the list.
I deliberately keep this list short. A curated recommendation is more useful than an exhaustive catalog. If you think something should be on here, let me know -- I am always looking for tools and books that make data engineering better.
Disclosure: Some links on this page are affiliate links (primarily to Amazon). You pay the same price; I earn a small commission that helps keep this blog running. Affiliate links do not influence which products I recommend -- I only include things I have personally used and would recommend regardless of any commission. For more details, see the Privacy Policy.