I did a lot of reading this weekend for my “book library” MVP. I dug deep into retrieval-augmented generation (RAG) and learned some helpful things:
- There are different variations of RAG: GraphRAG, StructRAG, LightRAG, etc. New versions have been introduced every few months this year. The right one depends on your use case.
- Normal RAG isn’t great for a large data set like a book. It struggles to make connections when presented with a lot of data.
- RAG’s results are better when you feed it the relationships in a data set via a schema. GraphRAG, StructRAG, and LightRAG try to make up for normal RAG’s weakness here by structuring the information before retrieval (a knowledge graph, in the case of GraphRAG and LightRAG). That indexes the information better, which leads to understanding the data better and providing better results.
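
Here is a toy Python sketch of the knowledge graph idea. It is not how GraphRAG or LightRAG are actually implemented. The triples are made up, and `build_graph` and `find_path` are hypothetical helpers written only for illustration. The point is that once facts are indexed as relationships, a query can follow a chain of links that no single chunk of text contains on its own.

```python
from collections import defaultdict, deque

# Made-up (subject, relation, object) triples, standing in for facts
# that might be extracted from different chapters of a book.
TRIPLES = [
    ("Founder A", "started", "Company X"),
    ("Company X", "acquired", "Company Y"),
    ("Company Y", "invented", "Product Z"),
]


def build_graph(triples):
    """Index triples as an adjacency list keyed by entity, in both directions."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse of {rel}", subj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: return the chain of facts linking two entities."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, rel, neighbor)]))
    return None  # the two entities are not connected


graph = build_graph(TRIPLES)
path = find_path(graph, "Founder A", "Product Z")
for fact in path:
    print(fact)
```

In this sketch, no single triple mentions both "Founder A" and "Product Z", so a lookup that only matches individual passages would miss the link. The graph walk recovers it by chaining three facts together.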
I’m realizing that how the information gets indexed in large data sets is critical, especially if I want to query across lots of dense data sets like books. Thinking about why entrepreneurs with photographic memories have an edge, I decided that their minds have done a superior job of indexing everything they’ve consumed and making nonobvious connections across the data. Those connections produce unique insights, which lead to creative solutions to problems or actions that move them closer to their goals.
This weekend highlighted that I need to understand how information gets indexed as I evaluate RAG and its alternatives.