TLDR:
Some time back, rather than writing a book about Steve, I ended up building a custom GPT, "What Would Jobs Do", in ChatGPT. It kept bugging me that it sat behind ChatGPT and didn't have a clean chat UI. No more: I finally have Ask Steve on Lovable, powered by Gemini.
For everyone else: it is a chat window where you ask Steve Jobs whatever you want and hear back from him, grounded in 100+ hours of video transcripts (mostly his interviews and talks), 100+ articles written since the early days of Apple, and 13 definitive books written on him.
For the AI nerds: an end-to-end RAG application with a hybrid search architecture, combining vector embeddings (Gemini/pgvector) with a keyword fallback.
Yes, yes, I can finally say: you know, I am something of an AI-native builder myself. Well, sort of ;)
Recap/Evolution
I am a fanatic fan of Steve Jobs; I even ended up writing his biography in Tamil. I couldn't help getting mad whenever Steve Jobs and the bad behavior of leaders and founders were referenced in the same breath.
I would scream in my head, "This is not what he intended or what he meant, this is not what you learn from him, you are totally wrong. Why use Steve as an excuse for your A behaviour?"
I wanted to write a book on the learnings. Then it struck me: in this day and age, rather than a static book, it would be better to have a dynamic, customized, tailored, unique AI.
I ended up building a custom GPT on ChatGPT; more on that here.
When I sat down to write about how I built it and how to build custom GPTs, I learnt it could be far more powerful if I built an end-to-end RAG application.
More than that, what bothered me was not having a proper chat interface as the UI.
So I started wondering how to get a good front end and a proper back end, and the result is https://asksteve.lovable.app
How has it been built?
I had a collection of 13 books, 100+ articles, and around 100 hours of video. I had all of them in text form and used NotebookLM to create the core brain, basically a synthesis of all of them. Ideally, everything could be chunked as-is, but that would not be efficient: the AI would keep looking at repetitive material. It is far better to have a high-density, perfectly synthesized core. In other words, it is essential to pre-distill the data.
I fed in the core files plus the source articles and video transcripts. (Using the books as-is, or chunking them directly, might also lead to copyright issues.) The 13 files have been converted into 3,140 chunks stored in Supabase (PostgreSQL). To make this library searchable by "meaning" rather than just "words", I enabled the pgvector extension. This allows the database to store embeddings: mathematical vectors generated by the Gemini API that represent the semantic essence of each text chunk. I then used a recursive chunking strategy to break the massive sources into bite-sized pieces of roughly 1,000 characters, ensuring the AI can pinpoint specific stories or principles without getting lost in the documents.
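To make "recursive chunking" concrete, here is a minimal sketch of the idea in Python. This is not Lovable's actual implementation (that lives in the app's backend); it just shows the technique: try to split on big boundaries (paragraphs) first, and only fall back to finer separators (sentences, words) when a piece is still over the ~1,000-character budget.

```python
def recursive_chunk(text, max_len=1000, seps=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters,
    preferring coarse separators (paragraphs) over fine ones (words)."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not seps:
        # no separators left: hard-split at the character budget
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    pieces = [p for p in text.split(sep) if p]
    if len(pieces) <= 1:
        # this separator didn't help; try the next, finer one
        return recursive_chunk(text, max_len, rest)
    chunks, buf = [], ""
    for p in pieces:
        cand = f"{buf}{sep}{p}" if buf else p
        if len(cand) <= max_len:
            buf = cand                      # keep packing the current chunk
        elif len(p) > max_len:
            if buf:
                chunks.append(buf)
            chunks.extend(recursive_chunk(p, max_len, rest))
            buf = ""
        else:
            if buf:
                chunks.append(buf)
            buf = p                         # start a fresh chunk
    if buf:
        chunks.append(buf)
    return chunks
```

The payoff of recursing rather than hard-splitting every 1,000 characters is that chunks end on natural boundaries, so a retrieved chunk reads as a coherent story or principle instead of a sentence cut in half.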
The "intelligence" of the system comes from its Hybrid Search architecture. When a user types a question, the backend doesn't just look for exact word matches; it performs a dual-track search. First, it uses Vector Search to find chunks that are conceptually related to the query (e.g., finding "craftsmanship" when you ask about "quality"). Second, it runs a Keyword Search (Full-Text Search) to catch specific names or historical terms like "NeXT" or "Xerox PARC" that might be mathematically blurred in vector space. By merging these results, we achieve a 99% retrieval precision, ensuring the response is grounded in actual facts rather than AI hallucinations.
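The merge step of that dual-track search can be sketched with reciprocal rank fusion, one common way to combine a vector result list with a keyword result list. The post doesn't specify the exact formula the app uses, so treat this as an illustration: a chunk ranked highly by either track scores well, and a chunk found by both tracks gets boosted.

```python
def hybrid_merge(vector_hits, keyword_hits, k=60, top_n=15):
    """Merge two ranked lists of chunk ids via reciprocal rank fusion.
    Each list contributes 1/(k + rank) per chunk; appearing in both
    lists sums the contributions, pushing that chunk up the final order."""
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

For example, a chunk mentioning "NeXT" that only the keyword track finds still makes the final list, while a chunk about craftsmanship that both tracks surface floats to the top.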
When you hit "Send," a Supabase Edge Function acts as the traffic controller. It immediately converts your question into a vector via Gemini, queries the database for the top 10–15 most relevant chunks, and bundles them together. This "Knowledge Package", consisting of the Steve Jobs persona instructions, the user's question, and the retrieved primary-source text, is sent to Gemini 1.5 Flash. The LLM then synthesizes a response that is blunt, direct, and "Insanely Great," streaming it back to your screen in real time. This closed loop ensures that Steve isn't just guessing; he is effectively "reading" your curated library to answer you.
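The actual Edge Function runs as TypeScript on Supabase, but the "Knowledge Package" assembly step is simple enough to sketch in a few lines of Python. The function name and prompt wording here are my own illustration, not the app's real code; the point is the shape of the bundle: persona first, then the retrieved chunks, then the question, with an instruction to stay grounded in the sources.

```python
def build_knowledge_package(persona, question, chunks):
    """Bundle persona instructions, retrieved source chunks, and the
    user's question into one grounded prompt for the LLM."""
    sources = "\n\n".join(f"[Source {i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        f"{persona}\n\n"
        "Answer ONLY from the sources below. If they don't cover the "
        "question, say so instead of guessing.\n\n"
        f"{sources}\n\n"
        f"Question: {question}"
    )
```

This is the "closed loop" in miniature: because the retrieved text travels inside the prompt, the model answers from the curated library rather than from whatever it half-remembers about Steve.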
What remains the same: across both the custom GPT and the Gemini-powered RAG, the data, the sources, and the system instructions stay more or less identical. However, the way the data is processed is different: here it is a proper end-to-end RAG powered by Gemini. It should be fun to see how the two behave differently.
Key Takeaways
In the world of AI, there is a lot for anyone to know. To make things worse, everything evolves at a super-fast pace; both the quantum and the speed of change are nearly impossible to keep up with.
What really gives you the edge is knowing what needs to be built, why it should be built, and how it should be built. (By "how" I mean the behavior and the output, rather than the tech stack.)
If someone knows these well, thinks critically, and is good at prompting, they can build almost anything in the best possible way. I kept asking the LLMs: what other approaches are there? Why this? Why not that? What are the pros and cons? It walks you through the trade-offs and you can choose accordingly (e.g. keyword vs. vector search, why not to use the books even if you hold the copyright, and at what point more data or training material stops moving the needle).
In fact, Lovable did the chunking by itself; I just had to say "use this method". It is not that I need to know the method in depth; I need to know enough to ask the LLMs about possible methods and how to make them happen. In other words, if you are a little tech-savvy, can speak English, and can follow instructions, building most things is child's play. (There are a few applications I am still having a tough time building.)
There is more than one way to skin the cat. I could have used Claude Code, or Google AI Studio end to end. I could have used the ChatGPT API rather than the Gemini API. To put it simply: you could use AWS, GCP, or Microsoft's cloud, but each comes with its own flavor. It is imperative to know which one would work for you: functionality, cost, and so on.
More than that, the most important thing is your source material: the data, how you organize it, and how you train your LLM to deal with it. Again, IMO, each LLM behaves differently; there is real variation in their flavors.
Building this also makes you appreciate, or at least wonder, how difficult it would be to build something like Gemini, Claude, or ChatGPT, or even a regular enterprise AI application with safeguards. (For example, the first feedback I got from a friend was that it came across as mean and verbose.)