Hacker News
LLMs and SQL
3 points by zephodb 17 days ago | 9 comments
Why is there still no open-source or closed-source solution for LLMs performing data analysis on large databases? I mean complex schemas with more than 100 tables. Essentially, something to replace data analysts. Is it really that difficult? While everyone is interested in fine-tuning, RAG and agents, no one is serious about using LLMs for data analysis at scale. For starters, one can try the Microsoft Contoso BI demo dataset. Even GPT-4 fails miserably at the simplest analysis questions.



You already mentioned it - RAG.

But like the "AI lawyer", the "AI doctor", the "AI coder", it doesn't replace the analyst, it just improves the quality and output of their work.

It might replace analyst interns, or some entry-level positions that were mostly low-level tasks and data entry, but a data analyst who knows how to put the company's data into LlamaIndex for querying is a very useful analyst. Think of what can be built on top of that domain-specific LLM for the company and for the analyst. It makes non-technical analysts who don't use AI seem incredibly passé and useless in comparison.
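A rough sketch of the retrieval step for a 100+ table schema: rather than pasting every table into the prompt, pick the few whose descriptions overlap most with the question. This swaps in plain word-overlap scoring for the embedding search a tool like LlamaIndex would do, and it is not LlamaIndex's API. The table names and descriptions are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical table descriptions; in practice these would come from the
# warehouse catalog or hand-written docs.
TABLE_DOCS = {
    "FactSales": "sales amount quantity date store product per transaction",
    "DimProduct": "product name category subcategory brand color",
    "DimStore": "store name region country city manager",
    "DimCustomer": "customer name gender birth date income",
}

def tokenize(text: str) -> Counter:
    """Bag of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def retrieve_tables(question: str, docs: dict, k: int = 2) -> list:
    """Return the k tables whose name + description share the most words with the question."""
    q = tokenize(question)

    def score(name: str) -> int:
        d = tokenize(name + " " + docs[name])
        return sum(min(count, d[word]) for word, count in q.items())

    # Highest score first; ties broken alphabetically so results are stable.
    return sorted(docs, key=lambda name: (-score(name), name))[:k]

def build_prompt(question: str, tables: list) -> str:
    """Put only the retrieved tables into the text-to-SQL prompt."""
    schema = "\n".join(f"- {t}: {TABLE_DOCS[t]}" for t in tables)
    return f"Relevant tables:\n{schema}\n\nQuestion: {question}\nSQL:"

question = "total sales amount by store region"
print(build_prompt(question, retrieve_tables(question, TABLE_DOCS)))
```

With real embeddings only the scoring function changes; the shape stays the same: retrieve a handful of tables, then prompt with just those.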

Same with auto mechanics, lawyers, doctors, HVAC techs, construction workers - AI tooling will even make its way into police work (even in the field, running on their little computer in-car) as well as dispatching.

Doesn't replace anyone, but has the ability to drastically improve our quality of work. Turns the expert into a super-expert.


It is really that difficult.

Plus, it helps to have a neck to wring when it turns out that a slightly wrong left join deep in the data reporting pipeline means that you just told the CEO/Board/Wall Street materially incorrect numbers...
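To make the left-join point concrete, here's a minimal sqlite3 example (tables and numbers are hypothetical): joining orders to a one-to-many shipments table silently counts an order once per shipment, and the query still runs without any error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
-- order 1 shipped in two parcels; order 2 has not shipped yet
INSERT INTO shipments VALUES (10, 1), (11, 1);
""")

true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# The join repeats each order once per matching shipment,
# so order 1's amount is summed twice.
fanned_total = conn.execute("""
SELECT SUM(o.amount)
FROM orders o LEFT JOIN shipments s ON s.order_id = o.order_id
""").fetchone()[0]

# One fix: collapse the many side to one row per order before joining.
fixed_total = conn.execute("""
SELECT SUM(o.amount)
FROM orders o
LEFT JOIN (SELECT order_id, COUNT(*) AS parcels FROM shipments GROUP BY order_id) s
  ON s.order_id = o.order_id
""").fetchone()[0]

print(true_total, fanned_total, fixed_total)
```

Both wrong and right queries are perfectly valid SQL, which is exactly why a generated query that "runs" proves very little.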

Also, part of data analysis is understanding the data and its nuances. Databases tend to be the place where all of an organization's pathologies get encoded over time, so sometimes a column means something totally different from what its name says. It's stupid, but changing it would take too much politics, so nobody does, and updating the documentation would mean admitting the workaround, so that knowledge is passed around informally among the analysts...


This project might be useful too - https://github.com/defog-ai/sqlcoder


I'm unclear on how you mean to connect those technologies in a useful way.

I could see LLMs as query-suggestors, but it doesn't make much sense to feed them data that isn't more "language." Their fundamental design is not conducive to doing math or solving logic constraints.


It could be a nice-to-have to feed the database schema into an LLM and let it generate queries (this would of course need a checking step in case it hallucinates a query that drops your tables, or you could limit it to SELECT), then run the LLM again on the query results to point out specific values. You could also provide a few built-in functions that generate different visualisations and let the LLM pick one.
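A minimal sketch of that pipeline using Python's built-in sqlite3, assuming a SQLite database: the prompt gets only the CREATE TABLE statements, and model-generated SQL runs on a connection whose authorizer rejects anything beyond reading. The LLM call itself is left out; the two SQL strings are hypothetical stand-ins for model output, and CHART_TYPES / pick_chart are illustrative names for the built-in visualisation idea, not any library's API.

```python
import sqlite3

def schema_text(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements for the prompt (schema only, no rows)."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(sql for (sql,) in rows)

# Authorizer action codes a plain read query needs; everything else is denied.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}

def read_only_authorizer(action, arg1, arg2, db_name, trigger):
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY

# Hypothetical whitelist of built-in visualisations the LLM may choose from.
CHART_TYPES = {"bar", "line", "table"}

def pick_chart(llm_choice: str) -> str:
    """Accept the model's chart choice only if we implement it; else fall back to a table."""
    choice = llm_choice.strip().lower()
    return choice if choice in CHART_TYPES else "table"

# Demo database. The authorizer is installed after setup, on a connection
# dedicated to running model-generated SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 10.0), ('South', 5.0), ('North', 2.5);
""")
prompt_schema = schema_text(conn)
conn.set_authorizer(read_only_authorizer)

# Stand-ins for text an LLM might return.
safe_sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
unsafe_sql = "DROP TABLE sales"

result = conn.execute(safe_sql).fetchall()
print(result)
try:
    conn.execute(unsafe_sql)
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)
```

An authorizer is stricter than string-matching for "SELECT", since SQLite checks what the statement actually does after parsing. On a server database, a read-only role plays the same part.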


Quite a few folks are working on it, actually: https://star-history.com/blog/text2sql


I may not understand this well, but I'm not sure large companies will want to share their databases directly with OpenAI.


Maybe this project could help? https://github.com/chat2db/Chat2DB


I found that project a few days ago too, it looks promising.



