Supposedly LLMs (especially smaller ones) are best suited to tasks where the ans...

kiratp · 2024-04-23T17:11:15

The technique to use is to give the model an “out” for the missing/negative case.

"{text}\n\n###\n\nPlease summarize the text above. The text is a video transcript. It may not have the names of the speakers in it. If you need to refer to an unnamed speaker, call them Speaker_1, Speaker_2 and so on."

woodson · 2024-04-23T16:44:16

Especially for small models I had very bad results for use in translation. Even trying all kinds of tricks didn’t help (apparently prompting in the target language helps for some). Encoder-decoder models such as FLAN-T5 or MADLAD-400 seemed far superior at equal or even smaller model size.

andai · 2024-04-23T19:19:07

I forget which model (LLaMA 3?) but I heard 95% of the training data was English.

Grimblewald · 2024-04-24T02:59:52

for sure, so my use case for example is

"using the following documentation to guide you {api documentation}, edit this code {relevant code}, with the following objective: Replace uses of {old API calls} in {some function} with with relevant functions from the supplied documentation"

It mostly works, but if the context is a little to long, sometimes it will just spam the same umlaut or number (always umlaut's or numbers) over and over for example. Perhaps some fine-tuning of parameters like temp. or repetition penalty might fix it, time will tell.

andai · 2024-04-24T22:06:02

Are you using ollama? Devs said there was a bug that occurs when context is full, they're working on it.

Grimblewald · 2024-05-06T11:47:06

That would do it, I am indeed.