
I'm pretty naive, so please forgive me if this is a stupid question.

To me, the parent comment is saying that even though the benchmarks are cool, they're not very helpful to the everyday person. If you can't chat with the model well (even within a narrow context), what use are great benchmark scores?




Both are saying the same thing: for a base model like Phi to perform well as a chat agent, it would first need to be tuned for that purpose. Only then would its benchmark results have real-world value.


Indeed, Phi-2 was not instruction-tuned. From this report:

"Our models went through post-training with both supervised instruction fine-tuning, and preference tuning with DPO. We have worked on generating and curating various instruction and preference data. This has improved the model chat capabilities, robustness, as well as its safety."



