
I don't think MS has a special sauce here, just a willingness to publish. To the extent MS has disclosed the bulk of what they are doing with Phi, it's a combination of a really nice initial idea ("use written texts + GPT-4 to generate high-quality prompts where we know the answer is good because it's written down") and engineering.
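For concreteness, here's roughly what that pipeline might look like. This is just a minimal sketch assuming the OpenAI Python client; the passage and prompt wording are illustrative, not anything MS has actually published:

    # Sketch of the "written text -> synthetic prompt/answer" idea above.
    # Assumes the OpenAI Python client; the passage and prompts are made up.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    passage = (
        "Photosynthesis converts light energy into chemical energy, "
        "producing glucose and oxygen from carbon dioxide and water."
    )

    # Ask a strong model for a question the passage already answers, plus an
    # answer grounded in that passage, so quality is anchored to the source text.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write training data for a small language model."},
            {"role": "user",
             "content": "Source passage:\n" + passage
                        + "\n\nWrite one question this passage answers, then the "
                          "answer, formatted as:\nQ: ...\nA: ..."},
        ],
    )

    print(resp.choices[0].message.content)  # one synthetic (prompt, answer) pair

Run that over a large pile of curated source text and filter the outputs, and you have the kind of synthetic dataset the Phi papers describe.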

To me this is advancing the state of the art on the impact of data quality, but it doesn't look to me like the Phi series has some magical special sauce otherwise. Data quality and synthetic data creation are not magical moats that Apple can't cross.

I'll say too that I'm psyched to try Phi-3; the sweet spot for me is a model that can be a local coding assistant and still answer random Q&A questions with some sophistication. I'm skeptical that 3-8B-parameter models will bring the high level of sophistication sometimes needed in this cycle; there's still a very large gap with the larger models in daily use, despite benchmark scores that are often close.

Anyway, an Apple Phi-3 is in no way an impossibility.



