I don't know that the physics analogy works so well, or at least, it's definitely missing something. What prevents the whole word "universe" from collapsing in on itself and forming a black hole? That is, if there are only attractive forces, the global optimum is to co-locate everything at the same point, which doesn't give you a useful model. There needs to be something in the model that keeps different words apart from each other.
It works because the gravity of word2vec isn't the gravity of real life.
Notice that I only pull the word "dog" toward the center of gravity of the rest of the words, instead of pulling all of them together. I think the full version even pushes the rest of the words away from the center of gravity,
but I need to double-check the math.
This is not just an analogy. This is what word2vec's math says.
The only part that is an analogy is the dimensionality: word2vec works in a high-dimensional space, while my picture is in 3 dimensions.
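For anyone who wants to check the push/pull claim, here is a rough sketch of one gradient step in the skip-gram negative-sampling variant of word2vec (one of its training options, not necessarily the exact setup described above). The function name `sgns_step`, the toy vectors, the learning rate, and the 3-dimensional size are all made up for illustration. Note that "pull" and "push" here are measured by dot product, not by Euclidean distance.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(v_center, u_context, u_negatives, lr=0.1):
    """One SGD step on the negative-sampling loss (hypothetical helper):
    loss = -log sigmoid(u_context . v_center)
           - sum_k log sigmoid(-u_negatives[k] . v_center)
    Returns updated copies of all three inputs."""
    pos = sigmoid(u_context @ v_center)      # scalar in (0, 1)
    neg = sigmoid(u_negatives @ v_center)    # shape (K,), each in (0, 1)

    # Gradients of the loss above with respect to each vector.
    grad_center = (pos - 1.0) * u_context + neg @ u_negatives
    grad_context = (pos - 1.0) * v_center        # attractive term
    grad_negatives = np.outer(neg, v_center)     # repulsive term

    return (v_center - lr * grad_center,
            u_context - lr * grad_context,
            u_negatives - lr * grad_negatives)

# Toy 3-dimensional example; all values are random placeholders.
rng = np.random.default_rng(0)
dim = 3
v_dog = rng.normal(size=dim)           # center word vector
u_bark = rng.normal(size=dim)          # observed context word vector
u_noise = rng.normal(size=(4, dim))    # 4 randomly sampled "negative" words

new_v, new_u, new_neg = sgns_step(v_dog, u_bark, u_noise)

# Compare scores against the old center vector before and after the step.
print("context score:", u_bark @ v_dog, "->", new_u @ v_dog)
print("negative scores:", u_noise @ v_dog, "->", new_neg @ v_dog)
```

In this sketch the observed context word's score against "dog" always goes up and the sampled negatives' scores always go down, which is the repulsive term that keeps everything from collapsing to one point.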
This page has the clearest explanation of word embeddings and the relationship between the objective function and why vector translation captures meaning.
http://nlp.stanford.edu/projects/glove/