Multi-head attention plays an important role in transformers, which have revolutionized Natural Language Processing (NLP). Understanding this mechanism is a necessary step toward getting a clearer picture of current state-of-the-art language models.