In this article, I’ll focus on how you can incorporate information from different modalities into your machine learning system. These modalities can be information like an image, text, or audio. They can also be, for example, multiple images of the same object taken from different angles. Including information from different modalities gives the machine learning system more information to work with, which can, in turn, increase the performance of the system.
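To make the idea concrete, here is a minimal sketch of one common way to combine modalities: concatenating per-sample feature vectors from each modality into a single vector that a downstream model can consume. This is only an illustration under stated assumptions, not necessarily the approach used later in this article. The function name `fuse_features` is hypothetical, and the random arrays are placeholders standing in for the outputs of real image and text encoders.

```python
import numpy as np


def l2_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so neither modality dominates by magnitude."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)


def fuse_features(image_features: np.ndarray, text_features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fuse two modalities by concatenating their features.

    Both inputs are expected to have shape (num_samples, feature_dim), with
    row i of each array describing the same sample.
    """
    if image_features.shape[0] != text_features.shape[0]:
        raise ValueError("Both modalities must describe the same samples.")
    return np.concatenate(
        [l2_normalize(image_features), l2_normalize(text_features)], axis=1
    )


# Placeholder data: in practice these would come from an image encoder
# and a text encoder, which are assumed here and not shown.
rng = np.random.default_rng(0)
image_features = rng.normal(size=(4, 8))
text_features = rng.normal(size=(4, 5))

fused = fuse_features(image_features, text_features)
print(fused.shape)
```

The fused array has one row per sample and as many columns as the two feature dimensions combined, so a classifier trained on it sees both modalities at once.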
My motivation for this article is that I’m currently working on a problem where I have information from two different modalities. The first modality is the visual information of a document, and the second modality is the text contained within the document. Individually, a machine learning system can achieve decent performance using only the visual information from the document or only the text in the document. However, if you are only using one of the two available modalities, you are not giving the machine learning system all the information available to achieve the best performance. Therefore, you should combine different modalities to ensure the best…