Mr. Md. Shakil Hossain | Multi-Modal and Cross-Modal Vision | Young Scientist Award

Research Assistant | Bangladesh University of Business and Technology | Bangladesh

Md. Shakil Hossain is an emerging researcher and academic specializing in Artificial Intelligence (AI), Machine Learning (ML), Deep Learning, Natural Language Processing (NLP), and Multimodal Learning. He currently serves as a Research Assistant at the Advanced Machine Intelligence Research (AMIR) Lab, where his work focuses on hybrid deep learning architectures, large language models (LLMs), and multimodal fusion systems for real-world AI applications. His research aims to bridge the gap between intelligent computation and societal needs, with contributions spanning sentiment analysis, mental health assessment, and cross-lingual text processing.

Before joining AMIR Lab, he worked as a Market Research Analyst at Gram Ltd., where he conducted in-depth market and competitive analyses to support the launch of Dhopa Elo, a startup product for laundry services. He also used data analytics, customer segmentation, and ROI optimization to strengthen marketing strategies and business performance.

Md. Hossain received a research grant from the Bangladesh University of Business and Technology (BUBT) for his project, “Smart Agro-Monitor: IoT-Based Precision Farming for Enhanced Crop Management.” This initiative applied IoT and AI to water management, pest control, and crop health monitoring, giving farmers data-driven insights for sustainable agriculture.

He has authored and co-authored 16 research papers in journals and conferences such as Scientific Reports, IEEE Access, Knowledge-Based Systems, and Neural Networks. His publications have collectively received 31 citations, with an h-index of 3 and an i10-index of 1. Collaborating with scholars including Prof. Dr. A. B. M. Shawkat Ali, Md. Hossain continues to pursue interdisciplinary AI research that promotes innovation, ethics, and societal advancement through intelligent technologies.

Profiles: Google Scholar | ORCID | Scopus

Featured Publications

1. Hossain, M. M., Hossain, M. S., Mridha, M. F., Safran, M., & Alfarhood, S. (2025). Multi-task opinion enhanced hybrid BERT model for mental health analysis. Cited By: 13

2. Hossain, M. M., Hossain, M. S., Hossain, M. S., Mridha, M. F., & Safran, M. (2024). TransNet: Deep attentional hybrid transformer for Arabic posts classification. IEEE Access. Cited By: 7

3. Hossain, M. M., Hossain, M. S., Safran, M., Alfarhood, S., & Alfarhood, M. (2024). A hybrid attention-based transformer model for Arabic news classification using text embedding and deep learning. IEEE Access. Cited By: 6

4. Hossain, M. M., Hossain, M. S., Chaki, S., Hossain, M. R., Rahman, M. S., & Ali, A. B. M. (2025). CrosGrpsABS: Cross-attention over syntactic and semantic graphs for aspect-based sentiment analysis in a low-resource language. Cited By: 2

5. Hossain, M. S., Hossain, M. M., Hossain, M. S., Mridha, M. F., & Safran, M. (2025). EmoNet: Deep attentional recurrent CNN for X (formerly Twitter) emotion classification. IEEE Access. Cited By: 2

Md. Shakil Hossain’s research advances the integration of AI, NLP, and IoT to solve real-world problems in healthcare, agriculture, and digital communication. His work promotes human-centered, sustainable, and data-driven innovation, empowering industries and societies to harness intelligent technologies for global progress.

Introduction to Multi-modal and Cross-modal Vision

Multi-modal and Cross-modal Vision research is a dynamic field within computer vision that seeks to bridge the gap between different types of sensory data, enabling machines to understand and interpret information from multiple modalities, such as text, images, videos, and audio. This interdisciplinary research area has profound implications for improving the capabilities of AI systems, human-computer interaction, and information retrieval, among others.

Subtopics in Multi-modal and Cross-modal Vision:

Text-to-Image Generation: Researchers work on models that can generate realistic images from textual descriptions or vice versa. This has applications in content creation, design, and multimedia generation.
Image-Text Retrieval: This subfield focuses on developing algorithms that enable users to search for images based on textual queries or find relevant text documents based on image content, facilitating efficient information retrieval.
Cross-modal Translation: Researchers explore methods to translate content from one modality to another, such as translating sign language to text or speech to text, making information more accessible.
Multimodal Fusion: The integration of information from different modalities is a core research area. Methods for effectively fusing and combining data from sources like text, images, and audio are developed to improve AI system understanding and decision-making.
Affective and Emotional Analysis: This subtopic involves analyzing emotions expressed in multiple modalities, such as facial expressions, voice tone, and text sentiment, which is valuable for applications in human-computer interaction, sentiment analysis, and mental health monitoring.
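
As a rough illustration of the image-text retrieval idea described in the list above, the sketch below ranks images against a text query by cosine similarity in a shared embedding space. It is a minimal sketch under stated assumptions: the three-dimensional vectors are hand-made stand-ins for the outputs of trained text and image encoders, and the names `cosine_similarity`, `rank_images`, `TOY_IMAGE_EMBEDDINGS`, and `TOY_QUERY_EMBEDDING` are hypothetical and do not come from any work cited on this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return (image_name, score) pairs sorted from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(text_embedding, vector))
        for name, vector in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: toy embeddings standing in for real encoder outputs.
TOY_IMAGE_EMBEDDINGS = {
    "dog_photo": [0.9, 0.1, 0.0],
    "cat_photo": [0.1, 0.9, 0.0],
    "car_photo": [0.0, 0.1, 0.9],
}

# Assumption: a toy embedding for a text query such as "a dog in a park".
TOY_QUERY_EMBEDDING = [0.8, 0.2, 0.1]

ranking = rank_images(TOY_QUERY_EMBEDDING, TOY_IMAGE_EMBEDDINGS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real system the embeddings would come from encoders trained so that matching text and images land close together; the ranking step itself stays this simple.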

Multi-modal and Cross-modal Vision research holds great promise in advancing AI systems' ability to understand and interpret the rich diversity of information present in the real world. These subtopics reflect the ongoing efforts to create more versatile and capable AI systems.