Predicting infectious disease using big data analytics


Type: Thesis

Level: Master's

Title: Predicting infectious disease using big data analytics

Presenter: Fateme Mohseni

Supervisors: Dr Morteza Youssef Sanati

Advisors:

Examiners: Dr Moharram Mansourizadeh, Dr Mehdi Sakhaei Nia

Time and date of presentation: 11/03/2023

Place of presentation: Amphitheater

Abstract: Health prediction has become essential because medical data are large, varied, and constantly updated, and big data analytics offers new opportunities to improve health care. Forecasting health status helps organizations in the health sector optimize resources and operate efficiently; this requires advanced analytical frameworks that store, filter, and analyze data so that decisions can be made quickly and on time. As medical centers record the examinations and visits of patients, the volume of collected information keeps growing, and correct and timely analysis of this data can lead to early prediction of disease, which saves lives. Because infectious diseases are among the most common causes of death, early prediction using big data analysis can prevent the spread of some diseases and, in some cases, allow a disease to be diagnosed early and its treatment started sooner, which saves considerable treatment costs. This research uses random forest classification, a common tree-ensemble method in machine learning that handles big data well. However, the random forest implementation in MLlib is very inefficient for training deep decision tree models, which are required to achieve good predictive performance on our data. Therefore, we focus on improving the performance of random forest training in Spark's MLlib library. The resulting model is then used in real time to classify tweets according to whether or not the person has hepatitis. In the proposed method, the real-time hepatitis prediction system contains three main parts: offline model building, stream processing, and online prediction. The system was developed by integrating big data frameworks such as Apache Spark and Kafka.
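
The three parts named in the abstract (offline model building, stream processing, online prediction) can be illustrated with a small plain-Python sketch. This is not the thesis's implementation, which is built on Spark MLlib and Kafka: here a hand-written ensemble of bootstrapped decision trees stands in for the MLlib random forest, and an in-memory queue stands in for a Kafka topic. The vocabulary, the training tweets, and all function names are hypothetical, and the toy data is deliberately cleanly separable so the example stays short.

```python
import random
from collections import Counter, deque

# Hypothetical vocabulary and labelled tweets (1 = reports hepatitis, 0 = does not).
VOCAB = ["hepatitis", "jaundice", "liver", "holiday", "football", "coffee"]
TRAIN = [
    ("doctor confirmed hepatitis with jaundice and liver pain", 1),
    ("hepatitis again my liver hurts and jaundice is back", 1),
    ("jaundice for a week liver tests say hepatitis", 1),
    ("in hospital with hepatitis jaundice and a swollen liver", 1),
    ("coffee and football all holiday long", 0),
    ("holiday plans are football with friends and coffee", 0),
    ("watching football on holiday with good coffee", 0),
    ("best holiday ever coffee in hand football on tv", 0),
]

def featurize(text):
    """Turn a tweet into a binary bag-of-words vector over VOCAB."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in VOCAB]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, rng, max_depth=3):
    """Grow one tree; each node considers a random subset of the features."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: an int label
    best = None
    for f in rng.sample(range(len(VOCAB)), 3):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    lo = [i for i, x in enumerate(rows) if x[f] == 0]
    hi = [i for i, x in enumerate(rows) if x[f] == 1]
    return (
        f,
        build_tree([rows[i] for i in lo], [labels[i] for i in lo], rng, max_depth - 1),
        build_tree([rows[i] for i in hi], [labels[i] for i in hi], rng, max_depth - 1),
    )

def predict_tree(tree, x):
    """Walk from the root to a leaf; internal nodes are (feature, left, right)."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if x[f] else left
    return tree

def train_forest(rows, labels, n_trees=15, seed=0):
    """Part 1, offline model building: one tree per bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

def run_online(forest, stream):
    """Parts 2 and 3: consume queued messages one by one and classify each."""
    results = []
    while stream:
        tweet = stream.popleft()
        results.append((tweet, predict_forest(forest, featurize(tweet))))
    return results

forest = train_forest([featurize(t) for t, _ in TRAIN], [y for _, y in TRAIN])

# The deque is a stand-in for a message topic; a real system would read from Kafka.
stream = deque([
    "I have hepatitis jaundice and liver pain",
    "coffee then football on holiday",
])
results = run_online(forest, stream)
for tweet, label in results:
    print(label, tweet)
```

Keeping training separate from the consume-and-classify loop mirrors the offline/online split described in the abstract: the model is built once on historical data, and the online path only featurizes and votes, so the per-message cost stays small.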

File: