Thesis Open Access

DESIGNING HEALTHCARE DATA ANALYTICS FRAMEWORK BASED ON BIG DATA APPROACH: IN CASE OF STROKE DISEASE PREDICTION

ASSEFA SENBATO


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.20372/nadre/4189</identifier>
  <creators>
    <creator>
      <creatorName>ASSEFA SENBATO</creatorName>
      <affiliation>ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</affiliation>
    </creator>
  </creators>
  <titles>
    <title>DESIGNING HEALTHCARE DATA ANALYTICS FRAMEWORK BASED ON BIG DATA APPROACH: IN CASE OF STROKE DISEASE PREDICTION</title>
  </titles>
  <publisher>National Academic Digital Repository of Ethiopia</publisher>
  <publicationYear>2019</publicationYear>
  <subjects>
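    <subject>Big data</subject>
    <subject>Distributed Machine learning</subject>
    <subject>Healthcare data analytics</subject>
    <subject>Hadoop</subject>
    <subject>MLlib</subject>
    <subject>Stroke disease</subject>
    <subject>Risk factor</subject>
    <subject>Spark</subject>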
  </subjects>
  <contributors>
    <contributor contributorType="Supervisor">
      <contributorName>Sreenivasa Rao Vuda (PhD)</contributorName>
      <affiliation>ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</affiliation>
    </contributor>
    <contributor contributorType="Supervisor">
      <contributorName>Ejegu Tefera (MSc)</contributorName>
      <affiliation>ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</affiliation>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Issued">2019-10-25</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Text">Thesis</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://nadre.ethernet.edu.et/record/4189</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.20372/nadre/4188</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://nadre.ethernet.edu.et/communities/aastu</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://nadre.ethernet.edu.et/communities/nadre</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="http://www.opendefinition.org/licenses/cc-by">Creative Commons Attribution</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;The healthcare industry is a data-intensive and sensitive sector that generates massive amounts of data in various formats. Managing these data and extracting meaningful information from them requires serious attention to analytic techniques and modern tools in order to enhance the quality of service and reduce cost. Stroke is one of the leading causes of death worldwide, so early detection and continuous monitoring can reduce the mortality rate. However, the exponential growth of data from different sources, such as medical data, patient histories, streaming (real-time) systems, wearable sensor devices, and others, has become one of the biggest challenges to performing advanced analytics, including prediction, with conventional techniques in order to generate the right insights from data for better decisions. The combination of big data analytics and machine learning is an advanced technology that can have a significant impact on the healthcare sector, especially on the early detection of stroke. This technology can be both less expensive and more powerful. To overcome this challenge, this study proposes a healthcare data analytics framework for stroke disease prediction. The framework is implemented using Apache Spark, a leading platform that offers fast, large-scale distributed computing for both batch and streaming data processing through in-memory computation. We implemented four scalable algorithms in Spark ML: Decision Tree, Random Forest, Gradient Boosting Tree, and Logistic Regression. The models were built on a stroke healthcare dataset collected from the Medical Quality Improvement Consortium (MQIC) database, in consultation with cardiologists from local hospitals, to analyze and predict stroke disease.
Stroke data analytics was performed on a cluster with one master node and two worker nodes, and the models were evaluated and compared using performance metrics such as the confusion matrix and area under the curve (AUC). Based on the experimental results, Decision Tree was found to be the best model, with an accuracy of 94.3% and an AUC score of 99%. Diabetes was identified as the major risk factor for stroke, followed by hypertension. This study showed that Apache Spark, with its scalable machine learning techniques, can be used efficiently to model and predict stroke disease and to identify risk factors earlier. The results of this study can be used as clinical decision support by physicians to help them make more consistent diagnoses of stroke disease.&lt;/p&gt;</description>
  </descriptions>
</resource>
