Thesis Open Access

DESIGNING HEALTHCARE DATA ANALYTICS FRAMEWORK BASED ON BIG DATA APPROACH: IN CASE OF STROKE DISEASE PREDICTION

ASSEFA SENBATO


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <controlfield tag="001">4189</controlfield>
  <datafield tag="502" ind1=" " ind2=" ">
    <subfield code="c">ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">DESIGNING HEALTHCARE DATA ANALYTICS FRAMEWORK BASED ON BIG DATA APPROACH: IN CASE OF STROKE DISEASE PREDICTION</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://www.opendefinition.org/licenses/cc-by</subfield>
    <subfield code="a">Creative Commons Attribution</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-10-25</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2290203</subfield>
    <subfield code="u">https://nadre.ethernet.edu.et/record/4189/files/DESIGNING HEALTHCARE DATA ANALYTICS FRAMEWORK BASED ON BIG DATA APPROACH IN CASE OF.pdf</subfield>
    <subfield code="z">md5:cd7e8432a2cd800b4a27c01826699c18</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.20372/nadre/4189</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <controlfield tag="005">20191105130933.0</controlfield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">thesis</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Big data, Distributed Machine learning, Healthcare data analytics, Hadoop, MLlib, Stroke disease, Risk factor, Spark</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">ASSEFA SENBATO</subfield>
    <subfield code="u">ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="a">10.20372/nadre/4188</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="n">doi</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Sreenivasa Rao Vuda (PhD)</subfield>
    <subfield code="u">ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</subfield>
    <subfield code="4">ths</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Ejegu Tefera (MSc)</subfield>
    <subfield code="u">ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY</subfield>
    <subfield code="4">ths</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-aastu</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-nadre</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;The healthcare industry is one of the most data-intensive and sensitive sectors, generating massive amounts of data in various formats. Managing these data and extracting meaningful information from them requires serious attention to analytic techniques and modern tools that can improve the quality of service and reduce cost. Stroke is one of the leading causes of death worldwide, so early detection and continuous monitoring can reduce its mortality rate. However, the exponential growth of data from sources such as medical records, patient histories, streaming (real-time) systems, and wearable sensor devices has made it difficult to perform advanced analytics, including prediction, with conventional techniques and to derive the right insights for better decisions. The combination of big data analytics and machine learning is an advanced technology that can have a significant impact on the healthcare sector, especially on the early detection of stroke, and it can be less expensive and more powerful than conventional techniques. To address this challenge, this study proposes a healthcare data analytics framework for stroke disease prediction based on Apache Spark, a leading platform for fast, large-scale distributed computing that supports both batch and streaming data processing through in-memory computation. We implemented four scalable algorithms from Spark ML (Decision Tree, Random Forest, Gradient Boosting Tree, and Logistic Regression) on a stroke healthcare dataset collected from the Medical Quality Improvement Consortium (MQIC) database, in consultation with cardiologists from local hospitals, to analyze and predict stroke disease. 
Stroke data analytics was performed on a cluster with one master node and two worker nodes, and the models were evaluated and compared using performance metrics such as the confusion matrix and the area under the curve (AUC). In the experiments, Decision Tree performed best, with an accuracy of 94.3% and an AUC score of 99%. Diabetes was identified as the major risk factor for stroke, followed by hypertension. This study shows that Apache Spark, with its scalable machine learning techniques, can be used efficiently to model and predict stroke disease and to identify its risk factors early. The results can serve as clinical decision support for physicians, helping them make more consistent diagnoses of stroke disease.&lt;/p&gt;</subfield>
  </datafield>
</record>
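The abstract compares the four classifiers using a confusion matrix, accuracy, and AUC. The sketch below shows how those metrics are computed from binary labels (1 = stroke, 0 = no stroke) and model scores. It is written in plain Python rather than with Spark ML's evaluator classes and is not the thesis's code. It assumes that AUC means the area under the ROC curve, the usual choice for binary classifiers. The function names, the toy data, and the 0.5 decision threshold are all hypothetical.

```python
from typing import Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for binary labels, where 1 = stroke and 0 = no stroke."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions, read off the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC (assumed meaning of 'AUC') as the probability that a random positive
    is scored above a random negative, with ties counted as 1/2.
    Pairwise O(P*N) loop: transparent, but only suitable for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted stroke probabilities.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 2)
print(round(accuracy(y_true, y_pred), 4))  # 0.6667
print(round(roc_auc(y_true, scores), 4))  # 0.8889
```

On a Spark cluster, the AUC would normally come from `pyspark.ml.evaluation.BinaryClassificationEvaluator` with `metricName="areaUnderROC"`. The pairwise loop above trades efficiency for readability. Note that accuracy depends on the chosen threshold, while AUC is threshold-free. This is one reason the two metrics can differ, as with the 94.3% accuracy and 99% AUC reported for Decision Tree.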