What is HCC?

Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer. Hepatocellular carcinoma occurs most often in people with chronic liver diseases, such as cirrhosis caused by hepatitis B or hepatitis C infection.

In this R project we study the survival of patients with hepatocellular carcinoma using three different algorithms: logistic regression, k-nearest neighbors, and Naive Bayes.

Prerequisites:

You will need to install the mice library.

What is MICE?

MICE (Multivariate Imputation via Chained Equations) creates multiple imputations rather than a single imputation (such as the mean), which accounts for the uncertainty in the missing values. MICE assumes that the missing data are Missing at Random (MAR), which means that the probability that a value is missing depends only on the observed values and can be predicted using them. It imputes data on a variable-by-variable basis by specifying an imputation model per variable.
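As a minimal sketch of what multiple imputation with mice looks like, the example below uses the small `nhanes` data frame that ships with the package instead of the HCC data. The number of imputations (`m = 5`), the method (`"pmm"`, predictive mean matching) and the seed are illustrative assumptions, not necessarily the settings used in this project.

```r
# Minimal MICE sketch on the `nhanes` example data bundled with the mice package.
library(mice)

data(nhanes)                          # small data frame containing missing values
n_missing_before <- sum(is.na(nhanes))

# Create m = 5 imputed datasets; "pmm" and the seed are illustrative choices.
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Extract the first of the five completed datasets.
completed <- complete(imp, 1)
n_missing_after <- sum(is.na(completed))
```

Each of the `m` completed datasets can be analysed separately, and the spread between them reflects the uncertainty about the missing values.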

Steps for our prediction project:

  1. Make the necessary imports and get the features and labels from the dataset.

  2. Use MICE to impute the missing values.

  3. Train the models on the data.

    We loaded the Class feature as a factor for prediction.
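The data preparation in these steps can be sketched as follows. The data frame here is synthetic (the real HCC file is not loaded), only a few of the column names are reused for illustration, and the 70/30 split ratio is an assumption.

```r
set.seed(42)

# Synthetic stand-in for the HCC data; the values are random, not real patients.
n <- 100
hcc <- data.frame(
  ALP        = rnorm(n, mean = 200, sd = 50),
  Hemoglobin = rnorm(n, mean = 13, sd = 2),
  Class      = sample(c(0, 1), n, replace = TRUE)
)

# Treat the outcome as a categorical label so that models perform classification.
hcc$Class <- factor(hcc$Class)

# Hold out 30% of the rows for testing (the split ratio is an assumption).
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- hcc[train_idx, ]
test  <- hcc[-train_idx, ]
```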

Logistic regression method
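Logistic regression models the probability of the second factor level as a function of the predictors. The sketch below fits one with base R's `glm` on synthetic data; the variable names `x1`, `x2`, `y` and the 0.5 decision threshold are hypothetical choices for illustration.

```r
set.seed(1)

# Synthetic binary outcome that depends on two numeric predictors.
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(rbinom(n, 1, plogis(1.5 * x1 - 0.5 * x2)))
d  <- data.frame(x1, x2, y)

# family = binomial gives logistic regression; with a factor response,
# glm models the probability of the second level ("1").
fit <- glm(y ~ x1 + x2, data = d, family = binomial)

# Predicted probabilities, turned into class labels with a 0.5 threshold.
prob <- predict(fit, newdata = d, type = "response")
pred <- factor(ifelse(prob > 0.5, "1", "0"), levels = levels(d$y))
accuracy <- mean(pred == d$y)
```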

k-Nearest Neighbors

The k-nearest neighbors (KNN) algorithm uses "feature similarity" to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set.
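A minimal sketch of this idea, assuming the `knn` function from the `class` package and using the built-in `iris` data as a stand-in for the HCC features. The choice of `k = 5` and the train/test split are illustrative.

```r
library(class)   # recommended package that ships with standard R installations

set.seed(7)
idx <- sample(seq_len(nrow(iris)), 100)

# KNN is distance based, so put the features on a common scale first.
train_x <- scale(iris[idx, 1:4])
# Scale the test rows with the training means and standard deviations.
test_x  <- scale(iris[-idx, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Each test point gets the majority label of its 5 nearest training points.
pred <- knn(train = train_x, test = test_x, cl = iris$Species[idx], k = 5)
accuracy <- mean(pred == iris$Species[-idx])
```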

photo1.png images

Output:

Naive Bayes method

Naive Bayes is a probabilistic machine learning algorithm that can be used in a wide variety of classification tasks.
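To make the probabilistic idea concrete, here is a small Gaussian Naive Bayes written from scratch in base R on the built-in `iris` data. It is a sketch for illustration only, not the code used in this project; in practice a package implementation such as `naiveBayes` from e1071 is a common choice.

```r
set.seed(7)
idx      <- sample(seq_len(nrow(iris)), 100)
train    <- iris[idx, ]
test     <- iris[-idx, ]
features <- names(iris)[1:4]
classes  <- levels(train$Species)

# Class priors and per-class mean / standard deviation of every feature.
priors <- sapply(classes, function(cl) mean(train$Species == cl))
means  <- sapply(classes, function(cl) colMeans(train[train$Species == cl, features]))
sds    <- sapply(classes, function(cl) apply(train[train$Species == cl, features], 2, sd))

# Log posterior up to a constant: log prior + sum of per-feature Gaussian
# log densities (the "naive" independence assumption).
log_post <- sapply(classes, function(cl) {
  ll <- sapply(features, function(f) {
    dnorm(test[[f]], mean = means[f, cl], sd = sds[f, cl], log = TRUE)
  })
  log(priors[[cl]]) + rowSums(ll)
})

# Predict the class with the highest log posterior for each test row.
pred <- factor(classes[max.col(log_post, ties.method = "first")], levels = classes)
accuracy <- mean(pred == test$Species)
```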

photo2.png images

Output:

Cross Validation:

are used to measure the regression model performance during CV.
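A minimal sketch of k-fold cross-validation in base R, assuming 5 folds, synthetic data, a logistic regression model and classification accuracy as the metric. All of these are illustrative assumptions rather than the project's actual configuration.

```r
set.seed(3)

# Synthetic data with a binary outcome.
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(rbinom(n, 1, plogis(1.5 * d$x1 - 0.5 * d$x2)))

# Randomly assign every row to one of k folds of equal size.
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, evaluate on the held-out fold.
fold_acc <- sapply(seq_len(k), function(i) {
  fit  <- glm(y ~ x1 + x2, data = d[folds != i, ], family = binomial)
  prob <- predict(fit, newdata = d[folds == i, ], type = "response")
  pred <- ifelse(prob > 0.5, "1", "0")
  mean(pred == d$y[folds == i])
})

# The cross-validated estimate is the average over the folds.
cv_accuracy <- mean(fold_acc)
```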

Confusion Matrix:

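|                 | Predicted positive | Predicted negative |
|-----------------|--------------------|--------------------|
| Actual positive | TP                 | FN                 |
| Actual negative | FP                 | TN                 |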
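The four cells can be computed with base R's `table` and then combined into the usual summary metrics. The label vectors below are made up for illustration, with `1` taken as the positive class.

```r
# Hypothetical true and predicted labels; "1" is treated as the positive class.
actual    <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0), levels = c(1, 0))
predicted <- factor(c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0), levels = c(1, 0))

# Rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual, Predicted = predicted)

TP <- cm["1", "1"]   # actual positive, predicted positive
FN <- cm["1", "0"]   # actual positive, predicted negative
FP <- cm["0", "1"]   # actual negative, predicted positive
TN <- cm["0", "0"]   # actual negative, predicted negative

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)   # true positive rate
specificity <- TN / (TN + FP)   # true negative rate
```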

Feature selection:

photo3.png images

Output:

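| Class | ALP | Hemoglobin | Albumin | Dir_Bil | Total_Bil | Major_Dim | INR  | AST |
|-------|-----|------------|---------|---------|-----------|-----------|------|-----|
| 2     | 178 | 11.7       | 3.4     | 0.3     | 3.9       | 1.8       | 1.01 | 112 |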

- Chi-Squared Test: