Posted 30 Aug at 02:56h in Computer Science

DECISION TREES for Risk Assessment

One of the great advantages of decision trees is their interpretability. The rules learnt for classification are easy for a person to follow, unlike the opaque "black box" of many other methods, such as neural networks. We demonstrate the utility of this using a German credit data set. You can read a description of this dataset at the UCI site. The task is to predict whether a loan applicant is a good or bad credit risk based on 20 attributes. We've simplified the data set somewhat, in particular by making attribute names and values more meaningful.

1. Download the credit_Dataset.arff dataset and load it into Weka.

2. (5 Points) When presented with a dataset, it is usually a good idea to visualise it first. Go to the Visualize tab. Click on any of the scatter plots to open a new window which shows the scatter plot for two selected attributes. Try visualising a scatter plot of age and duration. Do you notice anything unusual? You can click on any data point to display all its values.

3. (5 Points) In the previous step you should have found a data point that seems to be corrupted, as some of its values are nonsensical. Even a single point like this can significantly affect the performance of a classifier. How do you think it would affect decision trees? A good way to check this is to test the performance of the classifier before and after removing this data point.

4. (10 Points) To remove this instance from the dataset we will use a filter. We want to remove all instances where the age of an applicant is lower than 0 years, as this suggests that the instance is corrupted. In the Preprocess tab, click on Choose in the Filter pane. Select filters > unsupervised > instance > RemoveWithValues. Click on the text of this filter to change its parameters. Set the attribute index to 13 (Age) and set the split point to 0. Click OK to set the parameters and Apply to apply the filter to the data. Visualise the data again to verify that the invalid data point was removed.
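The effect of the filter in step 4 can be sketched outside Weka as a simple row filter. This is only an illustration in Python, not Weka's implementation: the function name `remove_with_values`, the dictionary layout, and the three sample rows (including the corrupted one) are made up for the example.

```python
def remove_with_values(instances, attribute, split_point):
    """Keep only instances whose numeric attribute is at or above split_point.

    Mirrors the idea of step 4: rows with a value below the split point
    are treated as corrupted and dropped.
    """
    return [row for row in instances if row[attribute] >= split_point]


# Made-up miniature of the credit data; the second row is deliberately nonsensical.
instances = [
    {"Age": 35, "Duration": 12, "Approve": "good"},
    {"Age": -5, "Duration": -1000, "Approve": "good"},
    {"Age": 52, "Duration": 24, "Approve": "bad"},
]

cleaned = remove_with_values(instances, "Age", 0)
print(len(instances), "instances before,", len(cleaned), "after filtering")
```

Comparing the instance count before and after the filter is a quick check that exactly one row was removed.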

5. (20 Points) On the Classify tab, select the Percentage split test option and change its value to 90%. This way, we will train the classifier using 90% of the data and evaluate its performance on the remaining 10%. First, train a decision tree classifier with default options. Select classifiers > trees > J48 and click Start. J48 is the Weka implementation of the C4.5 algorithm, which uses the normalized information gain criterion (the gain ratio) to build a decision tree for classification.
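To make the "normalized information gain" criterion from step 5 concrete, here is a minimal Python sketch of the gain ratio for a single nominal attribute. It is a simplification under stated assumptions: it ignores numeric thresholds, missing values, and the extra selection heuristics a full C4.5 implementation applies, and the tiny Housing/Approve sample is made up.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on a nominal attribute, divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many small branches.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Made-up sample: Housing separates the classes perfectly, Telephone not at all.
approve = ["good", "good", "bad", "bad"]
housing = ["own", "own", "rent", "rent"]
telephone = ["yes", "no", "yes", "no"]

print("Housing gain ratio:", gain_ratio(housing, approve))
print("Telephone gain ratio:", gain_ratio(telephone, approve))
```

The attribute with the highest gain ratio would be the preferred split at that node in this simplified picture.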

6. (20 Points) After training the classifier, the full decision tree is output for your perusal; you may need to scroll up for this. The tree may also be viewed in graphical form by right-clicking in the Result list and selecting Visualize tree; unfortunately this format is very cluttered for large trees. Such a tree accentuates one of the strengths of decision tree algorithms: they produce classifiers which are understandable to humans. This can be an important asset in real life applications (people are seldom prepared to do what a computer program tells them if there is no clear explanation). Observe the output of the classifier and try to answer the following questions:

o How would you assess the performance of the classifier? Is the Percentage of Correctly Classified Instances a sufficient measure in this case? Why? Hint: check the number of good and bad cases in the test sample, using the confusion matrix. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. For example, in an experiment with P positive instances and N negative instances, the four possible outcomes (true positive, false negative, false positive, true negative) can be laid out in a 2-by-2 contingency table, or confusion matrix. One benefit of a confusion matrix is that it is easy to see if the system is confusing two classes (i.e. commonly mislabelling one as another).

o Looking at the decision tree itself, are the rules it applies sensible? Are there any branches which appear absurd? At what depth of the tree? What does this suggest?
Hint: Check the rules applied after following the paths: (a) CheckingAccount = <0, Foreign = yes, Duration >11, Job = skilled, OtherDebtors = none, Duration <= 30 and (b) CheckingAccount = <0, Foreign = yes, Duration >11, Job = unskilled.

o How does the decision tree deal with classification in the case where there are zero instances in the training set corresponding to that particular path in the tree (e.g. those leaf nodes that have (0:0))?
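As a companion to the confusion matrix hint in the first question above, the following Python sketch derives accuracy and per-class precision and recall from a 2-by-2 matrix laid out as described (rows are actual classes, columns are predicted classes). The counts are made up for illustration and are not results from the credit data.

```python
def summarize(matrix, classes):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    report = {"accuracy": correct / total}
    for k, name in enumerate(classes):
        actual = sum(matrix[k])                    # row total
        predicted = sum(row[k] for row in matrix)  # column total
        report[name] = {
            "recall": matrix[k][k] / actual if actual else 0.0,
            "precision": matrix[k][k] / predicted if predicted else 0.0,
        }
    return report


# Made-up counts for a 100-instance test split.
matrix = [
    [60, 10],  # actual good: 60 predicted good, 10 predicted bad
    [20, 10],  # actual bad:  20 predicted good, 10 predicted bad
]
report = summarize(matrix, ["good", "bad"])

# Accuracy of a trivial classifier that always answers "good".
baseline = sum(matrix[0]) / sum(sum(row) for row in matrix)

print("accuracy:", report["accuracy"], "always-good baseline:", baseline)
print("recall on bad:", report["bad"]["recall"])
```

In this made-up case the accuracy is no better than always predicting "good", while only a third of the bad cases are caught, which is why the percentage of correctly classified instances alone can mislead on imbalanced data.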

7. (20 Points) Now, explore the effect of the confidenceFactor option. You can find this by clicking on the classifier name (to the right of the Choose button on the Classify tab). In the classifier options window, click on the More button to find out what the confidence factor controls. Try the values 0.1, 0.2, 0.3 and 0.5. What is the performance of the classifier in each case? Did you expect this given your observations in the previous questions? Why do you think this happens?

8. (20 Points) Suppose that it is worse to classify a customer as good when they are bad, than it is to classify a customer as bad when they are good. Which value would you pick for the confidence factor? Which performance measure would you base your decision on?
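One way to reason about step 8 is to weight each kind of error by a cost and compare classifiers on total cost rather than on accuracy. The Python sketch below does this for two made-up confusion matrices; the 5:1 cost ratio and all counts are assumptions chosen for illustration, not values given in the assignment.

```python
def total_cost(matrix, costs):
    """Sum of count * cost over every (actual, predicted) cell."""
    return sum(
        matrix[i][j] * costs[i][j]
        for i in range(len(matrix))
        for j in range(len(matrix[i]))
    )


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Rows are actual (good, bad); columns are predicted (good, bad).
# Assumed costs: approving a bad customer is 5 times worse than rejecting a good one.
costs = [
    [0, 1],
    [5, 0],
]

# Made-up results for two settings of the confidence factor.
setting_a = [[60, 10], [20, 10]]
setting_b = [[50, 20], [12, 18]]

for name, matrix in (("A", setting_a), ("B", setting_b)):
    print(name, "accuracy:", accuracy(matrix), "cost:", total_cost(matrix, costs))
```

Here setting B has the lower accuracy but also the lower total cost, because it lets fewer bad customers through.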

9. (20 Points) Finally, we will create a random decision forest and compare the performance of this classifier to that of the decision tree and the decision stump. The random decision forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. Again set the test option Percentage split to 90%. Select classifiers > trees > RandomForest and hit Start. Again, observe the output. How high can you get the performance of the classifier by changing the number of trees (numTrees) parameter? How does the random decision forest compare performance-wise to the decision tree and decision stump?
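The "mode of the classes output by the individual trees" in step 9 amounts to a majority vote. The Python sketch below shows only that voting step, with three hypothetical hand-written rules standing in for trained trees; real implementations may combine the trees differently, for instance by averaging class probabilities.

```python
from collections import Counter


def forest_predict(trees, instance):
    """Return the class predicted by the largest number of trees."""
    votes = Counter(tree(instance) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical one-rule "trees"; the thresholds are invented for this example.
trees = [
    lambda row: "bad" if row["Duration"] > 30 else "good",
    lambda row: "bad" if row["CreditAmount"] > 8000 else "good",
    lambda row: "good" if row["Housing"] == "own" else "bad",
]

applicant = {"Duration": 36, "CreditAmount": 2500, "Housing": "own"}
print(forest_predict(trees, applicant))
```

For this applicant one rule votes "bad" and two vote "good", so the ensemble answers "good".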

Assignment

(100 Points)

CYBR 7240 – Cyber Analytics and Intelligence Module 03: Decision Trees

Deliverable:

· Your report including the screenshots of your implementation for each section and the results.

@RELATION GermanCredit

@ATTRIBUTE CheckingAccount {<0,<200,>=200,none}
@ATTRIBUTE Duration NUMERIC
@ATTRIBUTE CreditHistory {ok,ok_at_this_bank,ok_til_now,past_delays,critical}
@ATTRIBUTE Purpose {car_new,car_used,furniture,television,appliances,repairs,education,vacation,retraining,business,others}
@ATTRIBUTE CreditAmount NUMERIC
@ATTRIBUTE SavingsAccount {<100,<500,<1000,>=1000,unknown}
@ATTRIBUTE YearsEmployed {unemployed,>=7}
@ATTRIBUTE InstallmentRate NUMERIC
@ATTRIBUTE PersonalStatus {male_divorced,female,male_single,male_married,female_single}
@ATTRIBUTE OtherDebtors {none,guarantor}
@ATTRIBUTE ResidentSince NUMERIC
@ATTRIBUTE Property {real_estate,savings,car,unknown}
@ATTRIBUTE Age NUMERIC
@ATTRIBUTE OtherPlans {bank,stores,none}
@ATTRIBUTE Housing {rent,own,free}
@ATTRIBUTE NumCreditsAtBank NUMERIC
@ATTRIBUTE Job {unemployed,unskilled,skilled,management}
@ATTRIBUTE Dependents NUMERIC
@ATTRIBUTE Telephone {no,yes}
@ATTRIBUTE Foreign {yes,no}
@ATTRIBUTE Approve {good,bad}

@DATA

furniture,-1000000000,unknown,unemployed,-10000,female_single,none,car,-293,none,own,-1000,skilled,-185,yes,good
critical,television,1169,unknown,>=7,male_single,none,real_estate,none,own,skilled,yes,good
<200,ok_til_now,television,5951,<100,female,none,real_estate,none,own,skilled,yes,bad
none,critical,education,2096,<100,male_single,none,real_estate,none,own,unskilled,yes,good
ok_til_now,furniture,7882,<100,male_single,guarantor,savings,none,free,skilled,yes,good
past_delays,car_new,4870,<100,male_single,none,unknown,none,free,skilled,yes,bad
none,ok_til_now,education,9055,unknown,male_single,none,unknown,none,free,unskilled,yes,good
none,ok_til_now,furniture,2835,<1000,>=7,male_single,none,savings,none,own,skilled,yes,good
<200,ok_til_now,car_used,6948,<100,male_single,none,car,none,rent,management,yes,good
none,ok_til_now,television,3059,>=1000,male_divorced,none,real_estate,none,own,unskilled,yes,good
<200,critical,car_new,5234,<100,unemployed,male_married,none,car,none,own,management,yes,bad
<200,ok_til_now,car_new,1295,<100,female,none,car,none,rent,skilled,yes,bad
ok_til_now,business,4308,<100,female,none,savings,none,rent,skilled,yes,bad
<200,ok_til_now,television,1567,<100,female,none,car,none,own,skilled,yes,good
critical,car_new,1199,<100,>=7,male_single,none,car,none,own,unskilled,yes,bad
ok_til_now,car_new,1403,<100,female,none,car,none,rent,skilled,yes,good
ok_til_now,television,1282,<500,female,none,car,none,own,unskilled,yes,bad
none,critical,television,2424,unknown,>=7,male_single,none,savings,none,own,skilled,yes,good
business,8072,unknown,male_single,none,car,bank,own,skilled,yes,good
<200,ok_til_now,car_used,12579,<100,>=7,female,none,unknown,none,free,management,yes,bad
none,ok_til_now,television,3430,<1000,>=7,male_single,none,car,none,own,skilled,yes,good
none,critical,car_new,2134,<100,male_single,none,car,none,own,skilled,yes,good
ok_til_now,television,2647,<1000,male_single,none,real_estate,none,rent,skilled,yes,good
critical,car_new,2241,<100,male_single,none,real_estate,none,rent,unskilled,good
<200,critical,car_used,1804,<500,male_single,none,savings,none,own,skilled,yes,good
none,critical,furniture,2069,unknown,male_married,none,car,none,own,skilled,good
ok_til_now,furniture,1374,<100,male_single,none,real_estate,bank,own,unskilled,yes,good
none,television,426,<100,>=7,male_married,none,car,none,own,unskilled,yes,good
>=200,ok_at_this_bank,television,409,>=1000,female,none,real_estate,none,rent,skilled,yes,good
<200,ok_til_now,television,2415,<100,male_single,guarantor,real_estate,none,own,skilled,yes,good
past_delays,business,6836,<100,>=7,male_single,none,unknown,none,own,skilled,yes,bad
<200,ok_til_now,business,1913,>=1000,male_married,none,real_estate,bank,own,skilled,yes,good
ok_til_now,furniture,4020,<100,male_single,none,car,stores,own,skilled,yes,good
<200,ok_til_now,car_new,5866,<500,male_single,none,car,none,own,skilled,yes,good
none,critical,business,1264,unknown,>=7,male_single,none,unknown,none,rent,unskilled,yes,good
>=200,ok_til_now,furniture,1474,<100,female,none,savings,bank,own,management,yes,good
<200,critical,television,4746,<100,male_single,none,savings,none,own,unskilled,yes,bad
none,critical,education,6110,<100,male_single,none,unknown,bank,free,skilled,yes,good
>=200,ok_til_now,television,2100,<100,male_single,real_estate,stores,own,skilled,yes,bad
>=200,ok_til_now,appliances,1225,<100,male_single,none,car,none,own,skilled,yes,good
<200,ok_til_now,television,458,<100,male_single,none,real_estate,none,own,skilled,yes,good
none,ok_til_now,television,2333,<1000,>=7,male_single,none,car,bank,own,management,yes,good
<200,ok_til_now,television,1158,<1000,male_divorced,none,car,none,own,skilled,yes,good
<200,past_delays,repairs,6204,<100,male_single,none,real_estate,none,own,unskilled,yes,good
critical,car_used,6187,<500,male_married,none,car,none,rent,skilled,yes,good
critical,car_used,6143,<100,>=7,female,none,unknown,stores,free,unskilled,yes,bad
none,critical,car_new,1393,<100,female,none,car,none,own,management,yes,good
none,ok_til_now,television,2299,<1000,>=7,male_single,none,car,none,own,skilled,yes,good
ok_til_now,car_used,1352,<1000,unemployed,female,none,savings,none,rent,unemployed,yes,good
none,critical,car_new,7228,<100,male_single,none,savings,none,own,unskilled,yes,good
none,ok_til_now,television,2073,<500,female,real_estate,none,own,skilled,yes,good
<200,past_delays,furniture,2333,unknown,male_single,none,savings,bank,own,unskilled,yes,good
<200,past_delays,car_used,5965,<100,>=7,male_single,none,car,none,own,management,yes,good
none,ok_til_now,television,1262,<100,male_single,none,car,none,own,skilled,yes,good
none,ok_til_now,car_used,3378,unknown,male_single,none,savings,none,own,skilled,yes,good
<200,past_delays,car_new,2225,<100,>=7,male_single,none,unknown,bank,free,skilled,yes,bad
none,ok_at_this_bank,car_new,783,unknown,male_single,guarantor,real_estate,stores,own,unskilled,yes,good
<200,ok_til_now,television,6468,unknown,unemployed,male_single,none,unknown,none,own,management,yes,bad
none,critical,television,9566,<100,female,none,car,stores,own,skilled,yes,good
>=200,ok_til_now,car_new,1961,<100,>=7,female,none,car,none,own,management,yes,good
critical,furniture,6229,<100,female,unknown,none,rent,unskilled,yes,bad
<200,ok_til_now,business,1391,<100,male_married,none,real_estate,bank,own,skilled,yes,good
<200,critical,television,1537,unknown,>=7,male_single,guarantor,real_estate,none,own,skilled,yes,good
<200,business,1953,<100,>=7,male_single,none,unknown,none,free,management,yes,bad
<200,business,14421,<100,male_single,none,car,none,own,skilled,yes,bad
none,ok_til_now,television,3181,<100,female,none,savings,none,own,skilled,yes,good
none,ok_til_now,repairs,5190,unknown,>=7,male_single,none,savings,none,own,skilled,yes,good
none,ok_til_now,television,2171,<100,female,none,car,bank,own,skilled,yes,good
<200,ok_til_now,car_new,1007,>=1000,male_married,none,real_estate,none,own,skilled,yes,good
none,ok_til_now,education,1819,<100,male_single,none,unknown,stores,free,skilled,yes,bad
none,ok_til_now,television,2394,unknown,female,none,car,none,own,skilled,yes,good
none,ok_til_now,car_used,8133,<100,female,none,savings,bank,own,skilled,yes,good
none,critical,television,730,unknown,>=7,male_single,none,savings,none,rent,unskilled,yes,good
critical,others,1164,<100,>=7,male_single,none,unknown,bank,free,management,yes,good
<200,critical,business,5954,<100,female,none,real_estate,bank,own,unskilled,yes,good
ok_til_now,education,1977
