Concept Hierarchy

Why do I care?

Relationships in data can often be visualized as a hierarchy. As scientists, our goal is not only to understand the data but also to see the connections between pieces of information. Defining concept hierarchies by hand can be tedious and time-consuming for a user or a domain expert. Fortunately, many hierarchies are implicit in the database schema and can be defined automatically at the schema definition level. Concept hierarchies can then be used to transform the data into multiple levels of granularity. For example, patterns in sales data may be found at the level of regions or countries, in addition to individual branch locations.

What is a concept?

A concept is a group of records that have been assigned a label.

What is a concept hierarchy?

A concept hierarchy is a hierarchical ordering among concepts, from more specific (lower-level) concepts to more general (higher-level) ones.

Let’s get deeper

1- Ordering of attributes at the schema level by a user or expert

Let us assume that the following set of attributes is given:

department, college, and school. An expert defines the hierarchy as follows:

Department -> College -> School

This means each of the three concepts corresponds to one attribute, and the expert's ordering defines the hierarchy among the three attributes.
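As a minimal sketch of how an expert-defined ordering can be used, the snippet below rolls a measure up to any level of the Department -> College -> School hierarchy. The records, the `students` measure, and the `roll_up` helper are all invented for illustration and are not part of the original example.

```python
from collections import Counter

# Expert-supplied ordering, from most specific to most general.
HIERARCHY = ["department", "college", "school"]

# Hypothetical records: every name and number here is made up.
records = [
    {"department": "Physics", "college": "Science", "school": "Example University", "students": 120},
    {"department": "Chemistry", "college": "Science", "school": "Example University", "students": 80},
    {"department": "Nursing", "college": "Health", "school": "Example University", "students": 150},
]


def roll_up(rows, level, measure="students"):
    """Sum a measure over all rows that share the same value at the given level."""
    if level not in HIERARCHY:
        raise ValueError(f"unknown level: {level}")
    totals = Counter()
    for row in rows:
        totals[row[level]] += row[measure]
    return dict(totals)


by_college = roll_up(records, "college")
by_school = roll_up(records, "school")
```

Moving from `department` to `college` to `school` gives progressively coarser views of the same data, which is the point of having the ordering in the first place.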

2- Ordering by adding a hierarchy within a level

For example, college could be further divided into:

Science-oriented
Health-oriented
Humanities-oriented

3- Ordering by set grouping or value grouping

The values of an attribute such as age can be organized into a hierarchy of groups:

{20–39}, {40–59}, {60–82}
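The grouping above can be sketched in a few lines of Python. The function name `age_group` and the choice to return `None` for ages outside every range are assumptions made for this illustration.

```python
# Age ranges taken from the groups listed above (bounds are inclusive).
AGE_GROUPS = [(20, 39), (40, 59), (60, 82)]


def age_group(age):
    """Return the label of the group an age falls into, or None if it fits no group."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


labels = [age_group(a) for a in (25, 47, 82, 19)]
```

Each raw age is replaced by its group label, so later analysis can work at the coarser "age group" level instead of individual ages.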

4- Ordering by decoding operation

Suppose the data in an attribute is a set of email addresses:

dmbrook@cs.sfu.ca

Concept hierarchies can be created by separating the different components of each email address:

dmbrook -> cs -> sfu -> ca
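Here is a small sketch of that decoding step, using only Python string methods. The helper names `email_to_hierarchy` and `generalize` are hypothetical, and the code assumes a simple `user@domain` address with no quoting or other special cases.

```python
def email_to_hierarchy(address):
    """Split an email address into levels, from most specific to most general."""
    user, sep, domain = address.partition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a simple user@domain address: {address!r}")
    return [user.lower()] + domain.lower().split(".")


def generalize(address, levels_up):
    """Drop the most specific levels and return what remains as a dotted name."""
    return ".".join(email_to_hierarchy(address)[levels_up:])


levels = email_to_hierarchy("dmbrook@cs.sfu.ca")
department = generalize("dmbrook@cs.sfu.ca", 1)
institution = generalize("dmbrook@cs.sfu.ca", 2)
```

With the address decoded this way, records can be grouped by department (`cs.sfu.ca`), by institution (`sfu.ca`), or by top-level domain (`ca`).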

5- Ordering by data clustering and data distribution analysis

6- Ordering by use of rules

A hierarchy among the values of an attribute such as profit (for an item x) can be defined by a set of rules over the item's price, its cost, and a profit threshold.
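The rules themselves are not listed here, so the following is only a sketch of what such a rule set could look like. The thresholds (50 and 250), the level names, and the function `profit_level` are all invented for illustration.

```python
def profit_level(price, cost, low_threshold=50.0, high_threshold=250.0):
    """Assign an item to a profit level from its price and cost.

    The thresholds are hypothetical defaults, not values from the article.
    """
    margin = price - cost
    if margin < low_threshold:
        return "low_profit"
    if margin < high_threshold:
        return "medium_profit"
    return "high_profit"


example_levels = [
    profit_level(100, 80),    # margin 20
    profit_level(300, 200),   # margin 100
    profit_level(1000, 500),  # margin 500
]
```

Each rule maps a condition on price and cost to a named level, and the named levels form the upper layer of the hierarchy above the raw profit values.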

How do I do it?

Suppose a user selects a set of location-oriented attributes (street, country, province or state, and city) from the AllElectronics database, but does not specify the hierarchical ordering among the attributes.

First, sort the attributes in ascending order based on the number of distinct values in each attribute. This results in the following (where the number of distinct values per attribute is shown in parentheses):

country (15)
province or state (365)
city (3,567)
street (674,339)

Second, generate the hierarchy from the top down according to the sorted order, with the first attribute at the top level and the last attribute at the bottom level. Finally, the user can examine the generated hierarchy, and when necessary, modify it to reflect desired semantic relationships among the attributes. In this example, it is obvious that there is no need to modify the generated hierarchy.
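The two steps above can be sketched as follows. The distinct-value counts are the ones quoted in the example; the function names `count_distinct` and `generate_hierarchy` and the tiny `sample_rows` table are assumptions made for this illustration.

```python
def count_distinct(rows, attributes):
    """Count the distinct values of each attribute in a list of dict rows."""
    return {attr: len({row[attr] for row in rows}) for attr in attributes}


def generate_hierarchy(distinct_counts):
    """Order attributes from fewest distinct values (top level) to most (bottom level)."""
    return sorted(distinct_counts, key=distinct_counts.get)


# Counts from the example, given in the unordered way a user might select them.
counts = {"street": 674339, "country": 15, "province_or_state": 365, "city": 3567}
hierarchy = generate_hierarchy(counts)

# The same idea starting from raw rows (hypothetical sample data).
sample_rows = [
    {"country": "Canada", "city": "Vancouver"},
    {"country": "Canada", "city": "Burnaby"},
    {"country": "Canada", "city": "Toronto"},
]
sample_hierarchy = generate_hierarchy(count_distinct(sample_rows, ["city", "country"]))
```

The result puts `country` at the top and `street` at the bottom. Distinct-value counts are only a heuristic, which is why the final review by the user still matters.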

Evan Gertis
