In my last blog, I covered Text Analysis and Text Parsing. In this blog I am going to cover a Text Analytics module, SAS Content Categorization Studio. SAS Content Categorization Studio lets us categorize unstructured text data; this process is known as building a taxonomy.
By building a taxonomy we can identify the major categories the customer needs to focus on.
For example, if there are complaints about water problems, SAS Content Categorization Studio can group all water-related problems into one category, so that the customer can easily find the major areas to focus on.
Below is the SAS Content Categorization icon:
Figure 1: Content Categorization Icon
Double-click the icon to open the tool. To create a new project, click File>>New Project; to edit an existing project, click File>>Open Project.
Figure 2: Creating New Project
After clicking New Project, assign a project name and a project location path.
The tool is user-friendly, so the required information can be filled in as shown in the screenshot below:
Figure 3: Project Name
The steps for creating a new project are:
New Project>>Project Name>>Project Language>>right-click on Top to add a category and define the category name.
In the screen below we are defining a category name:
Figure 4: Category Name
Under Rules, we can add rules for a particular category using Boolean operators. In Text View we can type the Boolean operators manually; in Tree View we can add them by right-clicking.
Figure 5: Updating Rules
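To illustrate how a Boolean category rule decides whether a document belongs to a category, here is a minimal Python sketch. It is not SAS Content Categorization Studio's rule syntax or engine: the nested-tuple rule format, the `matches` and `categorize` helpers, and the "Water" rule are all hypothetical, chosen only to show how AND, OR and NOT combine terms.

```python
import re
from typing import Union

# A rule is either a plain term (str) or a tuple (operator, operand, ...).
# This tuple format is hypothetical; it is NOT the SAS rule syntax.
Rule = Union[str, tuple]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches(rule: Rule, tokens: set) -> bool:
    """Recursively evaluate a Boolean rule against a document's tokens."""
    if isinstance(rule, str):
        return rule.lower() in tokens
    op, *operands = rule
    if op == "AND":
        return all(matches(r, tokens) for r in operands)
    if op == "OR":
        return any(matches(r, tokens) for r in operands)
    if op == "NOT":
        # NOT takes a single operand.
        return not matches(operands[0], tokens)
    raise ValueError(f"unknown operator: {op}")


def categorize(text: str, rules: dict) -> list:
    """Return the names of all categories whose rule matches the text."""
    tokens = tokenize(text)
    return [name for name, rule in rules.items() if matches(rule, tokens)]


# Hypothetical "Water" category: mentions water together with a problem word,
# but not a water bill (which would belong to a billing category instead).
rules = {
    "Water": ("AND", "water",
              ("OR", "leak", "leakage", "shortage", "supply"),
              ("NOT", "bill")),
}

print(categorize("There is a water leakage in our building", rules))  # ['Water']
print(categorize("My water bill is too high", rules))  # []
```

The nesting mirrors what Tree View shows visually: each operator is a node and its operands are its children.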
To test the rules we have written, click on Document, paste some sample content, and then click on Test as shown below:
Figure 6: Document Testing
Clicking Test shows the tagged categories. If the content is not tagged correctly, we have to enrich the category rules and test again. This is an iterative process.
Figure 7: Tagged Category
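The test-and-enrich cycle can be sketched in Python as a small regression check over sample documents. This is only an analogy for what the Document and Test panes do interactively, not a SAS API: the "any listed term matches" rules, the sample documents, and the `tag` and `failures` helpers are hypothetical simplifications.

```python
import re

# Hypothetical, simplified rules: a document gets a category if it
# contains ANY of that category's terms.
rules = {
    "Water": {"water", "leak", "pipeline"},
    "Electricity": {"power", "electricity", "outage"},
}

# Sample documents paired with the categories we expect them to receive.
samples = [
    ("The pipeline near my house is broken", {"Water"}),
    ("No power since morning", {"Electricity"}),
    ("The tap has been dry for two days", {"Water"}),
]


def tag(text):
    """Return the set of categories whose terms appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {cat for cat, terms in rules.items() if tokens & terms}


def failures(samples):
    """Return (text, expected, actual) for every mis-tagged document."""
    return [(t, exp, tag(t)) for t, exp in samples if tag(t) != exp]


# First test run: the "dry tap" complaint is not tagged as Water.
print(len(failures(samples)))  # 1

# Enrich the Water rule with the missing terms, then test again.
rules["Water"] |= {"tap", "dry"}
print(len(failures(samples)))  # 0
```

Keeping a fixed set of sample documents and re-running them after each rule change helps catch cases where enriching one category breaks the tagging of another.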
To validate the categories' Boolean rules, go to Build and select Build Rule-Based Categorizer. If Build Successful appears at the bottom left without any errors, we can upload our taxonomy project to the server.
Figure 8: Build Rule-Based Categorizer
After a successful build, go to Build>>Upload Categorizer, then enter the Server Host Name, Port No., Username, Password, and the Server Project Name under which the project should be uploaded to the Content Categorization Server.