
Wednesday, August 19, 2009

What is "Preprocessing"?

"Preprocessing" is a data-cleaning step that makes the data easier for the system to process. It is useful because it removes dirty or unneeded data before knowledge or keywords are extracted from the text. Examples of dirty data include:
  • Digits
  • Punctuation
  • Bullets

Based on the research I have done, numbers, punctuation marks, and bullets do not have a significant influence on text data. This is because the terms in the text will eventually be ranked, while dirty data cannot stand on its own as a meaningful term.

For this reason, numbers, punctuation marks, and bullets need to be erased before the other text mining processes are run.
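The cleaning step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the exact method used in the research: the function name `preprocess` and the `BULLETS` character set are hypothetical choices made for this example, and punctuation is taken to mean the ASCII characters in Python's `string.punctuation`.

```python
import string

# Hypothetical set of bullet characters (an assumption); extend it for your own data.
BULLETS = "•◦▪‣·"

def preprocess(text: str) -> str:
    """Remove digits, punctuation and bullet characters, then tidy whitespace."""
    unwanted = string.digits + string.punctuation + BULLETS
    # Map every unwanted character to a space so neighbouring words do not merge.
    table = str.maketrans({ch: " " for ch in unwanted})
    cleaned = text.translate(table)
    # Collapse the runs of spaces left behind by the removed characters.
    return " ".join(cleaned.split())

print(preprocess("• Item 1: text-mining, 2009!"))  # Item text mining
```

Replacing the removed characters with a space instead of deleting them keeps words such as "text-mining" from being glued together into one token.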

Figure: the scheme of preprocessing.

2 comments:

Jack said...

This article covers so many new and unique facts about preprocessing that I wasn't aware of. I am glad that I found such a useful post, and it's my pleasure to leave a comment.

Icank said...

Thank you for your comment.
