Driss Namly, Karim Bouzoubaa, Abdellah Yousfi


Stop-words are defined as words that appear frequently in texts without carrying any significant information. For the Arabic language, existing works suffer from two main drawbacks: (i) the use of only proprietary corpora and (ii) the reliance on the frequency metric alone. Our approach to automatic Arabic stop-word detection uses a new metric based on a supervised machine learning process and a vector space representation. It can be applied to any corpus and takes into account both domain-independent and domain-dependent stop-words. Experiments conducted to evaluate the proposed approach show a significant improvement, with the detection rate reaching 91.85% as measured by the F-measure.


NLP, Stop-words, Supervised machine learning, Arabic language
