Answer by CalZ for Searching documents at scale: how to maintain cleaned...
Think of this problem as a pipeline of steps to automate and re-run, not just the ML step at the end:
1. Read in raw documents.
2. Stem words, remove stop words.
3. Perform TF-IDF.
4. Train model on cleaned up...
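The steps above can be sketched as a small, re-runnable pipeline. This is a minimal stdlib-only sketch under stated assumptions, not CalZ's actual code. Raw documents are assumed to arrive as an in-memory `{doc_id: text}` dict standing in for a MarkLogic export. `STOP_WORDS` and `crude_stem` are hypothetical stand-ins for a real stop-word list and a real stemmer such as Porter's. The hash-keyed `cache` is one assumed way to keep cleaned documents around so that a re-run only reprocesses new or edited articles.

```python
import hashlib
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are", "for"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, tokenize on runs of ASCII letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def clean_all(raw_docs, cache):
    """Clean every document, reusing cached results whose raw text is unchanged.

    `cache` maps doc_id -> (sha256 of raw text, cleaned tokens), so re-running
    the pipeline only re-cleans documents that are new or whose text changed.
    """
    cleaned = {}
    for doc_id, text in raw_docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        hit = cache.get(doc_id)
        if hit is None or hit[0] != digest:
            cache[doc_id] = (digest, clean(text))
        cleaned[doc_id] = cache[doc_id][1]
    return cleaned


def tf_idf(cleaned):
    """Return {doc_id: {term: weight}}: length-normalized term frequency times log(N / df)."""
    n_docs = len(cleaned)
    df = Counter()
    for tokens in cleaned.values():
        df.update(set(tokens))  # count each term once per document
    weights = {}
    for doc_id, tokens in cleaned.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


if __name__ == "__main__":
    raw = {
        "a1": "Markets rallied as stocks climbed",
        "a2": "The election results are in",
    }
    cache = {}
    vectors = tf_idf(clean_all(raw, cache))
    print(vectors)
```

Keying the cache on a content hash rather than a timestamp means an unchanged article is never re-cleaned, even if it is exported again; the cache itself could be persisted between runs (for example with `pickle` or `shelve`). The plain `log(N / df)` weighting is kept for readability; scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed IDF variant by default.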
Searching documents at scale: how to maintain cleaned documents?
I have a document-store database (MarkLogic) with hundreds of thousands of news articles in raw format. I am building a content recommender on a representative subset of that data on my local machine....