Channel: Searching documents at scale: how to maintain cleaned documents? - Data Science Stack Exchange

Answer by CalZ for Searching documents at scale: how to maintain cleaned...

Think of this problem as a pipeline of steps to automate and re-run, not just the ML step at the end:

1. Read in raw documents.
2. Stem words, remove stop words.
3. Perform TF-IDF.
4. Train model on cleaned up...
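
The steps above can be sketched as one re-runnable function, so that cleaned documents are always regenerated from the raw ones rather than maintained by hand. This is only a rough illustration under stated assumptions: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter), and the function names are all hypothetical, and the model-training step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "on", "to", "is", "are"}


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer, not one itself."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document (plain tf * log(N / df))."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def run_pipeline(raw_docs):
    """Re-runnable entry point: raw text in, TF-IDF vectors out."""
    cleaned = [clean(text) for text in raw_docs]  # read + stem + stop words
    return tf_idf(cleaned)                        # TF-IDF


vectors = run_pipeline(["the cat sat", "the cat ran"])
```

Because every stage is a pure function of the raw text, changing the cleaning rules only requires re-running `run_pipeline` on the raw documents; nothing cleaned needs to be edited in place.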




Searching documents at scale: how to maintain cleaned documents?

I have a document-store database (MarkLogic) with hundreds of thousands of news articles in raw format. I am building a content recommender on a representative subset of that data on my local machine....
