Just to add to Brian's comments: I don't think there should be any issue with 170K words. I have worked on datasets with millions of words, and it should work just fine. Supporting sparse data in all our algorithms is one of our fundamental requirements.
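To give a rough idea of why vocabulary size is not the limiting factor, here is a minimal pure-Python sketch of a sparse bag-of-words representation. This is not GLC code: `bag_of_words` and the two sample documents are made up purely for illustration. The point is only that each row stores the words that actually occur, not one cell per vocabulary word.

```python
from collections import Counter

def bag_of_words(text):
    """Hypothetical helper: map each word that occurs to its count."""
    return Counter(text.lower().split())

# Made-up sample documents, for illustration only.
docs = [
    "the food was great and the service was great",
    "terrible service",
]
sparse_rows = [bag_of_words(d) for d in docs]

# Every distinct word across all documents.
vocabulary = set().union(*sparse_rows)

# Sparse storage keeps one entry per (document, word) pair that occurs.
stored = sum(len(row) for row in sparse_rows)
# A dense matrix would need one cell per document per vocabulary word.
dense = len(docs) * len(vocabulary)

print(stored, "stored entries vs", dense, "dense cells")
```

With a 170K-word vocabulary the dense count grows with the vocabulary, while the stored count still depends only on how many distinct words each document contains.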
Here are some resources to get you going:
Text analytics 101 with GLC:
A notebook on sentiment classifiers:
Regression analysis on Yelp with text data:
You should have no issues working with sparse data. Let us know if you need more pointers.