Bookmarked

Dustin Miller explains his process for using technology to help analyse the redacted Mueller Report. This involves:

  • four optical character recognition libraries to make the text searchable.
  • named entity recognition and part-of-speech tagging to establish co-references.
  • n-grams to identify dates, people, locations and correlations.
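As a rough illustration of the n-gram step, the Python sketch below counts the most frequent word n-grams in a piece of text. It is not Miller's code. The tokenising regex, the helper names `ngrams` and `top_ngrams`, and the sample sentence are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text, n=2, k=5):
    """Lower-case the text, split it into word tokens with an assumed regex,
    and return the k most common n-grams with their counts."""
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    return Counter(ngrams(tokens, n)).most_common(k)

# Hypothetical sample sentence, not taken from the report.
sample = "The special counsel met the special counsel staff."
print(top_ngrams(sample, n=2, k=2))
# [(('the', 'special'), 2), (('special', 'counsel'), 2)]
```

N-grams that recur across many pages are candidates for names, places or recurring phrases. A real pipeline would usually run this over OCR output and combine it with the entity tags described above.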

This provides an insight into the “messy, dirty truth of data science” and machine learning. You can find more information here.