Please note that, after writing this post, NudeDetector and NudeClassifier changed a lot. Major changes are:

  1. 10x data, more parts detection (check out https://github.com/notAI-tech/NudeNet/)
  2. Auto-downloading of the checkpoint files, and Windows support.
  3. 2x faster default model and a 6x faster “fast” detection mode.
  4. Support for video detection using smart frame selection.
  5. TF Lite support for NudeClassifier.

Part 1: Nudity detection with image classification

With the advent of excellent DL libraries and a plethora of open-source implementations and papers, image classification is very easy to implement. That is, if you have the dataset. There are plenty of open datasets available to test and refine classification models, but obtaining a task-specific dataset is tough. …
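To make that concrete, here is a minimal transfer-learning sketch. The data/train directory layout, the two-class head and the MobileNetV2 backbone are all assumptions for illustration, not the model NudeNet actually uses.

    # Minimal transfer-learning classifier sketch (Keras).
    # "data/train" with one sub-folder per class is a hypothetical layout.
    import tensorflow as tf

    base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                             input_shape=(224, 224, 3))
    base.trainable = False  # reuse ImageNet features, train only the head

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
        base,
        tf.keras.layers.Dense(2, activation="softmax"),     # e.g. safe vs. nude
    ])
    model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])

    train = tf.keras.utils.image_dataset_from_directory(
        "data/train", image_size=(224, 224), batch_size=32)
    model.fit(train, epochs=3)

With a reusable backbone, a head like this trains quickly; the hard part, as noted above, is the dataset itself.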


Please note that, after writing this post, NudeDetector changed a lot. Major changes are:

  1. 10x data, more parts detection (check out https://github.com/notAI-tech/NudeNet/)
  2. Auto-downloading of the checkpoint files, and Windows support.
  3. 2x faster default model and a 6x faster “fast” detection mode.
  4. Support for video detection using smart frame selection.

Part 2: Exposed part detection and censoring.


My first post (https://github.com/bedapudi6788/deepsegment) tackles the problem of sentence segmentation for text with bad or no punctuation. Although the absolute accuracy reported in the post might seem low, the model itself performs excellently in the real world (as explained in the update to the original post).

While exploring vector alignment, I realised that for some combinations of languages, aligned vectors can be used for building multilingual models. Before going into the results of the multilingual DeepSegment, I am going to briefly touch upon what vector alignment is and its advantages.

Vector alignment is a very simple but effective concept. When we train a word-vector model (e.g. FastText, word2vec, GloVe) on a corpus, the vectors we get encode the semantic similarity of words in that corpus. But when we train FastText or GloVe on two different corpora, the two sets of vectors live in different spaces and cannot be compared directly.
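A common recipe for fixing this is orthogonal Procrustes alignment: take vectors for the same anchor words in both spaces and solve for the rotation that maps one space onto the other. A minimal sketch, with random placeholder matrices standing in for real anchor-word vectors:

    # Orthogonal Procrustes: find orthogonal W minimising ||X @ W.T - Y||_F.
    # X, Y are placeholder matrices of anchor-word vectors.
    import numpy as np

    def align(X, Y):
        # SVD of the cross-covariance gives the optimal rotation W = U @ Vt
        U, _, Vt = np.linalg.svd(Y.T @ X)
        return U @ Vt

    X = np.random.randn(5000, 300)  # anchor words in the source space
    Y = np.random.randn(5000, 300)  # the same words in the target space
    W = align(X, Y)                 # 300 x 300 orthogonal map
    aligned = X @ W.T               # source vectors, now comparable to Y

Once W is learned from a small dictionary of anchor words, every source-space vector can be rotated into the target space, which is what makes a single model usable across languages.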


Spell correction using seq2seq models is nothing new. Tal Weiss tried to tackle spell correction by training a seq2seq model on data generated from the Google News corpus. Although his original implementation doesn’t work, it inspired many to tackle spell correction with seq2seq, and there were some unsuccessful attempts by others (Matthew Relich, Pavel Surmenok) too.

Unfazed by this, I decided to try and build a working spell corrector with deep learning. First, I identified the major problems with the above implementations.

  1. Use of attention: attention is very important for any sequence-to-sequence task. Without attention, the decoder has to work from a single fixed-length summary of the entire input while making predictions. So, I built a seq2seq model with an LSTM encoder and decoder and “Luong” attention (sketched below). I had used the same model for punctuation correction before and achieved excellent results, as described in my previous post. …
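As an illustration of that setup, here is a rough character-level sketch in Keras. The vocabulary size, dimensions and layer choices are placeholders, and Keras’ built-in dot-product Attention layer (Luong-style) stands in for the attention described above; this is not the exact architecture from the post.

    # Character-level seq2seq speller sketch with Luong-style attention.
    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB, DIM = 128, 256                      # placeholder sizes
    enc_in = layers.Input(shape=(None,))       # noisy input characters
    dec_in = layers.Input(shape=(None,))       # shifted target characters
    emb = layers.Embedding(VOCAB, DIM)         # shared character embedding

    enc_seq, h, c = layers.LSTM(DIM, return_sequences=True,
                                return_state=True)(emb(enc_in))
    dec_seq = layers.LSTM(DIM, return_sequences=True)(emb(dec_in),
                                                      initial_state=[h, c])

    # layers.Attention is dot-product ("Luong-style") attention
    context = layers.Attention()([dec_seq, enc_seq])
    out = layers.Dense(VOCAB, activation="softmax")(
        layers.Concatenate()([dec_seq, context]))

    model = tf.keras.Model([enc_in, dec_in], out)
    model.compile("adam", "sparse_categorical_crossentropy")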


Demo available here.

While I was testing the ASR systems DeepSpeech and Kaldi as part of the deep learning team at Reckonsys, I realised that neither of them supports auto punctuation. This is of course expected, since including punctuation symbols while training would increase the total number of decoding tokens and result in lower accuracy.

Although punctuation isn’t really necessary for most use-cases like sentiment analysis or NER (punctuation helps, but isn’t essential), it is of utmost importance for transcription services. Imagine sending an email to your client with no punctuation and no capitalisation. So, we tested the big player’s (Google’s) Cloud Speech API, and it does indeed offer an auto-punctuation option. …
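One common way to put punctuation back (an assumption here about the general approach, not a description of any of the systems above) is to frame it as token-level tagging: for each word, predict which punctuation symbol, if any, follows it. A minimal BiLSTM tagger sketch:

    # Punctuation restoration as sequence tagging (sketch).
    # Labels per word: 0 = none, 1 = comma, 2 = period, 3 = question mark.
    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB, TAGS, DIM = 20000, 4, 128  # placeholder sizes
    model = tf.keras.Sequential([
        layers.Embedding(VOCAB, DIM, mask_zero=True),
        layers.Bidirectional(layers.LSTM(DIM, return_sequences=True)),
        layers.Dense(TAGS, activation="softmax"),
    ])
    model.compile("adam", "sparse_categorical_crossentropy")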


Sentence segmentation, or sentence boundary detection, is one of the foremost problems of NLP that is considered solved. While working on a text correction module that includes punctuation correction, spell correction and common grammar-error correction, I realised that to do any of these, my model(s) should be able to correctly segment the input text.

There are various libraries, including some of the most popular ones like NLTK, spaCy and Stanford CoreNLP, that provide excellent, easy-to-use functions for sentence segmentation. Before you start thinking that this is just another survey post that adds nothing new to the topic, let’s take a look at how these libraries segment the text “I am Batman. …
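For instance, a quick way to see what happens when the input lacks punctuation (the sample text here is illustrative, not the exact example from the post):

    # NLTK's sentence tokenizer has nothing to split on without punctuation.
    import nltk
    nltk.download("punkt")

    print(nltk.sent_tokenize("i am batman i live in gotham"))
    # -> ['i am batman i live in gotham']  (treated as a single sentence)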

About

Praneeth Bedapudi

Senior NLP Engineer - DeepAffects
