• Active Learning Activation Tool

    Intro

    Annotation is an important process for generating data by manually labeling features in text. Brat is a sophisticated web-based annotation tool designed for large batches of documents.

    What I did was extend the software in several aspects to support additional functions.


    NLP

    What I did

    I extended the tool to support the following functions:

    1. learns text classifiers from weak supervision (such as keywords provided for a category).
    2. adaptively chooses documents to ask labels for (via active learning).
    3. explains model predictions by highlighting informative spans of text.

    The goal is to save users' labeling effort when training text classifiers from scratch, and to make the underlying ML model a white box instead of a black box, which facilitates further supervision and builds a sense of trust among users.
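
    The first and third functions in the list above can be illustrated with a minimal sketch. This is not the code of the extended tool: the function names `weak_label` and `highlight`, the keyword lists, and the `[[ ]]` markers are all assumptions made for illustration, and tokens are split on whitespace only.

```python
def weak_label(text, keywords_by_category):
    """Guess a category from keyword counts; return None if no keyword occurs.

    Hypothetical helper: a stand-in for weak supervision from category keywords.
    """
    tokens = text.lower().split()
    counts = {
        category: sum(tokens.count(keyword) for keyword in keywords)
        for category, keywords in keywords_by_category.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def highlight(text, keywords):
    """Mark every token that is a keyword with [[ ]] (hypothetical marker)."""
    return " ".join(
        f"[[{token}]]" if token.lower() in keywords else token
        for token in text.split()
    )


# Made-up keyword lists for two categories.
keywords_by_category = {
    "sports": ["game", "team"],
    "finance": ["stock", "bank"],
}

label = weak_label("The team won the game", keywords_by_category)
marked = highlight("The team won the game", set(keywords_by_category["sports"]))
```

    Documents for which `weak_label` returns `None` are natural candidates to send to a human annotator.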

  • Emoji2Vec Visualization

    Intro

    The Emoji Cloud is a visualization of the semantic space of emojis, based on a recent paper at ICWSM. Embeddings of emojis are trained using LINE and the 3D layout is learned using LargeVis.

    Word2vec is a group of related models that are used to produce word embeddings. Emoji2vec follows a similar idea: it trains a neural network and takes the hidden layer as the vector representation of each word.

    Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another. Since an emoji can be treated as a word too, the same model can be adapted to produce emoji embeddings.
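
    "Close proximity" is often measured with cosine similarity. The sketch below shows that measure on toy data; the three-dimensional vectors and the names `grin`, `smile`, and `pizza` are invented for illustration and are not taken from the trained emoji embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, embeddings):
    """Name of the entry most similar to `query`, excluding `query` itself."""
    return max(
        (name for name in embeddings if name != query),
        key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    )


# Toy embeddings (made up): two facial expressions and one food item.
embeddings = {
    "grin": [0.9, 0.1, 0.0],
    "smile": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

closest = nearest("grin", embeddings)
```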


    NLP

    Emoji Cloud

    Since the trained emoji vectors have very high dimensionality, the embedding needs to be reduced to fewer dimensions. For this I used LargeVis, a method that uses random projection trees and other techniques to find an approximate low-dimensional representation of high-dimensional vectors. Here I set the target dimension to 3.

    The 3D visualization is created with Three.js and supports interaction and search. The model can be seen and played here.


    NLP
  • SAA Official Website

    Intro

    SAA is the Chinese student union for the Shanghai Jiaotong University Joint Institute and the University of Michigan. The SAA official website records the activities and life of students in the Joint Institute and enables students to post messages and news on the website.


    NLP

    Current progress:

    The skeleton and most of the pages are finished locally. The website is mainly written and maintained in Meteor, React.js, and Node.js.

    The project is written together with Zihan Li. The code can be seen on my GitHub: Gaole Meng

  • Word Disambiguation by Active Learning

    Intro

    Word Disambiguation

    Word disambiguation is a fun classification problem in natural language processing, in which the machine distinguishes the different meanings of the same word given the corpus in which the word appears.

    Active Learning

    Active learning is an interesting branch of machine learning in which the machine requests labels for the data it is most uncertain about. Here, the uncertainty is measured by entropy.
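
    As a minimal sketch of entropy-based selection (not the code used in this project), the snippet below ranks unlabeled examples by the Shannon entropy of their predicted class distribution and picks the most uncertain ones. The probability values and the name `most_uncertain` are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions, k=1):
    """Indices of the k predictions with the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted sense distributions for three unlabeled occurrences.
predictions = [
    [0.95, 0.05],  # the model is confident
    [0.50, 0.50],  # the model is maximally uncertain
    [0.70, 0.30],
]

query = most_uncertain(predictions, k=1)
```

    The selected indices are the examples a human would be asked to label next.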


    NLP

    Experiment Results:

    I compare the active learning accuracy with traditional uncertainty sampling, which introduces the


    NLP
  • IPA Platform

    Intro

    The IPA platform is a crowdsourcing-based online platform where people can hire crowd workers to do passive jobs.


    What is a passive task?

    To explain passive tasks, we must first mention their opposite: active tasks. An active task is a job that needs constant human effort, like “controlling the robot to fetch a cup of water”, where a person has to control the robot the whole time. By contrast, passive jobs require little work but need constant attention to information from the environment, like “make some noise when Peter comes by”.

    On the latest robot crowdsourcing platforms, many jobs are passive tasks, which means they require less effort than it seems. We plan to create a platform that lets people combine active tasks and passive jobs at the same time.

    What is the IPA Platform?

    IPA stands for intelligent personal assistant. The IPA platform is a crowdsourcing platform on which people can do multiple passive jobs.

    Since passive jobs require little computation power (here, human attention), we decided to design a multi-streaming system that plays multiple streams at the same time, so that people can do several passive jobs together.

    What have we done?

    We have built a prototype of this platform, which includes:

    Online simulation platform:

    1. An online simulation platform where people can try controlling a virtual robot in a simulated environment before touching the real robot.
    2. The ability to add different obstacles to the environment to simulate real situations.

    Multiple Streaming Platform:

    This platform lets people play streams at different speeds and watch different streams at the same time.

  • COOL Compilers

    Intro

    Cool, an acronym for Classroom Object Oriented Language, is a computer programming language designed by Alexander Aiken for use in an undergraduate compiler course project. While small enough for a one term project, COOL still has many of the features of modern programming languages, including objects, automatic memory management, strong static typing and simple reflection.


    What is the COOL compiler?

    The COOL compiler is a compiler for COOL that includes lexical analysis, parsing, semantic analysis, code generation, and register allocation.
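
    To give a feel for the first phase, lexical analysis, here is a minimal tokenizer sketch. It is not the project's lexer: it is written in Python for brevity, covers only a small assumed subset of COOL tokens (no strings or comments), and the token names are chosen for illustration.

```python
import re

# A subset of COOL keywords; COOL keywords are matched case-insensitively.
KEYWORDS = {"class", "inherits", "if", "then", "else", "fi", "let", "in"}

# Order matters: "<-" must be tried before the single-character symbols.
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("ASSIGN", r"<-"),
    ("TYPEID", r"[A-Z][A-Za-z0-9_]*"),    # type identifiers start uppercase
    ("OBJECTID", r"[a-z][A-Za-z0-9_]*"),  # object identifiers start lowercase
    ("SYMBOL", r"[{}();:+\-*/<=.,@~]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind in ("TYPEID", "OBJECTID") and text.lower() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("x <- 42;")
```

    The later phases (parsing, semantic analysis, code generation, register allocation) each consume the output of the previous one, starting from a token list like this.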