Bigdata based – Punghee Cho

[Project 1]
Shopping application with a big-data-based recommender system for iOS and Android

I developed a React Native application that includes order, payment, and inventory processes and a big-data-based recommender system:

Frontend,

Backend,

Analysis using Spark (it computes weights for the features and generates a recommended item list for each product). (To see the source code, please download this HTML file to your computer.)

For the collaborative filtering algorithm, I improved the cosine similarity formula while studying recommender systems and reviewing several theses with Ph.D. students and Prof. Lawrence at a laboratory of the University of Texas at Dallas.

Here is the link to the formula for the improved collaborative filtering algorithm that I wrote: https://drive.google.com/file/d/1JgqiBUwfwk_PkaAOfpAHwsjYYGws7sAp/view?usp=sharing

The presentation material that I prepared as one of the presenters at the 2021 Fall S²ERC Showcase

In summary, instead of using only the plain cosine similarity algorithm as the collaborative filtering algorithm, which is given by the following formula,

I created an improved collaborative filtering algorithm that uses item features (brand, season) and a genetic algorithm, as given by the following formula.

For more details, please refer to this link: https://drive.google.com/file/d/1JgqiBUwfwk_PkaAOfpAHwsjYYGws7sAp/view?usp=sharing
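For readers who want a concrete baseline, the plain cosine similarity mentioned above is sim(a, b) = (a · b) / (|a| |b|). The sketch below shows only that standard baseline and a simple "recommended item list for each product" built on it. It is not the improved formula with brand, season, and the genetic algorithm, which is described in the linked document. The function names and the small ratings table are hypothetical and exist only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a product with no ratings is treated as similar to nothing
    return dot / (norm_a * norm_b)

def recommend(item_vectors, target, top_n=2):
    """Rank every other product by similarity to `target`, most similar first."""
    scores = [
        (other, cosine_similarity(item_vectors[target], vec))
        for other, vec in item_vectors.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]

# Hypothetical data: each row is one product, each column one user's rating.
item_vectors = {
    "shirt": [5, 3, 0, 1],
    "jeans": [4, 3, 0, 1],
    "boots": [0, 0, 5, 4],
}
print(recommend(item_vectors, "shirt"))
```

With this toy data, "jeans" ranks ahead of "boots" for "shirt" because the same users rated both products highly.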

For the recommended item list (Analysis page), see the 2 minute 40 second mark.

Skills
Spark (PySpark), MongoDB, React Native, Node.js, Express.js

[Project 2]
Spark Streaming with Twitter and Kafka

I created a Spark Streaming application that continuously reads data from Twitter, analyzes the sentiment of each tweet, and sends the values to Apache Kafka. A pipeline using Elasticsearch and Kibana then reads the data from Kafka and visualizes it:

LinkWithConfluent (it connects to Kafka on a Confluent cluster and checks whether the streaming works correctly). (To see the source code, please download this HTML file to your computer.)
LinkWithTwitterAndAnalyzerAndProducerToKafka (it connects to the Twitter API, analyzes the data it returns, predicts the sentiment of each text in real time, and produces messages to the Kafka topic). (To see the source code, please download this HTML file to your computer.)
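The analyze-and-produce step can be pictured with a small, library-free sketch. It assumes a toy word-list scorer and a JSON payload with "date" and "sentiment" fields. The actual sentiment model, message schema, and Kafka producer call are not shown on this page, so every name here is hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical word lists, for illustration only; not the real analyzer's model.
POSITIVE = {"good", "great", "recovered", "safe"}
NEGATIVE = {"bad", "sick", "worse", "afraid"}

def score_sentiment(text):
    """Return 1 (positive), 0 (neutral), or -1 (negative) for one text."""
    words = text.lower().split()
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (balance > 0) - (balance < 0)  # sign of the balance as an int

def build_message(text, created_at):
    """Serialize one scored tweet as the JSON bytes a Kafka producer could send."""
    payload = {
        "date": created_at.strftime("%Y%m%d%H"),  # hour-level key, e.g. 2021111103
        "sentiment": score_sentiment(text),
    }
    return json.dumps(payload).encode("utf-8")

message = build_message(
    "feeling good and safe after covid",
    datetime(2021, 11, 11, 3, tzinfo=timezone.utc),
)
print(message)
```

In a real pipeline, the bytes returned by `build_message` would be handed to a Kafka producer for the target topic. That call is left out here so the sketch runs without a cluster.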

For this, I followed the steps below.

<topic of Kafka on Confluent> – the “streaming_test_8” topic receives messages from the Twitter API in real time.

<ElasticSearchSinkConnector on Confluent> – it links Kafka to Elastic Cloud for visualization.

<stream lineage of Confluent> – it visually shows how data streams from producers to downstream topics and consumers.

<Kibana graphical plot – data from the “streaming_test_8” topic of Confluent Kafka>

The plot above shows the average “sentiment” over time. I used “covid” as the search term and encoded the sentiment value as “-1: negative, 0: neutral, 1: positive”.

In the date values, for example, “2021111103” means year: 2021, month: 11, day: 11, hour: 03.
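As a small illustration of that ten-digit layout, the standard library can parse such a key directly. The helper name `parse_hour_key` is hypothetical.

```python
from datetime import datetime

def parse_hour_key(key):
    """Parse a ten-digit YYYYMMDDHH key into a datetime."""
    return datetime.strptime(key, "%Y%m%d%H")

parsed = parse_hour_key("2021111103")
print(parsed.year, parsed.month, parsed.day, parsed.hour)
```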

From “2021111103” to “2021111104”, the average sentiment value was 0.18.

From “2021111104” to “2021111105”, the average sentiment value was 0.22.

From “2021111105” to “2021111106”, the average sentiment value was 0.21.

From “2021111106” to “2021111107”, the average sentiment value was 0.14.

Overall, the average sentiment value is positive in every hourly window, so the average value for “covid” is also positive. Based on this analysis of the search term, people these days seem to take “covid” less seriously than in the past.
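The arithmetic behind that observation can be checked with the four hourly averages reported above. The sketch takes an unweighted mean across the windows. That is an assumption, since a tweet-weighted mean would need per-window counts, which are not given here.

```python
from statistics import mean

# Hourly averages as reported above, keyed by each window's start hour.
hourly_average = {
    "2021111103": 0.18,
    "2021111104": 0.22,
    "2021111105": 0.21,
    "2021111106": 0.14,
}

# True only if every hourly window has a positive average.
all_positive = all(value > 0 for value in hourly_average.values())

# Unweighted mean of the four windows (assumes equal weight per hour).
overall = mean(list(hourly_average.values()))
print(all_positive, round(overall, 4))
```

The unweighted mean comes out to about 0.19, which sits between neutral (0) and positive (1) but much closer to neutral.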

Skills
Spark (PySpark), Kafka, Elasticsearch, Kibana