I developed a React Native application that handles orders, payments, and inventory, and includes a big-data-based recommender system:
Analysis using Spark (computes weights for features and generates a list of recommended items for each product). (To see the source code, please download this HTML file to your computer.)
For the collaborative filtering algorithm, I improved the cosine similarity formula while studying recommendation systems and reviewing related theses with Ph.D. students and Prof. Lawrence at a laboratory of the University of Texas at Dallas.
Here is the link to the formula for the improved collaborative filtering algorithm that I wrote: https://drive.google.com/file/d/1JgqiBUwfwk_PkaAOfpAHwsjYYGws7sAp/view?usp=sharing
In summary, instead of using only the cosine similarity formula for collaborative filtering, I created an improved collaborative filtering algorithm that also uses item features (brand, season) and a genetic algorithm.
For more details, please refer to this link: https://drive.google.com/file/d/1JgqiBUwfwk_PkaAOfpAHwsjYYGws7sAp/view?usp=sharing
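For reference, the plain cosine similarity baseline mentioned above can be sketched in a few lines of Python. This is only the standard formula, not the improved version with brand and season features or the genetic algorithm (that one is in the linked document). The item names and rating vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An item with no ratings has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, items, top_n=3):
    """Rank the other items by cosine similarity to the target item."""
    scores = [
        (name, cosine_similarity(items[target], vector))
        for name, vector in items.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical data: each vector holds one item's ratings from four users.
ratings = {
    "item_a": [5, 3, 0, 1],
    "item_b": [4, 3, 0, 1],
    "item_c": [0, 0, 5, 4],
}

recommended = most_similar("item_a", ratings)
```

Here `recommended` lists `item_b` before `item_c`, because the same users rated `item_a` and `item_b` in a similar way.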
Skills
Spark (PySpark), MongoDB, React Native, Node.js, Express.js
I created a Spark Streaming application that continuously reads tweets from Twitter, analyzes their sentiment, and sends the results to Apache Kafka. A pipeline using Elasticsearch and Kibana then reads the data from Kafka and visualizes it:
LinkWithConfluent (connects to Kafka on Confluent’s cluster and checks whether the streaming works correctly). (To see the source code, please download this HTML file to your computer.)
LinkWithTwitterAndAnalyzerAndProducerToKafka (connects to the Twitter API, analyzes the returned data, predicts the sentiment of each text in real time, and produces messages to a Kafka topic). (To see the source code, please download this HTML file to your computer.)
For this, I followed the steps below.
The plot above shows the average sentiment over time. I used “covid” as the search term and encoded sentiment as -1 (negative), 0 (neutral), and 1 (positive).
In the date field, for example, “2021111103” means year 2021, month 11, day 11, hour 03.
From “2021111103” to “2021111104”, the average sentiment value was 0.18.
From “2021111104” to “2021111105”, the average sentiment value was 0.22.
From “2021111105” to “2021111106”, the average sentiment value was 0.21.
From “2021111106” to “2021111107”, the average sentiment value was 0.14.
Overall, the average sentiment in every hourly interval is positive, so the overall average for “covid” is also positive. Based on this analysis of the search term, people these days seem to take “covid” less seriously than they did in the past.
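The hourly keys and averages described above can be reproduced with a small Python sketch. It is not the code from the notebooks: the function names and the sample records are hypothetical, and it only assumes that each tweet already has a timestamp and a sentiment label of -1, 0, or 1.

```python
from collections import defaultdict
from datetime import datetime

# Key layout used in the plot: year, month, day, hour (e.g. "2021111103").
HOUR_KEY_FORMAT = "%Y%m%d%H"


def hour_key(timestamp):
    """Format a datetime as an hourly bucket key."""
    return timestamp.strftime(HOUR_KEY_FORMAT)


def parse_hour_key(key):
    """Turn a key such as "2021111103" back into a datetime."""
    return datetime.strptime(key, HOUR_KEY_FORMAT)


def average_sentiment_by_hour(records):
    """Average the -1/0/1 sentiment labels within each hourly bucket."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of labels, count]
    for timestamp, label in records:
        if label not in (-1, 0, 1):
            raise ValueError(f"unexpected sentiment label: {label}")
        bucket = totals[hour_key(timestamp)]
        bucket[0] += label
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


# Hypothetical sample: (tweet timestamp, predicted sentiment label).
sample = [
    (datetime(2021, 11, 11, 3, 5), 1),
    (datetime(2021, 11, 11, 3, 40), 0),
    (datetime(2021, 11, 11, 4, 10), 1),
    (datetime(2021, 11, 11, 4, 55), -1),
]

averages = average_sentiment_by_hour(sample)
```

A positive average for a bucket means positive labels outnumbered negative ones in that hour, which is how the values 0.14 to 0.22 above should be read.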
Skills
Spark (PySpark), Kafka, Elasticsearch, Kibana