Trending Hashtags Analyzer

Data Structures Algorithms Streaming

You are given a stream of tweet logs where each tweet is represented as a record containing two fields:

  • timestamp: An integer representing the time (in seconds) when the tweet was posted.
  • text: A string that may include one or more hashtags (words prefixed with the # symbol).

Your task is to design and implement a function getTrendingHashtags(tweets, currentTime, windowSize, k) that returns the top k trending hashtags within a rolling time window. The rolling window includes all tweets with a timestamp in the range [currentTime - windowSize, currentTime].

Requirements:

  1. Parse each tweet's text and extract hashtags. Hashtags are case-insensitive, i.e., #Hello is equivalent to #hello.

  2. Count the frequency of each hashtag in the current rolling window.

  3. Return the top k hashtags sorted in descending order of frequency. If two hashtags have the same frequency, you can order them arbitrarily.

  4. Consider efficiency in your implementation since the tweet stream can be large.

You can assume the input tweets is provided as an array (or list) of tweet records in no particular order. Your solution should effectively filter the tweets that fall within the specified time window based on the given currentTime and windowSize.

Write clean, well-documented code and include examples or tests demonstrating your implementation.