

Implement an efficient algorithm to calculate the word frequencies of a text file using a hash table. The hash table stores each word together with its frequency counter. The procedure is as follows:

When launching your application, the program should dynamically create a constant-size hash table. Choose what size it should be (it should be a prime number and relatively large, because the purpose is to store words from large files).
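A minimal sketch of the allocation step, assuming an open-addressing layout. The names `Slot`, `TABLE_SIZE`, `is_prime` and `create_table` are hypothetical, and the size 100003 is only an illustrative prime:

```c
#include <stdlib.h>

#define TABLE_SIZE 100003  /* assumed size: an illustrative prime, not a required value */

/* One slot of the table; word == NULL marks an empty slot. */
typedef struct {
    char *word;
    int   count;
} Slot;

/* Trial division; useful for checking a candidate table size. */
int is_prime(size_t n)
{
    if (n < 2)
        return 0;
    for (size_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Allocates a zero-initialised table of m slots; returns NULL on failure.
   Relies on all-zero bytes representing a NULL pointer, which holds on
   mainstream platforms. */
Slot *create_table(size_t m)
{
    return calloc(m, sizeof(Slot));
}
```

The caller would check the returned pointer for NULL and release the table with `free` when done.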
It will then read a text file named ‘data.txt’ containing all the text for which we want to calculate the word frequencies. Choose your own way of reading the text file and separating the words. Note that reading words from the text may require some editing of each word to remove its last symbol if that symbol is a special character (“.”, “!”, “,”, etc.).
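One possible way to read and clean the words, as a sketch only. It assumes words are separated by whitespace and are shorter than 128 characters, and it strips every trailing non-alphanumeric character rather than just one. The names `strip_trailing_punct`, `next_word` and `MAX_WORD` are hypothetical:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 128  /* assumed upper bound on word length, including the terminator */

/* Removes trailing non-alphanumeric characters in place, e.g. "end." becomes "end". */
void strip_trailing_punct(char *word)
{
    size_t len = strlen(word);
    while (len > 0 && !isalnum((unsigned char)word[len - 1]))
        word[--len] = '\0';
}

/* Reads the next whitespace-delimited word from fp into buf (capacity MAX_WORD).
   Returns 1 if a non-empty word was stored, 0 at end of file. */
int next_word(FILE *fp, char *buf)
{
    /* The width 127 must stay one less than MAX_WORD. */
    while (fscanf(fp, "%127s", buf) == 1) {
        strip_trailing_punct(buf);
        if (buf[0] != '\0')
            return 1;  /* tokens that were punctuation only are skipped */
    }
    return 0;
}
```

A caller would open the file with `fopen("data.txt", "r")`, loop while `next_word` returns 1, and pass each word to the insert routine. Whether to also lowercase words is a separate choice that the task leaves open.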
▪ Each time a word is read from the file, it should be added to the hash table using the word k as the key. To convert the key from a string to an integer, you can take the sum of the ASCII values of its characters. Alternatively, you could use an algorithm of your choice. For the hash function use h(k) = k mod m, where m is the size of the hash table.
▪ Collisions: You need to implement 2 different versions of the program. In the first version collisions will be resolved by double hashing, and in the 2nd version they will be resolved with singly linked lists.
1. Double hashing collision resolution: This technique uses a second hash function to identify the next available position. That is, assume that the function h1(k) = k mod m is used to find the original storage location. In the event of a collision, another function is used to give the distance from the original position to the next position to test. In the event of a new collision, the test is repeated at the same distance from the position of the second collision, and so on. A common example that you should use for rehashing is:
h2(k) = (k / m) mod m. That is, in this second function the key is first divided by the length of the table, then the remainder is calculated, and the result is added to the result of h1(k) to obtain the next position to search.
If the calculated quotient is equal to 0, it is set to 1.
2. Singly linked list collision resolution: Every new key added to a list due to a collision with previous keys should be placed at the appropriate position in the list, so that the list stays sorted and searching it is faster.
If the word is already stored in the hash table, the program should simply increase the word’s frequency counter by one.
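A minimal sketch of version 1 (double hashing, item 1 above), under stated assumptions. The `Slot` layout and the extra parameters `table` and `m` are hypothetical. A collision is counted as each probe that lands on a slot holding a different word. The step is forced to 1 whenever h2 evaluates to 0, which covers the zero-quotient case and also prevents a zero step:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *word;   /* NULL marks an empty slot */
    int   count;
} Slot;

static unsigned long collisions = 0;  /* probes that hit a slot holding another word */

/* Sum of the character codes of s. */
unsigned long GetKey(const char *s)
{
    unsigned long k = 0;
    while (*s)
        k += (unsigned char)*s++;
    return k;
}

/* h1(k) = k mod m */
size_t Hash1(unsigned long k, size_t m)
{
    return k % m;
}

/* h2(k) = (k / m) mod m, with 0 replaced by 1 so the probe always advances. */
size_t Hash2(unsigned long k, size_t m)
{
    size_t step = (k / m) % m;
    return step == 0 ? 1 : step;
}

/* Inserts word or increments its counter.
   Returns 0 on success, -1 if the table is full or memory runs out. */
int Insert(Slot *table, size_t m, const char *word)
{
    unsigned long k = GetKey(word);
    size_t pos = Hash1(k, m);
    size_t step = Hash2(k, m);

    for (size_t i = 0; i < m; i++) {
        if (table[pos].word == NULL) {            /* free slot: store a copy */
            table[pos].word = malloc(strlen(word) + 1);
            if (table[pos].word == NULL)
                return -1;
            strcpy(table[pos].word, word);
            table[pos].count = 1;
            return 0;
        }
        if (strcmp(table[pos].word, word) == 0) { /* word already present */
            table[pos].count++;
            return 0;
        }
        collisions++;
        pos = (pos + step) % m;                   /* next probe position */
    }
    return -1;
}
```

Because m is prime and the step lies between 1 and m - 1, the probe sequence visits every slot before repeating. Note that with the ASCII-sum key and a large m, k / m is usually 0, so the step falls back to 1 and probing behaves like linear probing.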
Expand your code to:
▪ Count in both versions how many collisions occur when inserting data into the hash table.
▪ Calculate the total time required to store the words of a file in the hash table. Time will be measured for both implementations. Try running your code for different text file sizes (you can download larger files online). Show in your code comments the times you measured and the corresponding file size.
You can measure the execution time of a code segment as follows:
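A minimal sketch of one common approach in C, assuming that processor time as reported by the standard `clock()` function is an acceptable measure. The helper names `timer_start` and `timer_elapsed` are hypothetical:

```c
#include <time.h>

static clock_t timer_begin;  /* processor time recorded by timer_start() */

/* Records the current processor time. */
void timer_start(void)
{
    timer_begin = clock();
}

/* Returns the processor time in seconds used since timer_start(). */
double timer_elapsed(void)
{
    return (double)(clock() - timer_begin) / CLOCKS_PER_SEC;
}
```

Call `timer_start()` just before the loop that inserts the words and `timer_elapsed()` right after it. `clock()` measures CPU time used by the program, not wall-clock time, so time spent waiting on disk I/O may not be fully reflected.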
All the basic operations of the application should be implemented as functions. The functions that must be implemented are:
GetKey(): Returns an integer that corresponds to the string it accepts as a parameter.
Hash1(): Implementation of the basic hash function.
Hash2(): Implementation of the 2nd hash function. Used in the double hashing technique.
Insert(): Inserts a string (each word in the text of the file) into the hash structure. It first calculates an integer key by calling GetKey() and then calculates the location corresponding to that key with the hash function. Resolves collisions, if any.
Print(): Displays the stored data (word and frequency) for each location of the hash table. It is advisable to call this function to display the table as it stands after reading the file.
PrintUnique(): Displays all words in the text that are unique. The displayed data should be read from the populated hash table.
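A minimal sketch of version 2 (sorted singly linked lists, item 2 above) together with Print and PrintUnique, under stated assumptions. The `Node` layout, the `table` and `m` parameters, and the return values are hypothetical. A collision is counted when a new word lands in a bucket that already holds another word, and "unique" is taken to mean a frequency of exactly 1:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char *word;
    int count;
    struct Node *next;
} Node;

static unsigned long collisions = 0;  /* new words placed in a non-empty bucket */

/* Sum of the character codes of s. */
unsigned long GetKey(const char *s)
{
    unsigned long k = 0;
    while (*s)
        k += (unsigned char)*s++;
    return k;
}

/* h(k) = k mod m */
size_t Hash1(unsigned long k, size_t m)
{
    return k % m;
}

/* Inserts word into its bucket, keeping the list sorted alphabetically,
   or increments its counter. Returns 0 on success, -1 if memory runs out. */
int Insert(Node **table, size_t m, const char *word)
{
    Node **link = &table[Hash1(GetKey(word), m)];
    int occupied = (*link != NULL);
    int cmp = -1;

    /* Advance while the current node sorts before word. */
    while (*link && (cmp = strcmp((*link)->word, word)) < 0)
        link = &(*link)->next;
    if (*link && cmp == 0) {
        (*link)->count++;
        return 0;
    }

    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->next = *link;  /* splice in before the first larger word */
    *link = n;
    if (occupied)
        collisions++;
    return 0;
}

/* Prints every stored word with its bucket index and frequency;
   returns the number of entries printed. */
size_t Print(Node **table, size_t m)
{
    size_t shown = 0;
    for (size_t i = 0; i < m; i++)
        for (Node *n = table[i]; n; n = n->next, shown++)
            printf("[%zu] %s: %d\n", i, n->word, n->count);
    return shown;
}

/* Prints the words whose frequency is exactly 1; returns how many there are. */
size_t PrintUnique(Node **table, size_t m)
{
    size_t shown = 0;
    for (size_t i = 0; i < m; i++)
        for (Node *n = table[i]; n; n = n->next)
            if (n->count == 1) {
                printf("%s\n", n->word);
                shown++;
            }
    return shown;
}
```

Keeping each list sorted lets an unsuccessful search stop at the first word that sorts after the target instead of walking the whole chain. If "unique" is meant as "distinct" instead, Print already lists each distinct word once.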

The program has to be written in the C programming language. Thanks in advance for your time.
