Count frequency of characters in a given file in Python
CoderLegion
Posted on June 6, 2021
In Python, the collections module provides container datatypes. It implements specialized alternatives to Python’s general-purpose built-in containers: dict, list, set, and tuple. The module includes a dict subclass for counting hashable objects, known as Counter. Because Counter is a dictionary subclass, it places no restrictions on its keys and values. The values are intended to be numbers representing counts, but you can store anything in the value field. Counts may be any integer value, including zero or negative numbers. The Counter class is analogous to bags or multisets in other languages.
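As a quick illustration of the behaviour described above, here is a minimal sketch that uses only the standard `collections.Counter`. The sample values (`inventory`, the fruit names) are made up for demonstration.

```python
from collections import Counter

# Count hashable objects (here, short strings) from any iterable.
inventory = Counter(["apple", "pear", "apple"])

# A key that was never counted reports 0 instead of raising KeyError.
missing = inventory["banana"]

# Counts may drop to zero or below; subtract() keeps such entries.
inventory.subtract(["pear", "pear"])

print(inventory["apple"], inventory["pear"], missing)  # prints: 2 -1 0
```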
Count frequency of characters in a given file
There is no need to split the text into words at all; passing a string directly to the Counter updates the counts per character. Open the input file with the “r” flag and the output file with the “w” flag. You also need to collect all the counts first, and only then write them out to the output file:
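```python
from collections import Counter


def count_letters(in_filename, out_filename):
    counts = Counter()
    # Read the input in chunks so a large file never sits fully in memory.
    with open(in_filename, "r") as in_file:
        for chunk in iter(lambda: in_file.read(8192), ''):
            counts.update(chunk)  # a string updates the counts per character
    # Write the results only after every chunk has been counted.
    with open(out_filename, "w") as out_file:
        for letter, count in counts.items():
            out_file.write('{}:{}\n'.format(letter, count))
```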
Note that the input file is processed in chunks of about 8 KB rather than in one go; you can adjust the block size (preferably to a power of 2) to maximise throughput.
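To see that the block size affects only performance and not the result, here is a small sketch that counts the same text with two different chunk sizes. The helper `count_chars` and its `chunk_size` parameter are hypothetical names introduced for this illustration, and `io.StringIO` stands in for a real file. Keep in mind that in text mode `read(n)` returns up to n characters, not n bytes.

```python
import io
from collections import Counter


def count_chars(stream, chunk_size=8192):
    """Count characters from a text stream, chunk_size characters at a time."""
    counts = Counter()
    # iter(callable, sentinel) keeps calling read() until it returns ''.
    for chunk in iter(lambda: stream.read(chunk_size), ""):
        counts.update(chunk)
    return counts


text = "hello world\n" * 1000
small = count_chars(io.StringIO(text), chunk_size=4)
large = count_chars(io.StringIO(text), chunk_size=65536)

# Both chunk sizes agree with counting the whole string at once.
print(small == large == Counter(text))
```

Which block size is fastest depends on the file and the system, so it is worth measuring before settling on a value.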
If you would like the output file to be sorted by frequency (descending), iterate over .most_common() rather than .items() (.iteritems() in Python 2).
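A minimal sketch of the sorted variant might look like this. The helper name `format_counts` is hypothetical, and it returns the lines instead of writing them so that the ordering is easy to inspect.

```python
from collections import Counter


def format_counts(counts):
    """Return 'char:count' lines, most frequent character first."""
    # most_common() yields (element, count) pairs sorted by descending count.
    return ["{}:{}".format(letter, count) for letter, count in counts.most_common()]


lines = format_counts(Counter("banana"))
print(lines)  # prints: ['a:3', 'n:2', 'b:1']
```

Each line can then be written to the output file with a trailing newline, just as in the original loop.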