Thursday, 19 April 2018

Why does text processing in Python seem faster than in C++ in the following two examples? Is it an I/O issue?

Below are two short code snippets, one in C++ and one in Python, that in principle do the same simple text processing.



When I run both on a large text file (e.g. Shakespeare's complete works as a UTF-8 or ASCII file from Gutenberg.org), the Python code appears to run five times faster. This is on a Mac with Python 2.7.10 and g++ from Xcode (2016-11).



Question: Did I make an obvious mistake in the C++ test below?




#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// C++ example.

std::map<std::string, int> M;

void word_frequency(std::string word) {

    M[word] = ++M[word];
}

void countwords(std::string fname)
{
    std::ifstream fin(fname);
    int count=0;
    std::string word, line;

    while (std::getline(fin, line)) {

        std::istringstream iss(line);
        while (iss >> word) {
            //std::cout << word << std::endl;
            word_frequency(word);
        }
    }
    //std::cout << count << std::endl;
}

int main(int argc, char **argv) {

    if (argc < 2) {
        std::cerr << "Usage: count_words <filename>\n";
        return EXIT_FAILURE;
    }
    else {
        countwords(argv[1]);
    }

    for (auto item : M) std::cout << item.first << " " << item.second << std::endl;

}


#
# Python example
#

import sys

def main():

    lines = (l for l in file(sys.argv[1]))
    m = dict()
    for l in lines:
        for w in l.split(" "):
            if w in m:
                m[w] = m[w]+1
            else:
                m[w] = 1
    print m



if __name__ == "__main__":
    main()
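
One thing that may be worth checking before comparing timings is whether the two programs really tokenize the same way. In the C++ version, `iss >> word` splits on any run of whitespace, whereas `l.split(" ")` splits on single spaces only, so it produces empty strings for consecutive spaces and leaves the trailing newline attached to the last word of each line. The sketch below illustrates that difference on a made-up two-line sample. The helper `count_words` and the sample input are assumptions for illustration, not part of the original post, and the code is written to run under both Python 2.7 and Python 3.

```python
def count_words(lines, sep=None):
    """Count tokens in an iterable of lines.

    sep=None splits on any run of whitespace (closer to C++ `iss >> word`);
    sep=" " reproduces the split(" ") call used in the post.
    """
    counts = {}
    for line in lines:
        for w in line.split(sep):
            counts[w] = counts.get(w, 0) + 1
    return counts


# Made-up sample input; note the double space in the first line.
sample = ["to be  or not\n", "to be\n"]

by_space = count_words(sample, " ")   # behaviour of the Python snippet
by_whitespace = count_words(sample)   # behaviour closer to the C++ snippet

print(sorted(by_space.items()))
print(sorted(by_whitespace.items()))
```

With `sep=" "` the table contains extra keys such as `""`, `"not\n"` and `"be\n"`, so the two programs are not counting exactly the same set of words. This does not by itself explain the timing gap, but it means the outputs should be compared before the run times are.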
