Comments on: Python from scratch- Sort dictionaries and files
https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/
Personal blog about technology
Sat, 01 Jun 2019 22:17:54 +0000

By: Anonymous https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-30 Tue, 28 Feb 2012 09:02:57 +0000 #comment-30
# Split by space
for word in data:
    line_data = data.split (" ");

This loop is going over every CHARACTER in "data" (which is the entire file); that's what happens when you iterate over a string. It doesn't help, because inside the loop you are running the (relatively heavy) split() over and over again on the same data.

I believe this is what's causing your performance issues.

— Arik
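
A minimal sketch of the point above, assuming Python 3 and a small made-up string standing in for the file contents (the names `chars_visited` and `words` are only for illustration):

```python
# "data" stands in for the whole file read into one string (sample text).
data = "the quick brown fox\nthe lazy dog\n"

# Iterating over a string visits one character at a time, not one word.
chars_visited = [ch for ch in data]

# Splitting once, outside any loop, yields the words directly.
words = data.split()

print(len(chars_visited), "loop iterations versus", len(words), "words")
```

With the loop from the post, split() would run once per character, so the work grows roughly with the square of the file size.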

By: Hod https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-31 Tue, 21 Feb 2012 03:20:30 +0000 #comment-31 Very good.
Thanks.

By: Hod https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-32 Tue, 21 Feb 2012 03:14:50 +0000 #comment-32 file1 is actually filename in my code.
So I guess the error is not about file1 failing to be a string, since it is a string.

I do understand that my code has performance issues with large input; I just don't know how to recode it to handle it better.

By: Anonymous https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-33 Tue, 21 Feb 2012 03:12:57 +0000 #comment-33 What the other anonymous commenter said. Also, yeah, I should have figured that the formatting would have been borked. Please try to imagine some indentation as appropriate.

By: Anonymous https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-34 Tue, 21 Feb 2012 02:56:27 +0000 #comment-34 You have to do data.split(), not data.split(" "). Split with no arguments works significantly differently from split with an argument: it splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings from the result.
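
To make the difference concrete, here is a small sketch assuming Python 3; the sample line (with a doubled space and a trailing newline) is made up for illustration:

```python
# A sample line with two spaces in a row and a trailing newline.
line = "the  quick brown\n"

# With an explicit " " separator, every single space is a split point,
# so the doubled space yields an empty string and the newline stays attached.
with_arg = line.split(" ")

# With no argument, runs of any whitespace count as one separator
# and leading/trailing whitespace is discarded.
no_arg = line.split()

print(with_arg)  # ['the', '', 'quick', 'brown\n']
print(no_arg)    # ['the', 'quick', 'brown']
```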

By: Anonymous https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-35 Tue, 21 Feb 2012 02:54:19 +0000 #comment-35 The error message means file1 is a file object, but it should be a filename, aka a string.
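
A hedged sketch of that situation, assuming Python 3 (where the wording of the TypeError differs from the Python 2 message quoted in this thread); the temporary file and the names `path`, `first_line`, and `raised` are only for illustration:

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one two\n")

# Passing the filename (a string) to open() works.
with open(path, "r") as handle:
    first_line = handle.readline()

# Passing an already-open file object to open() raises TypeError.
with open(path, "r") as handle:
    try:
        open(handle, "r")
        raised = False
    except TypeError:
        raised = True

os.remove(path)
print(repr(first_line), raised)
```

So if a variable has already been through open() once, iterate over it directly instead of opening it again.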

By: Hod https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-36 Mon, 20 Feb 2012 20:08:47 +0000 #comment-36 I tried what you suggested and it didn't help; the '\n' appeared.

By: Hod https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-37 Mon, 20 Feb 2012 20:04:08 +0000 #comment-37 I modified the code following your suggestion and got this error:

"
for line in open(file1,'r'):
TypeError: coercing to Unicode: need string or buffer, file found
"

I have no clue how to solve it.
Anyone to the rescue?

By: Anonymous https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-38 Mon, 20 Feb 2012 18:40:34 +0000 #comment-38 FYI, there is no need to remove the newline character. split() with no arguments splits on any whitespace character, newlines included.

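By: Anonymous https://go.hodspot.com/python-from-scratch-sort-dictionaries-and-files/#comment-39 Mon, 20 Feb 2012 17:52:29 +0000 #comment-39 Not knowing how large the input file might be, I might try something more like this:

def file_to_dict(filename):
    # Count how often each word occurs, reading the file one line at a time.
    counts = {}
    for line in open(filename,'r'):
        # Note: split(' ') leaves the trailing '\n' on the last word of each
        # line; split() with no arguments would strip it.
        for word in line.split(' '):
            counts[word] = counts.get(word,0)+1
    return counts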
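
One possible variant of this idea, sketched under a few assumptions: Python 3, words separated by whitespace, and split() called with no arguments so newlines never end up in the keys. The names `count_words` and `sorted_counts` are hypothetical, and the sort order (most frequent first, ties alphabetical) is just one choice:

```python
def count_words(lines):
    """Count words from any iterable of lines, e.g. an open file."""
    counts = {}
    for line in lines:
        # split() with no argument drops newlines and repeated spaces.
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def sorted_counts(counts):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# A list of lines stands in for a file here (sample data only).
sample = ["b a  b\n", "a b c\n"]
result = sorted_counts(count_words(sample))
print(result)  # [('b', 3), ('a', 2), ('c', 1)]
```

With a real file, something like `with open(filename) as handle: count_words(handle)` reads one line at a time, so the whole file never has to sit in memory.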
