Asked • 04/20/19

Maximizing Code Performance for Shapely?

I've written this code in order to calculate the percent cover of grassland within a national park, within 20km^2 of points of recorded occurrences for a species. It is designed to only consider the areas within the park, and not those outside. It works, except for one huge issue... it's *incredibly* slow. I think I've tracked it down to the `avail_lc_type = pt_buffer.intersection(gl).area`. The gl polygon has > 24,000 polygons in it. It is a raster to polygon transformation (it has been dissolved), so it has a lot of little polygons within. Right now it's not a huge problem since I'm running it on ~300 points (still takes > 1 hr), but I plan to run it on a few million later, so I need to improve it. Any ideas? <!-- language: lang-py --> import numpy as np import pprint import shapely from shapely.geometry import* import fiona from fiona import collection import math traps = fiona.open('some_points.shp', 'r') #point file: focal points around which statistics are being derived study_area = fiona.open('available_areas.shp', 'r') #polygon file: represents area available for analysis for i in study_area: #for every record in 'study_area' sa = shape(i['geometry']) #make a variable called 'sa' that is a polygon grassland = fiona.open('land_cover_type_of_interest.shp', 'r') #polygon file: want to calculate percent cover of this lc type within study_area, and within areaKM2 (next variable) of each focal point pol = grassland.next() gl = MultiPolygon([shape(pol['geometry']) for pol in grassland]) areaKM2 = 20 #hyp home range size of specie of interest with traps as input: #calculate initial area in meters, set radius areaM2 = areaKM2 * 1000000 r = (math.sqrt(areaM2/math.pi)) #begin buffering and calculating available area (i.e. within study area) for each point for point in input: pt_buffer = shape(point['geometry']).buffer(r) avail_area = pt_buffer.intersection(sa).area #check and adjust radius of buffer until it covers desired available area within study area while avail_area < areaM2: r += 300 pt_buffer = shape(point['geometry']).buffer(r) avail_area = pt_buffer.intersection(sa).area #then, calulate percent cover of land cover type of interest within adjusted buffer area #print to check avail_lc_type = pt_buffer.intersection(gl).area perc_cov = (avail_lc_type/areaM2) * 100 print perc_cov @MWrenn - Thanks for the suggestion. I was able to profile my code and came up with this: 55.3555590078 415 function calls (365 primitive calls) in 48.633 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.001 0.001 48.632 48.632 <module1>:15(neighb_func) 1 0.000 0.000 48.633 48.633 <string>:1(<module>) 2 0.000 0.000 0.001 0.000 <string>:531(write) 1 0.000 0.000 0.000 0.000 __init__.py:1118(debug) 1 0.000 0.000 0.000 0.000 __init__.py:1318(getEffectiveLevel) 1 0.000 0.000 0.000 0.000 __init__.py:1332(isEnabledFor) 1 0.000 0.000 0.000 0.000 _abcoll.py:483(update) 3 0.000 0.000 0.000 0.000 _weakrefset.py:68(__contains__) 1 0.000 0.000 0.000 0.000 abc.py:128(__instancecheck__) 1 0.000 0.000 0.000 0.000 abc.py:148(__subclasscheck__) 11 0.000 0.000 0.000 0.000 base.py:195(_is_empty) 11 0.003 0.000 0.003 0.000 base.py:202(empty) 7 0.000 0.000 0.003 0.000 base.py:212(__del__) 25 0.000 0.000 0.000 0.000 base.py:231(_geom) 2 0.000 0.000 0.000 0.000 base.py:235(_geom) 3 0.000 0.000 0.000 0.000 base.py:313(geometryType) 3 0.000 0.000 0.000 0.000 base.py:316(type) 3 0.000 0.000 0.001 0.000 base.py:383(area) 2 0.000 0.000 0.001 0.000 base.py:443(buffer) 8 0.000 0.000 0.000 0.000 base.py:46(geometry_type_name) 5 0.000 0.000 0.000 0.000 base.py:52(geom_factory) 3 0.000 0.000 48.626 16.209 base.py:529(intersection) 10 0.000 0.000 0.000 0.000 brine.py:106(_dump_int) 2 0.000 0.000 0.000 0.000 brine.py:150(_dump_str) 12/2 0.000 0.000 0.000 0.000 brine.py:179(_dump_tuple) 24/2 0.000 0.000 0.000 0.000 brine.py:202(_dump) 2 0.000 0.000 0.000 0.000 brine.py:332(dump) 10/2 0.000 0.000 0.000 0.000 brine.py:360(dumpable) 14/8 0.000 0.000 0.000 0.000 brine.py:369(<genexpr>) 2 0.000 0.000 0.000 0.000 channel.py:56(send) 1 0.000 0.000 0.000 0.000 collection.py:186(filter) 1 0.000 0.000 0.000 0.000 collection.py:274(__iter__) 1 0.000 0.000 0.000 0.000 collection.py:364(closed) 1 0.000 0.000 0.000 0.000 collections.py:37(__init__) 25 0.000 0.000 0.000 0.000 collections.py:53(__setitem__) 4 0.000 0.000 0.000 0.000 compat.py:17(BYTES_LITERAL) 2 0.000 0.000 0.000 0.000 coords.py:20(required) 2 0.000 0.000 0.000 0.000 geo.py:20(shape) 5 0.000 0.000 0.000 0.000 geos.py:484(errcheck_predicate) 8 0.000 0.000 0.000 0.000 impl.py:43(__getitem__) 2 0.000 0.000 0.000 0.000 point.py:124(_set_coords) 2 0.000 0.000 0.000 0.000 point.py:188(geos_point_from_py) 2 0.000 0.000 0.000 0.000 point.py:37(__init__) 2 0.000 0.000 0.001 0.000 protocol.py:220(_send) 2 0.000 0.000 0.001 0.000 protocol.py:227(_send_request) 2 0.000 0.000 0.000 0.000 protocol.py:241(_box) 2 0.000 0.000 0.001 0.000 protocol.py:438(_async_request) 2 0.000 0.000 0.000 0.000 stream.py:173(write) 11 0.000 0.000 0.000 0.000 topology.py:14(_validate) 3 0.000 0.000 0.000 0.000 topology.py:33(__call__) 3 48.626 16.209 48.626 16.209 topology.py:40(__call__) 2 0.001 0.000 0.001 0.000 topology.py:57(__call__) 5 0.000 0.000 0.000 0.000 utf_8.py:15(decode) 5 0.000 0.000 0.000 0.000 {__import__} 5 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode} 3 0.000 0.000 0.000 0.000 {_ctypes.byref} 6/2 0.000 0.000 0.000 0.000 {all} 6 0.000 0.000 0.000 0.000 {getattr} 5 0.000 0.000 0.000 0.000 {globals} 10 0.000 0.000 0.000 0.000 {hasattr} 5 0.000 0.000 0.000 0.000 {isinstance} 31 0.000 0.000 0.000 0.000 {len} 5 0.000 0.000 0.000 0.000 {locals} 1 0.000 0.000 0.000 0.000 {math.sqrt} 2 0.000 0.000 0.000 0.000 {method 'acquire' of 'thread.lock' objects} 24 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 28 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects} 1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects} 2 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects} 2 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects} 5 0.000 0.000 0.000 0.000 {method 'pack' of 'Struct' objects} 2 0.000 0.000 0.000 0.000 {method 'release' of 'thread.lock' objects} 2 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects} 2 0.000 0.000 0.000 0.000 {next} I'm still trying to figure out exactly what it means, but I think it's telling me that the `intersection` function is taking ~16 seconds each time it's called. So, per @gene 's suggestion, I'm working on using a spatial index. I just have to figure out how to get libspatialindex going so that I can use Rtrees.

1 Expert Answer

By:

Rahul S. answered • 05/19/25

Tutor
New to Wyzant

Senior Software Engineer

Still looking for help? Get the right answer, fast.

Ask a question for free

Get a free answer to a quick problem.
Most questions answered within 4 hours.

OR

Find an Online Tutor Now

Choose an expert and meet online. No packages or subscriptions, pay only for the time you need.