What's the difference between Python's copy.copy() and copy.deepcopy()?
I'm working on a Python application and have run into a bug I can't track down. Here's the problematic code:
# Current implementation
import threading
import time

def worker():
    global counter
    for _ in range(100000):
        counter += 1  # Race condition here

counter = 0
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
The error message I'm getting is: "KeyError: 'missing_key'"
What I've tried so far:
- Used pdb debugger to step through the code
- Added logging statements to trace execution
- Checked Python documentation and PEPs
- Tested with different Python versions
- Reviewed similar issues on GitHub and Stack Overflow
Environment information:
- Python version: 3.11.0
- Operating system: Ubuntu 22.04
- Virtual environment: venv (activated)
- Relevant packages: django, djangorestframework, celery, redis
Any insights or alternative approaches would be very helpful. Thanks!
2 Answers
The difference between threading and multiprocessing in Python is crucial for performance:
Threading (shared memory, GIL limitation):
import threading
import time

def io_bound_task(name):
    print(f'Starting {name}')
    time.sleep(2)  # Simulates I/O operation
    print(f'Finished {name}')

# Good for I/O-bound tasks
threads = []
for i in range(3):
    t = threading.Thread(target=io_bound_task, args=(f'Task-{i}',))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
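Because threads share memory, an unsynchronized `counter += 1` like the one in the question can lose updates when two threads read and write the value at the same time. Here is a minimal sketch of one common fix, assuming that guarding the increment with a `threading.Lock` is acceptable for your workload. The helper name `count_with_lock` and its parameters are illustrative and do not come from the question.

```python
import threading

def count_with_lock(num_threads=4, increments=100_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # Wait for every thread before reading the final value.
    for t in threads:
        t.join()
    return counter

print(count_with_lock())  # 400000
```

Note that the threads are joined before the counter is read; the snippet in the question starts the threads but never waits for them.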
Multiprocessing (separate memory, each process has its own GIL):
import multiprocessing
import time

def cpu_bound_task(name):
    # CPU-intensive calculation
    result = sum(i * i for i in range(1000000))
    return f'{name}: {result}'

# Good for CPU-bound tasks
if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        tasks = [f'Process-{i}' for i in range(4)]
        results = pool.map(cpu_bound_task, tasks)
    print(results)
concurrent.futures (unified interface):
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# Reuses io_bound_task and cpu_bound_task defined above.
# The __main__ guard is needed for ProcessPoolExecutor on platforms
# that spawn worker processes (e.g. Windows, macOS).
if __name__ == '__main__':
    # For I/O-bound tasks
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(io_bound_task, f'Task-{i}') for i in range(4)]
        results = [future.result() for future in futures]

    # For CPU-bound tasks
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(cpu_bound_task, f'Process-{i}') for i in range(4)]
        results = [future.result() for future in futures]
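To show what "unified interface" means in practice, here is a small self-contained sketch in which the executor class is just a parameter. The names `square` and `run_all` are hypothetical helpers made up for this example; only `ThreadPoolExecutor`, `submit`, and `as_completed` come from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    """Stand-in for a real task (hypothetical, not from the answer)."""
    return n * n

def run_all(executor_cls, fn, items, max_workers=4):
    """Submit fn(item) for each item and collect results keyed by item."""
    results = {}
    with executor_cls(max_workers=max_workers) as executor:
        future_to_item = {executor.submit(fn, item): item for item in items}
        # as_completed yields futures in the order they finish,
        # which is not necessarily the order they were submitted.
        for future in as_completed(future_to_item):
            results[future_to_item[future]] = future.result()
    return results

if __name__ == '__main__':
    print(run_all(ThreadPoolExecutor, square, range(5)))
```

Passing `ProcessPoolExecutor` instead should work the same way, provided `fn` is a picklable module-level function and the call sits under the `__main__` guard.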
Comments
john_doe: Great Python profiling example! The cProfile output helped me identify the bottleneck in my data processing pipeline. 1 week, 4 days ago
Here's how to optimize Python code performance using profiling tools:
1. Use cProfile for function-level profiling:
import cProfile
import pstats
# Profile your code
cProfile.run('your_function()', 'profile_output.prof')
# Analyze results
stats = pstats.Stats('profile_output.prof')
stats.sort_stats('cumulative')
stats.print_stats(10) # Top 10 functions
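If you would rather not write a `.prof` file, the same analysis can be done in memory. The sketch below assumes a made-up workload `slow_sum` in place of `your_function()`, and a hypothetical helper `profile_call` that returns the report as a string.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Hypothetical workload standing in for your_function()."""
    return sum(i * i for i in range(n))

def profile_call(fn, *args):
    """Profile one call and return (result, text report of the top 10 entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(10)
    return result, stream.getvalue()

result, report = profile_call(slow_sum, 100_000)
print(report)
```

Sorting by `'cumulative'` puts the functions that account for the most total time, including their callees, at the top, which is usually the right place to start looking.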
2. Use line_profiler for line-by-line analysis:
# Install: pip install line_profiler
# Add the @profile decorator to the functions you want to inspect.
# kernprof injects `profile` at run time, so no import is needed.
@profile
def slow_function():
    # Your code here
    pass

# Run: kernprof -l -v script.py
3. Memory profiling with memory_profiler:
# Install: pip install memory_profiler
from memory_profiler import profile

@profile
def memory_intensive_function():
    # Your code here
    pass

# Run: python -m memory_profiler script.py
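If installing a third-party package is not an option, the standard library's `tracemalloc` module gives a rougher view of the same thing. Note that this is a different tool from memory_profiler: it reports memory allocated by Python itself, not per-line usage. The helpers `measure_peak` and `build_list` are hypothetical names used only for this sketch.

```python
import tracemalloc

def measure_peak(fn, *args):
    """Run fn(*args) and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def build_list(n):
    """Hypothetical memory-heavy workload."""
    return [i * 2 for i in range(n)]

data, peak = measure_peak(build_list, 100_000)
print(f'Peak traced memory: {peak / 1024:.1f} KiB')
```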
4. Use timeit for micro-benchmarks:
import timeit
# Compare different approaches
time1 = timeit.timeit('sum([1,2,3,4,5])', number=100000)
time2 = timeit.timeit('sum((1,2,3,4,5))', number=100000)
print(f'List: {time1}, Tuple: {time2}')
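A single `timeit.timeit` run can be skewed by whatever else the machine is doing. A slightly more robust sketch, assuming the same two statements, repeats each measurement and keeps the fastest run. The helper name `best_time` is made up for this example.

```python
import timeit

def best_time(stmt, number=100_000, repeat=5):
    """Return the fastest of `repeat` runs, each executing stmt `number` times."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat))

list_time = best_time('sum([1, 2, 3, 4, 5])')
tuple_time = best_time('sum((1, 2, 3, 4, 5))')
print(f'List: {list_time:.4f}s, Tuple: {tuple_time:.4f}s')
```

Taking the minimum rather than the mean is the usual choice here, since slower runs mostly reflect interference from other processes rather than the code being measured.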