Chord

by Newsweek

Composed by

D. V.

Version history

D. V., 598d ago

June 14, 2023

Threading the pytest process in Python

I researched your query about threading the pytest process in Python by reviewing Reddit discussions from several subreddits, including r/dataengineering, r/learnpython, and r/learnprogramming. Some discussions were not directly related to the query but offered insight into testing frameworks, libraries, and concepts. There is no clear consensus on how to thread the pytest process, but several recommendations may be helpful. Here's what I found.

Pytest and Threading

In a discussion about running the same test multiple times with different data in pytest, a user recommended using the `@pytest.mark.parametrize` decorator to send multiple values as arguments to the test function. This allows you to generate multiple tests from a single test function, potentially running them with different data. However, this does not directly address threading the pytest process itself.

    @pytest.mark.parametrize('n', lst)
    def test_zero(n):
        assert n == 0

Threading and Multiprocessing

In a discussion about running the same function multiple times simultaneously in Python, some users suggested the `threading` or `multiprocessing` modules. These recommendations are not specific to pytest, but they may provide a starting point for exploring how to thread the pytest process. Another user recommended the `asyncio` module as an alternative to `threading` and `multiprocessing` for running code asynchronously.

"There is no such restriction. You can have them run whatever floats your boat."

"Yeah this is correct, you can do something like this:"

"But as an alternative with something I know well, you could look at asyncio."
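
The quoted replies above do not include their code. As a minimal sketch of the general approach they describe (not the commenters' original code), the example below starts one thread per input and then joins them all. The `double` worker and the `run_in_threads` helper are hypothetical names introduced only for illustration.

```python
import threading


def double(item, results, lock):
    # Hypothetical stand-in for the real work done per input.
    value = item * 2
    # Guard the shared list so concurrent appends stay safe.
    with lock:
        results.append(value)


def run_in_threads(items):
    results = []
    lock = threading.Lock()
    # Build one thread per input, all targeting the same function.
    threads = [
        threading.Thread(target=double, args=(item, results, lock))
        for item in items
    ]
    for thread in threads:
        thread.start()
    # Wait for every thread to finish before reading the results.
    for thread in threads:
        thread.join()
    # Threads may finish in any order, so sort for a stable result.
    return sorted(results)


print(run_in_threads([1, 2, 3]))
```

The results are sorted because thread completion order is not guaranteed.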

Multi-threading in Python

A discussion about using multi-threading in Python touched on the Global Interpreter Lock (GIL) and its impact on Python's multi-threading capabilities. The GIL ensures that only one thread executes Python bytecode at a time, which can limit the benefits of multi-threading for CPU-bound tasks. However, multi-threading can be useful for I/O-bound tasks, where threads spend most of their time waiting for I/O to complete.

In conclusion, although I did not find a clear consensus on how to thread the pytest process in Python, the information above may provide some ideas and starting points for your own exploration.

"The Python wiki explains the concept of the GIL here https://wiki.python.org/moin/GlobalInterpreterLock"

"If a thread is waiting most of the time instead of actually needing to use the CPU, then there's no problem with locking or threads computing for CPU time."
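
To make the I/O-bound case concrete, here is a small sketch that compares running four waiting tasks one after another with running them in a thread pool. It assumes `time.sleep` is a fair stand-in for waiting on I/O (it releases the GIL while sleeping). The function names and the 0.2 second delay are illustrative choices, not something taken from the discussions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAYS = [0.2] * 4  # four simulated I/O waits of 0.2 seconds each


def fake_io(delay):
    # Sleeping releases the GIL, much like waiting on a socket or file.
    time.sleep(delay)
    return delay


def sequential():
    return [fake_io(d) for d in DELAYS]


def threaded():
    with ThreadPoolExecutor(max_workers=len(DELAYS)) as pool:
        return list(pool.map(fake_io, DELAYS))


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


seq_time = timed(sequential)
thr_time = timed(threaded)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

The threaded run should take roughly as long as a single wait, because the threads overlap their waiting. A CPU-bound function would not show the same gain under the GIL.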

Research

"[multiprocessing] May I use more parallel processes than cores?"

The post title is “multiprocessing: May I use more parallel processes than cores?” and was posted in r/learnpython.
The poster is working with OpenCV and multiple IP cameras, and is currently using multiprocessing to parallelize stream initialization.
There is a comment recommending using a number of workers equal to the number of available cores plus two to fully saturate them.
Another comment points out that the limit on process quantity is not Python but the OS, and that you can use more processes than available cores, but this will lead to poor performance as there is still a limited number of cores.
If you nest processes, they will still have access to the same cores and can be used for multiprocessing as long as they are not in daemon mode.
The OpenCV stream reads are likely network-bound, so other processes can use the CPU while one is waiting, which makes multiprocessing ideal for this use case.
`select` or `epoll` can be used to multiplex inputs.
Multiprocessing can be used to create more workers for multi-camera streams.
Commenters did not agree on how many processes to use, but they emphasized that the number of cores is relevant.
One commenter expressed doubt about multithreading with affinity selection.
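
As a rough sketch of the "cores plus two" suggestion from this thread, the example below sizes a `multiprocessing.Pool` from `os.cpu_count()`. The `open_stream` function is a hypothetical placeholder for stream initialization, and the eight camera IDs are made up for illustration.

```python
import os
from multiprocessing import Pool


def worker_count(extra=2):
    # os.cpu_count() can return None, so fall back to a single core.
    return (os.cpu_count() or 1) + extra


def open_stream(camera_id):
    # Hypothetical placeholder: a real version would connect to a camera.
    return f"camera-{camera_id} ready"


if __name__ == "__main__":
    # More workers than cores is allowed; the OS schedules them.
    with Pool(processes=worker_count()) as pool:
        print(pool.map(open_stream, range(8)))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module.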

The webpage is titled “Using multi-threading in python”.
The author has two questions about multi-threading: what the GIL in Python is, and what the benefits of multi-threading are when threads don’t run in parallel.
A reddit user recommends watching Raymond Hettinger’s talk on concurrency in Python, as multi-threading is not the only option for concurrency.
The GIL (Global Interpreter Lock) is a mutex that ensures that only one thread executes Python bytecode at a time. The lock is necessary mainly because CPython’s memory management is not thread-safe, so the GIL prevents race conditions and conflicting accesses to the memory.
Because of the GIL, multi-threading in Python is not ideal for CPU-bound tasks that require heavy processing, but it can work well for IO-bound tasks where the threads spend most of their time waiting for I/O to complete.
Another user mentions that Python doesn’t have a package named “multi-threading” and that multiprocessing can use all available cores.
However, when using multiprocessing, you need to be more careful about memory usage and explicitly control the flow of information.
The benefits of using multi-threading include the ability to have several tasks lined up so that while one task is waiting for I/O or some other event, the CPU can work on another task.
Threading won’t help much for CPU-bound tasks that are using the CPU at full capacity.
The downside of asyncio is that it is more complicated to implement than simply launching a function in another thread. On the other hand, threading can easily run into issues that asyncio never experiences.
A reddit user explains that they used threading to read a memory buffer constantly while executing other parts of code at the same time.
Another user asks about the difference between threading and event listeners, and the response is that both can perform similar tasks, but threading is more flexible and offers more control.
The same user also asks how threading was used in the Reddit chat example, and the explanation is that each user is assigned a thread to capture I/O when they enter the chat.
There are several comments complimenting the explanation and the resources provided for learning more about the topic.

The webpage is a Reddit post in the r/learnpython subreddit titled “Is there an alternative to using a bunch of nested If else statements?” posted 1 year and 4 months ago.
The post describes Python code that reads data from a JSON file, where each value may be a list, a dictionary, or a string, then sorts the data and writes it out in CSV format.
The author of the post mentions that the program works, but it is incredibly ugly and frustrating.
The author of the post asks for suggestions to improve the code and to avoid using nested If else statements.
Several users recommend using logical operators (and, or) to group/eliminate conditions or using functions instead of nested If else statements.
Some other users suggest using the JSON library to parse the JSON file and accessing the data directly.
One user suggests using `singledispatch`, a decorator from `functools`, to restructure the code. `singledispatch` registers type-specific implementations of a function, so the implementation that runs is chosen by the type of the first argument.
Another user suggests using loops or functions to avoid deep nesting and improve code readability.
A user recommends early exits from deep inside nested loops and If statements to improve code behavior and reduce nestings.
One user notes that code blocks that are not too long are good for readability, comments that expanding/collapsing from the code editor can be helpful, and mentions the existence of Python’s match case statement.
The author of the post provides an image link to the JSON file and his Python code implementation that uses nested If else statements, which some users critique for being non Pythonic. They suggest using more Pythonic constructs like isinstance and functions to break logic down.
Some users recommend using text-sharing sites for sharing code rather than screenshots to improve readability.
There is a suggestion to use cjson, a fast JSON processing library, to deal with parsing speed and memory limitations.
A user suggests that it can be a good practice to call functions rather than increasing the level of nesting, depending on how deep the logic has to nest.
A few users note that they are new to Python.
The post's author seems grateful for the advice and intends to incorporate the recommended changes to improve their code.
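
The `singledispatch` suggestion above can be sketched as follows. This is an illustration only, not the poster's code: the `flatten` function and the sample `record` are hypothetical, and the idea is to replace a chain of `isinstance` checks with one handler per type.

```python
from functools import singledispatch


@singledispatch
def flatten(value):
    # Fallback for types without a registered handler (numbers, None, ...).
    return [str(value)]


@flatten.register
def _(value: str):
    return [value]


@flatten.register
def _(value: list):
    out = []
    for item in value:
        out.extend(flatten(item))
    return out


@flatten.register
def _(value: dict):
    out = []
    for item in value.values():
        out.extend(flatten(item))
    return out


# Hypothetical record mixing strings, lists, and dictionaries.
record = {"name": "Ada", "tags": ["x", "y"], "meta": {"age": 36}}
print(flatten(record))
```

Each handler stays short, and supporting a new type means registering one more function rather than adding another nested branch.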

"Simultaneously Run Same Function Multiple Times Using Python"

Users on Reddit are looking for a way to run multiple iterations of a function simultaneously using Python.
The user who posted the question on Reddit has already tried exploring “threading” and “multiprocessing” modules in Python but was unable to find a solution to their problem.
A user with six karma suggests a solution to use “threading” to run the same function multiple times simultaneously.
The suggested solution is to create a list of threads using a for loop, each with a target function set to the desired function and parameter values, and then start and join all threads using for loops.
A user with minimal karma recommends looking into the “asyncio” module as an alternative to “threading” and “multiprocessing” for running code asynchronously.
The “asyncio” module runs on a single thread and fills in time spent waiting for a response with other asynchronous calls.
The user suggests that the “asyncio” module may be useful depending on the specific requirements of the user’s function.
The user with five karma notes that “threading” and “multiprocessing” modules can indeed be used to run the same function multiple times simultaneously. They suggest there is no such restriction and that using these modules can be helpful depending on the user’s specific requirements.
A user with three karma thanks the user with six karma for their solution and confirms that it worked for their needs.
No further information on “pytest” and its relation to the user’s query is provided on this webpage.
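
For comparison with the threading approach, here is a minimal `asyncio` sketch of running the same coroutine several times concurrently on a single thread. The `work` coroutine and its 0.1 second sleep are hypothetical stand-ins for a call that waits on a response.

```python
import asyncio


async def work(n):
    # Stand-in for awaiting a network response; other tasks run meanwhile.
    await asyncio.sleep(0.1)
    return n * n


async def main():
    # gather schedules all coroutines and returns results in input order.
    return await asyncio.gather(*(work(n) for n in range(5)))


results = asyncio.run(main())
print(results)
```

All five calls wait at the same time, so the whole run takes about as long as one call.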

The user is new to Python code testing and wants suggestions on testing frameworks, libraries, or references to use.
Currently, the user is using unittest to run unit tests on a PySide2 application.
The user plans to write integration tests, user acceptance tests, and perhaps UI automation tests and behavior-driven tests with Lettuce in addition to their existing unit tests.
The user has a runner on GitHub Actions for running tests on Windows, Mac, and Ubuntu and is considering using Travis CI.
The user is looking for frameworks or ways to generate stats for their tests.
Recommended testing frameworks include pytest and Hypothesis.
Commenters recommend pytest not just as a tool but also for its plugin ecosystem.
pytest can generate reports in JUnit XML format (via its `--junitxml` option), which is the de facto standard for machine-readable test run results.
Other recommended resources include Allure Reports, pytest-coverage, and ward.
ward is a newer, less popular, and less mature testing framework project that offers a good vision and good docs.
Property based testing is a type of testing recommended by other users in the subreddit. A library mentioned is Hypothesis, which tests logical properties of code and allows users to try tens to millions of examples to find a counter-example to the property.
The user has read several books on software testing, including Concise Guide to Software Testing, Testing Python, Python Testing with pytest, The Art of Software Testing, and xUnit Test Patterns.
Other users in the subreddit are interested in which testing resources the user has found useful for learning the concepts of testing.
Some users mention other libraries or frameworks for testing Python code, including doctest, Gherkin, and the Robot Framework.
pytest fixtures are also recommended because a single fixture function can handle both setup and teardown by using `yield`.
Several users recommend testing code examples in documentation through doctests, as it is very valuable.

"python - How to run pytest tests in parallel? - Stack Overflow"

Not used in article

"Unit testing/Pytest for Data Analysts/Engineers"

A Reddit thread from r/dataengineering with a post titled “Unit testing/Pytest for Data Analysts/Engineers.”
It discusses best practices for unit testing code in data engineering and how this is similar to any other software engineering discipline.
Python's unit testing library is as useful in data engineering as anywhere else.
It includes modules for mocking and patching that allow tests to mimic responses from third-party APIs.
External calls in unit tests should be avoided, and unit tests should focus on validating the custom logic in the code.
Testing incoming and outgoing data is critical to catch any errors in logic before going further down the pipeline.
Tools such as “pydantic” and “pandera” are effective for verifying complex logic in data.
“Great Expectations” is useful for writing asserts across DataFrames quickly and generating documentation along with data insights.
References to the difficulty of creating mock data and writing isolated pieces of code for effective unit testing.
Writing a test first and then the actual code is a great practice.
Includes tips and tricks and caveats/specifics of techniques for unit testing with SQL, especially around creating mock data.
Lists some benefits of unit testing, such as quicker deployment of changes to a system and tests acting as documentation of business rules.
However, unit testing for data engineering may become boring, repetitive, and time-consuming, particularly for testing business logic, transformations, etc.
A reference to the challenges of testing spark streaming data and a recommendation of using techniques such as “MemoryStream.”
Some comments discussing personal experience with unit testing in various contexts.
Includes several brief notes to remind a user about the thread on different dates.
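
The point about mocking third-party responses can be illustrated with the standard library's `unittest.mock`. This is a sketch under assumptions: the thread does not show code, and the `convert` function and the `rate_client` object with its `get_rate` method are hypothetical names invented for the example.

```python
from unittest.mock import Mock


def convert(amount, currency, rate_client):
    # Custom logic under test; the external call is made via rate_client.
    rate = rate_client.get_rate(currency)
    return round(amount * rate, 2)


def test_convert_uses_mocked_rate():
    # The Mock replaces the real third-party client, so no network call runs.
    fake_client = Mock()
    fake_client.get_rate.return_value = 1.25

    assert convert(10, "EUR", fake_client) == 12.5
    fake_client.get_rate.assert_called_once_with("EUR")


test_convert_uses_mocked_rate()
```

Passing the client in as an argument keeps the test focused on the conversion logic rather than on the external service.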

The webpage discusses how to run the same test multiple times with different data in pytest.
The author of the question wants to run one test on a list of numbers [1, 2, 3].
The author defines a test function test_zero(n) that checks if n is equal to 0.
The author is looking for a way to pass [1, 2, 3] to test_zero(n) as n for each test run.
The author has looked through pytest’s fixture documentation but can’t find a simple way to run a test multiple times.
A user recommends using the @pytest.mark.parametrize decorator to send multiple values as arguments to test_zero(n).
The user provides an example of how to use @pytest.mark.parametrize with lst and n.
The example uses lst = [1, 2, 3] and passes each value of lst to test_zero(n) as n.
The assert statement in test_zero(n) will produce a test failure for each value of lst that is not zero.
The user also provides a link to the pytest documentation for @pytest.mark.parametrize.
The documentation explains how to use @pytest.mark.parametrize to generate multiple tests from a single test function.
The documentation recommends using a tuple of values for each argument in @pytest.mark.parametrize.
The documentation notes that @pytest.mark.parametrize can be used to test edge cases by passing extreme values as arguments.
A user comments that @pytest.mark.parametrize is a powerful tool in pytest and recommends using it for various test scenarios.
The user clarifies the difference between a fixture and a parameterization in pytest.
The user notes that fixture is used to set up test data or perform actions before or after a test, whereas parameterization is used to generate multiple tests with different data.
The user recommends using both fixture and parameterization to create versatile test suites in pytest.
The comments have modest upvote scores of 2 and 3, indicating some appreciation for the answers.
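
The distinction between fixtures and parametrization described above can be sketched in one small test file. This is an illustrative example rather than code from the thread, and the `scratch_list` fixture and the test name are hypothetical.

```python
import pytest


@pytest.fixture
def scratch_list():
    data = []       # setup: runs before each test that requests the fixture
    yield data      # the yielded value is passed to the test
    data.clear()    # teardown: runs after the test finishes


@pytest.mark.parametrize("n", [1, 2, 3])
def test_append_is_recorded(scratch_list, n):
    # pytest generates one test per value of n, each with a fresh fixture.
    scratch_list.append(n)
    assert scratch_list == [n]
```

Running `pytest` on a file containing this code should collect three tests, one for each value passed to `parametrize`.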

Research

"[multiprocessing] May I use more parallel processes than cores?"

"A Guide to Python Multiprocessing and Parallel Programming"

"Parallel Processing in Python – A Practical Guide with Examples"

"Using multi-threading in python"

"dbt vs R/Python for transformation"

"multithreading - How do I multithread SQL Queries in python such that I ..."

"Data Testing Tools, Pytest vs Great Expectations vs Soda vs Deequ"

"Flask threading/multi-process best practice"

"What's the best way to run Selenium tests in parallel automatically?"

"Is there an alternative to using a bunch of nested If else statements?"

"Simultaneously Run Same Function Multiple Times Using Python"

"Running unit tests in parallel with pytest? - Stack Overflow"

"How can I resolve an error running pytest in parallel via xdist in ..."

"Parallely running parameterized tests in pytest - Stack Overflow"

"Is it possible to execute the same test in parallel in pytest?"

"New to Python code testing. What frameworks/references/etc. would you suggest I check out?"

"python - How to run pytest tests in parallel? - Stack Overflow"

"Unit testing/Pytest for Data Analysts/Engineers"

"Standardized Tests: The Benefits and Impacts of Implementing ..."

"Collaboration metrics: how to measure collaboration in a team"

"[pytest] Running same test multiple times with different data (feel like I'm missing something)"

"python - how to run pytest.main() with thread - Stack Overflow"

"PyTest - Run each Test as a Mutlitprocessing Process"

"python - Running Pytest Multiple Times from Different Threads - Stack ..."

"python 3.x - Running pytest with thread - Stack Overflow"