When testing an API, you use tools like Postman or write tests in Java, Python, or another language. Reviewing defects reported by users reveals critical techniques that you can use to test APIs. In this blog post, I review defects in a Python package for accessing Twitter to get insights on how to test the API. There are useful lessons for testing APIs in general.
Twitter has its own Python package for accessing the API. The package I review here is not the official Twitter Python package.
When reviewing defects, I want to explicitly avoid ‘fixing’ the problem. When someone or a team looks at customer defects, all their energy goes into ‘fixing’ the problem. Instead, I will focus on what might have been difficult in anticipating that problem. I am also very aware of hindsight bias. When I encounter an issue, I explicitly think about how the learning could be applied broadly to other areas.
I’ve included a mix of different types of issues. Some of these may make you question whether the issue is worth consideration. All may bring up philosophical questions.
I don’t mean to simplify the issues or to imply that I could do things differently. Software testing is difficult. I always want to treat issues with humility. My focus is on learning how to find defects.
It helps if you review defects on the product you are testing. There are some challenges to reviewing your own product, which I will discuss in another blog post.
I was not part of the development of this API. I could be wrong about the reason the defect was not addressed. The purpose of this blog post is to present a perspective on API testing.
Do calls to rate limit status count against the rate limit?
Twitter, like other APIs, limits the number of calls that you can make over a given time period. You can call an API to check on the status of rate limiting. The problem in this case is that the API to check status also counts against the rate limit.
What makes this non-intuitive is that a status check looks like bookkeeping rather than real work, so you could miss that it also counts towards rate limiting.
Why is this a problem? This determines how you design your code. If the status check also counts against the rate limit, you need to make sure you don’t keep checking the status in a loop. This is a reminder that when working with Twitter you are processing large amounts of data and making many calls to the API for status checks. This also shows what is important for users, viz., every call that counts towards rate limiting.
If I were testing this API, I should think about how I can verify rate limits for every API call. I should ask if I might have overlooked an API call. Are there any calls that I might have missed?
This is a great opportunity to use scripting for testing. I can write a script which will check rate limits for all APIs. I can write scripts to track calls which may not be obvious.
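A minimal sketch of such a tracking script. The names CallCounter and FakeClient, and the methods search and rate_limit_status, are my own placeholders and not the API of the package under review. The fake client only mirrors the behaviour reported in the issue, where the status check itself spends one call from the budget.

```python
from collections import Counter


class CallCounter:
    """Wraps any client object and counts every method call made through it."""

    def __init__(self, client):
        self._client = client
        self.calls = Counter()

    def __getattr__(self, name):
        # Only reached for names not defined on CallCounter itself.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        def counted(*args, **kwargs):
            self.calls[name] += 1
            return attr(*args, **kwargs)

        return counted


class FakeClient:
    """Hypothetical stand-in for a Twitter client; not a real library API."""

    def __init__(self, limit):
        self.remaining = limit

    def _spend(self):
        if self.remaining <= 0:
            raise RuntimeError("rate limit exceeded")
        self.remaining -= 1

    def search(self, query):
        self._spend()
        return [f"tweet about {query}"]

    def rate_limit_status(self):
        # Assumption taken from the reported issue: the status check costs a call too.
        self._spend()
        return {"remaining": self.remaining}


client = CallCounter(FakeClient(limit=5))
client.search("python")
status = client.rate_limit_status()
print(status["remaining"], dict(client.calls))
```

With a budget of five, one search and one status check leave three calls, not four. Comparing the counter against the reported budget after every scenario is one way to surface calls that I did not expect to be charged.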
When testing this API, I should create scenarios just like an end user would. That will highlight the challenges with rate limiting. I know that it will be difficult to really put myself in the user’s shoes.
In the documentation, should I explicitly call out that status checks count against the rate limit?
Image posting needs to support Alt text
When posting images, you need to support Alt text. This is a regulation for certain applications.
What makes this non-intuitive is that it isn’t directly related to the functionality of the API. The impact of localization isn’t evident. Small teams may not have someone with expertise in localization.
Why is this a problem? Certain applications must support localization. It is a regulation.
If I were testing this API, I should consider the impact of localization. I need to distinguish between this API and others. This API posts on behalf of users, and it is different from an API that only reads data. The nature of Twitter is that users will post images and other media.
How can I avoid hindsight bias? What other places might I have missed thinking about localization? Do I really understand enough about localization to find problems? Are there other aspects such as performance or security that I may not be aware of? How about regulatory compliance?
Why is there a “like” feature if it seems forbidden?
The API allows you to “like” tweets. However, Twitter’s policies state that you cannot use this in an automated manner.
What makes this non-intuitive? Developers may not really consider the impact of policies. Legal staff may not really understand APIs or user scenarios.
Why is this a problem? Twitter and other social media have received a lot of negative attention for fake news and misuse of the platform. Developers will be cautious about being liable. API users are not end users. They are mostly developing applications for other users. They will be sensitive to liability, i.e., to inadvertently exposing their users.
If I were testing this API, I should think about how the API can be misused. Can it be automated? Do we check the identity of the API users? What is a legitimate use of the API compared to an illegitimate use? What are the legal policies? Who is responsible for drafting policies?
How can I avoid hindsight bias? Which APIs are more susceptible to abuse? Can we provide some guidance on what constitutes abuse? Can we provide a contact if someone is not sure? Issues of abuse should be treated differently compared to general support questions. Does it matter if no one reads the policies? Do the legal staff care whether anyone actually reads legal policies? How can we detect abuse?
What is the difference between ‘automated’ usage and ‘programmatic’ usage?
What happens if the page with the policies changes over time? How do dependent products keep track of policy changes? Is it enough to show the date when the page was last updated?
Can we include support for static type hints?
Python is a dynamically typed language. Adding type hints to function signatures will allow static type checking. This will also allow integration with IDEs and other tools for static type checking.
This is more of a design issue.
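To make the request concrete, here is a small sketch of what a type-hinted signature could look like. The function build_status and its parameters are hypothetical and chosen only for illustration; they are not taken from the package.

```python
from typing import Optional, get_type_hints


def build_status(text: str, reply_to_id: Optional[int] = None) -> dict:
    """Hypothetical helper that assembles the parameters for a new tweet."""
    params: dict = {"status": text}
    if reply_to_id is not None:
        # Parameter name used for illustration only.
        params["in_reply_to_status_id"] = reply_to_id
    return params


# The hints are available at runtime, which is what IDEs and checkers build on.
print(get_type_hints(build_status))
```

With hints like these, a static checker such as mypy should flag a call like build_status(123) before the code ever runs, and an IDE can show the expected argument types while the user is typing.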
Can a cursor start at a location other than start or end?
When retrieving large amounts of data using the Twitter API, Twitter uses a method they call ‘cursoring’. Cursoring separates results into pages, allowing you to move backward and forward between pages (from the Twitter documentation).
A user asked whether you can start at a location other than the start or end.
What makes this non-intuitive? This seems like an unreasonable request. It seems logical that you start at some point and keep scrolling through pages. Further, the mechanics of using a cursor are a bit involved. You can spend a lot of time working with cursors and not think about what can go wrong or what we didn’t think about.
Why is this a problem? If you work through the issue, you realize that if the API crashes or stops working for some reason while you are using a cursor, all your work is lost. You would want to resume from the last point you reached.
Note that this isn’t an issue with the API. You can save the cursor you retrieved and pass it to the endpoints. I included this as an interesting question that isn’t obvious.
If I were testing this API, I would make sure I make the cursor crash and see what is involved to pick up the pieces. I would make sure I try use cases for scripting, i.e., scraping data as well as creating an application.
How can I avoid hindsight bias? When processing a large amount of data, I need to think about recovering if the process crashes or is halted for any reason. I need to remind myself that a major use case for Twitter data is getting a large amount of data.
It is critical that the user finds the specific error when reporting problems. It may be worth writing up a ‘Help me help you’ topic.
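The idea of saving the cursor and resuming after a crash can be sketched as below. Everything here is an assumption made for illustration: fetch_page is a hypothetical paged endpoint backed by an in-memory dictionary, the cursor values are simplified integers rather than real Twitter cursors, and the crash is simulated by stopping after one page.

```python
import json
import os
import tempfile

# Fake paged data: cursor -> (items on that page, cursor of the next page).
PAGES = {0: (["a", "b"], 1), 1: (["c", "d"], 2), 2: (["e"], None)}


def fetch_page(cursor):
    """Hypothetical paged endpoint: returns (items, next_cursor)."""
    return PAGES[cursor]


def load_cursor(path, default=0):
    """Return the saved cursor, or the default when no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["cursor"]
    return default


def save_cursor(path, cursor):
    with open(path, "w") as f:
        json.dump({"cursor": cursor}, f)


def harvest(path, max_pages=None):
    """Collect items page by page, checkpointing the next cursor after each page."""
    cursor = load_cursor(path)
    collected = []
    pages = 0
    while cursor is not None:
        if max_pages is not None and pages >= max_pages:
            break  # stand-in for a crash or a manual stop
        items, cursor = fetch_page(cursor)
        collected.extend(items)  # a real script would persist the items first
        save_cursor(path, cursor)
        pages += 1
    return collected


checkpoint = os.path.join(tempfile.mkdtemp(), "cursor.json")
first_run = harvest(checkpoint, max_pages=1)  # interrupted after one page
second_run = harvest(checkpoint)              # resumes from the saved cursor
print(first_run, second_run)
```

The second run picks up at the page after the interruption instead of starting over. As a tester, I would interrupt the run at different points and check that no page is skipped or fetched twice.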
API Testing Heuristics
Michael Bolton has a wonderful series of blog posts on how API testing is exploratory. You could map the issues listed here to the heuristics discussed by Michael. (I think API testing should be exploratory, but it rarely is.) There are heuristics specific to API testing. It’s instructive to compare those heuristics to Michael’s blog posts.
Reviewing defects reinforces the techniques that you can use to find defects. Using tools or writing code when testing an API is even less effective for finding field defects than test automation. (To be fair, test automation provides stability and confidence against change. The defects found by test automation won’t be public. The purpose is orthogonal to field defects.)
— — — — — — — — — — — — — — —
Part 2: Reviewing More Issues
These additional issues give a better perspective on testing this API. I left them out of the first part to keep the length of the post manageable.
Do we document optional arguments? What are the conventions for the use of optional arguments?
A user reported that it wasn’t clear how to use optional arguments when there are multiple groups of options.
What makes this non-intuitive? When developing and testing a product, it is possible to overlook language nuances which may trip up users.
Why is this a problem? When there are multiple optional arguments, users may not know how to skip the ones they do not need.
How can I avoid hindsight bias? I should think about other language features which may not be evident to all users.
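One language nuance that helps here is the keyword argument. The sketch below uses a hypothetical function, search_tweets, with parameter names chosen for illustration; it is not the signature from the package. The bare * in the signature makes every option keyword-only, so a user can set one option without knowing the position of the others.

```python
def search_tweets(query, *, lang=None, since_id=None, max_id=None, count=15):
    """Hypothetical search call with several independent optional arguments."""
    params = {"q": query, "count": count}
    for name, value in (("lang", lang), ("since_id", since_id), ("max_id", max_id)):
        if value is not None:
            params[name] = value  # only send the options the caller actually set
    return params


# Skip lang and since_id entirely by naming the one option that matters.
print(search_tweets("python", max_id=100))
```

Passing the option positionally, as in search_tweets("python", "en"), raises a TypeError, which is arguably friendlier than silently binding the value to the wrong option. Documentation samples that show this calling style would answer the user's question directly.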
Add retries for transient issues or issues with Twitter
The API should add exception handling for transient network issues or when Twitter itself is unavailable.
What makes this non-intuitive? Issues with the network or with Twitter will be considered very unusual, especially during development. Sites like Twitter have large teams and infrastructure for ensuring availability. It’s very unlikely that this would be considered a valid use case. It’s also unlikely that this type of error would be encountered while developing the software. During development, engineers may not develop industrial-strength applications. They also may not have skin in the game to worry if there is a rare outage.
Why is this a problem? The main use case for the Twitter API is processing large amounts of data. This increases the likelihood that applications may encounter an error, even if it is rare.
How can I avoid hindsight bias? Given the use case of processing large amounts of data, I should think about other functions which might be affected. I should think about different levels of logging. I should make sure if functions are interrupted, they can resume. I should make sure users are educated on how to think about handling exceptions, e.g., with samples and documentation.
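As a sample of the kind of exception handling users could be shown, here is a minimal retry sketch with exponential backoff. TransientError and call_with_retries are hypothetical names of my own; which real exceptions count as transient depends on the HTTP library in use and is an assumption here. The sleep function is injectable so that the behaviour can be tested without actually waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. timeouts."""


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on TransientError with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # base_delay, then double each time


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
outcomes = [TransientError("timeout"), TransientError("unavailable"), "ok"]
delays = []


def flaky():
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


print(call_with_retries(flaky, sleep=delays.append), delays)
```

When testing, I would check both paths: the call that recovers after a few failures, and the call that keeps failing and must eventually raise instead of looping forever.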
Can we abstract rate limiting in API?
Instead of API users having to write code to handle rate limiting, can we have a flag, in API functions, which implements rate limiting?
This is more of a design issue.
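A rough sketch of what such a flag might look like from the user's side. The Client class, the wait_on_rate_limit flag, the RateLimitError exception, and the transport callable are all hypothetical names for illustration, not the design adopted by the package.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error carrying the number of seconds until the limit resets."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after


class Client:
    """Hypothetical client; transport is any callable that performs the request."""

    def __init__(self, transport, wait_on_rate_limit=False, sleep=time.sleep):
        self._transport = transport
        self._wait = wait_on_rate_limit
        self._sleep = sleep

    def get(self, endpoint):
        while True:
            try:
                return self._transport(endpoint)
            except RateLimitError as err:
                if not self._wait:
                    raise  # flag off: the caller handles rate limiting
                self._sleep(err.retry_after)  # flag on: wait, then try again
```

From a testing point of view, the flag doubles the scenarios: every rate-limited call needs to be checked once with the flag off, where the error must reach the caller, and once with it on, where the call must wait for the right amount of time and then succeed.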
Tweets are not posted when logged out of the account
When logged out of the account, replies on a bot’s timeline, from the API, are hidden.
What makes this non-intuitive? This is a complex scenario. It’s also a real-world scenario, i.e., it might be difficult to create unless you are a real user and there is real data. When working with an API wrapper, I may not be privy to the structure of how users log in and how that affects the API. Given the negative attention attracted by Twitter bots, this may not be something which anyone is interested in testing.
Why is this a problem? From the business point of view, this may not be a use-case which gets attention.
If I were testing this API, I should think about the identity management system as a whole, not just authorization. I should think about using bots, if only to test the limits of the system. I know that I will face pushback if I decide to test bots instead of spending time on primary use cases.
How can I avoid hindsight bias? I should think about the identity management system as a whole. It isn’t just authorization. I should include use-cases which may be impacted by a user actually being logged in to the account. Scraping data probably isn’t affected. Tweeting and replying from an account may be affected. Any activity on a user’s timeline by the same user may be affected.
Restarting a stream results in a lot of disconnected threads
After disconnecting a stream, a user noticed that there were still a lot of running threads.
What makes this non-intuitive? In a lab environment, I may not create an industrial-strength application. I may not create multiple threads. I may not notice what happens after the application is restarted. I may not question whether “disconnect” really means disconnect.
Why is this a problem? As mentioned in other issues, accessing a large amount of data is a major use case for the Twitter API. Another related issue is rate limiting. Threads which are still pulling data count against the rate limit. Losing money (or the equivalent rate limit) will be a sore point for customers.
If I were testing this API, I would make sure I create an industrial strength application with multiple threads. I would use tooling to track open threads.
How can I avoid hindsight bias? I want to research techniques to make applications which handle a large amount of data more efficient, such as multi-threading. I need to think about how to track resources, especially after API functions are complete. How can I log the use of resources? What are the other characteristics of an application to access streaming data that I haven’t thought about? Are there other places where processing is closed which I did not verify?
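Tracking open threads does not need heavy tooling. The sketch below uses the standard threading module to list live threads by name. The Stream class is a hypothetical stand-in for a streaming client, with a worker that simply waits for a stop signal instead of reading from the network.

```python
import threading


class Stream:
    """Hypothetical stream whose worker thread runs until disconnect is called."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = None

    def connect(self):
        self._stop.clear()
        self._thread = threading.Thread(
            target=self._run, name="stream-worker", daemon=True
        )
        self._thread.start()

    def _run(self):
        self._stop.wait()  # stand-in for reading data until told to stop

    def disconnect(self, timeout=2.0):
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)  # wait for the worker to actually exit


def live_workers(prefix="stream-worker"):
    """Return the threads that are still alive and whose name starts with prefix."""
    return [t for t in threading.enumerate() if t.name.startswith(prefix)]


stream = Stream()
for _ in range(3):  # restart the stream a few times
    stream.connect()
    stream.disconnect()
print(len(live_workers()))
```

The check I care about is the count after each disconnect. If it grows with every restart, then "disconnect" does not really mean disconnect, which is what the user reported.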
Originally published at https://www.linkedin.com.