This post has been a long time coming. I would like to end this blogging gap with a topic that has stalled many a testing effort of mine. We take a brief look at what test data is and why it is important, and then see how it fares in an Agile world.
The availability of the right test data is, or should be, the most important part of your test entrance criteria or test readiness checklist. Depending on the complexity of your application and the kind of testing you need to perform, obtaining it can be an easy or a herculean task. Test data for a GUI application is usually more intuitive: it follows from the business rules, field validations, boundary conditions and so on. Test data for a batch application is more complicated, because many permutations and combinations are possible across not just one but several fields in a record. A file whose records contain a hundred fields is far more arduous to prepare than one with five. Add multiple files or other data formats such as hex, and the complexity goes up several notches.
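To make the point about boundary conditions and combinations concrete, here is a minimal sketch in Python. The field names, limits and values are made up for illustration; they do not come from any real application.

```python
from itertools import product


def boundary_values(minimum: int, maximum: int) -> list:
    """Return values just outside, on, and just inside each limit."""
    return [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]


# Hypothetical record layout: each field maps to the values worth testing.
fields = {
    "age": boundary_values(18, 65),
    "account_type": ["SAVINGS", "CHECKING", ""],
    "currency": ["USD", "EUR", "XXX"],
}

# Every combination of field values becomes one candidate test record.
records = [dict(zip(fields, combo)) for combo in product(*fields.values())]

print(len(records))  # 6 * 3 * 3 = 54 records for only three fields
```

Three small fields already produce 54 records, which hints at why a hundred-field record cannot be covered exhaustively and usually needs a deliberately chosen subset of conditions.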
Some organizations have dedicated departments that work solely on test data generation. This is a specialized skill that needs some finesse and experience. The following are some of the test data generation techniques I have come across or practiced –
1) Testing the UI – As mentioned above, this is easier, but no less important. You want to test all possible validations, even things as simple as not allowing alphabetic characters in a numeric field. Boundary conditions should also be tested, and so should specific error messages if they are mentioned in your specs. A well-written, detailed specification document is a prerequisite for this kind of testing.
2) Batch Testing – Developers provide data in many cases, since they have already used it for unit testing. It is beneficial to use this as a template, and generate several files with different test conditions, for detailed testing of each field or for negative testing. You may also have to build a file from scratch, based on the given file design – anyone who has done this is sure to remember this with a groan.
3) Data dumps – Data dumps obtained from production are a great way to test with realistic or actual data. This can also enable volume testing where you want to check if your new code can sustain large amounts of data.
4) Data Scrubbing – Scrubbing is very important, especially in industries like finance or healthcare, where NPI (non-public personal information) is moved around from place to place. ‘Data Scrubbing’ blots out all personal information and inserts dummy characters or random numbers in its place. This is generally carried out by dedicated test data generation departments, based on ‘rules’ provided by the test team.
5) Data Generation tools – In-house tools are sometimes built for data generation. These tools ask a user to enter certain details, and use a set of rules and some random data to generate relevant information. For example, to generate data for all fields required for a loan application, the user would enter something like the applicant's age or the kind of company, such as Sole proprietor or LLC (whichever condition is being tested), and the tool would randomize and provide all the data required for a valid application.
6) Automation – Data-driven testing is popular and a part of most test automation suites. This is one more instance where test data availability is crucial. Existing automation suites can serve as repositories of test data and can be used as a resource for future releases.
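For readers who like to see things in code, here is a rough Python sketch of items 4 and 5 above. Everything in it is an assumption made for illustration: the rule names, the field names and the value ranges are hypothetical, and the masking shown is far too simple to count as real anonymization.

```python
import hashlib
import random

# Hypothetical scrubbing rules supplied by the test team: field -> rule.
SCRUB_RULES = {"name": "mask", "ssn": "digits"}


def scrub_record(record: dict, rules: dict = SCRUB_RULES) -> dict:
    """Return a copy of the record with sensitive fields replaced by dummy values."""
    scrubbed = dict(record)
    for field, rule in rules.items():
        if field not in scrubbed:
            continue
        original = str(scrubbed[field])
        if rule == "mask":
            # Blot out every character but keep the original length.
            scrubbed[field] = "X" * len(original)
        elif rule == "digits":
            # Seed from a hash so the same input always gets the same dummy value.
            seed = int(hashlib.sha256(original.encode()).hexdigest(), 16)
            rng = random.Random(seed)
            # Replace digits only, so separators such as dashes stay in place.
            scrubbed[field] = "".join(
                str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in original
            )
    return scrubbed


def generate_loan_application(company_type: str, age: int, seed: int = 0) -> dict:
    """Build a loan application around the condition under test (made-up fields)."""
    rng = random.Random(seed)
    return {
        "company_type": company_type,  # entered by the tester
        "age": age,                    # entered by the tester
        "applicant_id": f"APP{rng.randint(0, 99999):05d}",
        "annual_revenue": rng.randrange(50_000, 500_000, 1_000),
        "loan_amount": rng.randrange(5_000, 100_000, 500),
    }


production_row = {"name": "Jane Doe", "ssn": "123-45-6789", "balance": 1500}
print(scrub_record(production_row))
print(generate_loan_application("LLC", age=42, seed=1))
```

Seeding the random generator is a deliberate choice in this sketch: it keeps scrubbed values consistent between files and makes a generated application reproducible when a defect has to be retested.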
I think you get the idea here. I am sure there are several other methods for test data generation out there. Seasoned testers may even have their own special spins on this. We now come to the next question – how to handle test data on a Scrum team.
Most of the methods given above take time and planning. Sometimes there is ‘paperwork’ involved in dealing with other departments or other teams, and these may not be Agile even if your team is. This leads to bottlenecks or delays that are difficult to surmount. The following are some steps you can take to avoid barriers or obstacles in your testing tasks –
1) Highlight the importance of test data and treat it as a precondition. You can do this by listing it in your entrance criteria or checklist, by recording it as a precondition in your WBS (work breakdown structure), or by placing it on the critical path.
2) Research and publicize the effort needed to generate test data for a given story. Solicit the help of the Scrum team members, and add task cards to establish accountability. Keeping the effort within the Scrum team will reduce wait times and allow this activity to be prioritized or ordered appropriately.
3) Develop reusable repositories of test data that can be used in subsequent iterations.
4) Interview actual business users, product owners etc. to obtain realistic values to test end to end business flows and document these for future use.
5) Establish SLAs with external teams such as Data Generation units, so that they respond more quickly to Agile teams and have shorter turnaround times.
6) Cross-train testers or DBAs on Agile teams so that they can carry out complex data generation tasks. DBAs or developers might be better suited to some of these tasks, since they may require production access.
7) SOS – Use the Scrum of Scrums to coordinate and disseminate information across Scrum teams. Other teams may already be generating the same test data, so collaborate to reduce duplication of work.
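As an illustration of step 3, a reusable repository does not have to be elaborate. The sketch below keeps named data sets in a single JSON file that could be shared between iterations or teams. The class name, file name and data set names are all hypothetical, and a real team might prefer a database or the storage built into its automation tool.

```python
import json
import tempfile
from pathlib import Path


class DataSetRepository:
    """Stores named test data sets in one JSON file so later iterations can reuse them."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _load_all(self) -> dict:
        # An absent file simply means nothing has been saved yet.
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def save(self, name: str, records: list) -> None:
        """Add or overwrite one named data set."""
        data = self._load_all()
        data[name] = records
        self.path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")

    def load(self, name: str) -> list:
        """Return the records stored under the given name."""
        return self._load_all()[name]

    def names(self) -> list:
        """List the available data sets in alphabetical order."""
        return sorted(self._load_all())


# Usage example in a temporary folder, so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as folder:
    repo = DataSetRepository(str(Path(folder) / "test_data.json"))
    repo.save("loan_llc_valid", [{"company_type": "LLC", "age": 42}])
    repo.save("loan_age_boundaries", [{"company_type": "LLC", "age": a} for a in (17, 18, 65, 66)])
    print(repo.names())  # ['loan_age_boundaries', 'loan_llc_valid']
    print(repo.load("loan_llc_valid"))
```

Naming each data set after the condition it covers makes it easier for another Scrum team to discover that the data they need already exists.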
There were many occasions when test data was a stubborn member on our ‘barriers’ board and just wouldn’t budge. This is one item without which a serious testing effort cannot be complete, and should be treated as a pre-condition for product release rather than just for testing. Test data generation can be so much smoother when everyone on the Scrum team assumes collective responsibility for it.