Callaghan Innovation's R&D Career Grants give New Zealand’s future innovators a head start in their careers by enabling them to gain and develop their technical skills in a commercial research environment.
Theta recently appointed Umair Khan under this programme to work on our data quality tool, Veracidata. We had a chat with Umair about his research background and what he’s working on at the moment.
So you were a researcher at Otago University before this role at Theta. What were you working on there?
I was doing a Computer Science PhD, in the Graphics and Computer Vision lab, on Finding Emergent Patterns in Images. I designed algorithms to find patterns in large datasets of different images.
Many people in the computer vision field are working on detecting specific objects in a set of images – like a car or a person. This requires an approach called “supervised learning” with a training stage, where the computer is first taught what the characteristics of a car or a person are, then sets about identifying them.
In my research, I took an “unsupervised learning” approach to see what patterns would emerge, and whether the machine could learn to identify patterns without being taught them first.
This is kind of how an infant learns – at first they can only discern patterns, and gradually, over time, they put those patterns together. Can we design algorithms that enable machines to do the same thing?
It was very much exploratory research, working in uncharted territory.
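To make the contrast with supervised learning concrete, here is a minimal Python sketch of the unsupervised idea. It is not the algorithm from Umair's PhD research; it uses k-means clustering, a standard textbook technique, on a handful of made-up 2D points. The function name `k_means`, the sample data and the naive choice of starting centroids are all assumptions for illustration. No labels are supplied; the groups emerge from the data alone.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=10):
    """Group points into k clusters without using any labels."""
    # Assumption: seed the centroids with the first k points.
    # This is deterministic but naive; real implementations seed more carefully.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Made-up data: two loose groups, with no labels saying which is which.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
centroids, clusters = k_means(points, k=2)
```

With this data, the points near (1, 1) end up in one cluster and the points near (8, 8) in the other, even though the code was never told that two such groups exist.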
And what are you working on now?
Something completely different! That said, I was already interested in the business intelligence field, because it relates to unsupervised learning – finding patterns in the data as they emerge, without necessarily knowing what patterns you are looking for. I guess that’s one of the reasons I ended up at Theta.
I’m working on a data quality platform Theta is developing, called Veracidata. The idea behind Veracidata is to measure data quality and establish the integrity of data in data warehouses. That’s important because this data is the basis for many decisions. Bad data might mean bad decisions.
My focus for now is developing Veracidata’s data profiler. The profiler will help us to establish – for every customer using Veracidata – what’s in their data warehouse, and how best to define rules that will measure the quality of data at a point in time. I’m also interested in how we can apply machine learning to advanced profiling activities, such as the detection of data duplication.
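As a rough illustration of what profiling, quality rules and duplicate detection can look like, here is a minimal Python sketch. It is not Veracidata's implementation: the function names (`profile_column`, `completeness_rule`, `find_duplicates`), the 90% threshold and the sample rows are all hypothetical, and the duplicate check is simple key normalisation rather than machine learning.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarise one column: row count, missing values, distinct values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


def completeness_rule(profile, minimum=0.9):
    """Hypothetical rule: pass if at least `minimum` of the values are present."""
    if profile["rows"] == 0:
        return True
    filled = profile["rows"] - profile["missing"]
    return filled / profile["rows"] >= minimum


def find_duplicates(rows, key_columns):
    """Return rows whose key columns match after trimming and lower-casing."""
    def normalise(row):
        return tuple(str(row.get(c, "")).strip().lower() for c in key_columns)

    counts = Counter(normalise(row) for row in rows)
    return [row for row in rows if counts[normalise(row)] > 1]


# Made-up sample rows standing in for a warehouse table.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},
    {"name": "Grace Hopper", "email": None},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

email_profile = profile_column(customers, "email")
email_ok = completeness_rule(email_profile)
name_duplicates = find_duplicates(customers, ["name"])
```

Here the profile shows one of the four email values missing, so the completeness rule fails at the 90% threshold, and the two spellings of "Ada Lovelace" are flagged as likely duplicates. Running the same rule on a schedule would give a measure of quality at each point in time.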
How has the transition been from the academic to business environment?
Fine, actually. Theta has all the essentials – coffee! – and although it’s a bit more formal than at the lab, it’s not too formal either.
I worked as a software developer for a few years in Pakistan before coming to Otago for my PhD, so it’s not completely new. I was interested in moving to a more industry-oriented environment next.
BI is new for me, though, so I’ve spent my first few weeks getting to know the field, its terminology and its tools a bit better.
Also, although I’m working on a different kind of problem in developing the data profiler, one of the most important things I got from my PhD was the confidence to do stuff, to tackle difficult problems in an organised way, and not to be scared of uncharted territory! That’s sometimes where the most interesting things happen.