Live image processing using Flutter

Recently the Flutter team added image streaming capability to the camera plugin. This lets you capture frames from a live camera preview.

I made a demo app that uses image streaming together with the tflite (TensorFlow Lite) plugin to achieve real-time object detection in Flutter.

Here’s a short video, captured on my iPad, showing the app.

Image streaming in the camera plugin

Refer to the camera plugin documentation for how to add the plugin to your Flutter project.

To start image streaming, call startImageStream on the camera controller. The callback is triggered every time a new frame arrives.

controller.startImageStream((CameraImage img) { <YOUR CODE> });
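
For context, here is a minimal sketch of setting up the controller and starting the stream. It follows the camera plugin's standard usage; error handling is omitted.

import 'package:camera/camera.dart';

CameraController controller;

Future<void> initCamera() async {
  // Use the first available (usually back-facing) camera.
  final cameras = await availableCameras();
  controller = CameraController(cameras[0], ResolutionPreset.medium);
  await controller.initialize();

  // The callback receives every frame of the live preview as a CameraImage.
  controller.startImageStream((CameraImage img) {
    // <YOUR CODE>, e.g. run object detection on img (see below).
  });
}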

The resulting CameraImage class has 4 members: the image format, height, width and finally planes, which hold the bytes of the image.

class CameraImage {
  final ImageFormat format;
  final int height;
  final int width;
  final List<Plane> planes;
}

The format of the image differs between platforms:

Android: android.graphics.ImageFormat.YUV_420_888

iOS: kCVPixelFormatType_32BGRA (note that the format was kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange in v2.8.0; later, in v4.0.0, it was changed back to 32BGRA)

Because of the different formats, the resulting CameraImage differs between iOS and Android:

Android: planes is a list of byte arrays for the Y, U and V planes of the image.

iOS: planes contains a single array holding the BGRA bytes of the image.

Knowing the format is important for decoding the image correctly and feeding it to TensorFlow Lite.

Decoding CameraImage

It’s possible to decode the image in Dart code, but unfortunately the image library is slow on iOS. Decoding the frames in native code is faster and more efficient.
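
For reference, here is a rough sketch of what a pure-Dart conversion of an Android YUV_420_888 frame to packed RGBA bytes could look like, using the standard BT.601 coefficients and the plane strides exposed by CameraImage. It works, but running this per frame in Dart is noticeably slower than the native implementations below, which is what the plugin actually uses.

import 'dart:typed_data';

import 'package:camera/camera.dart';

int _clamp255(double v) {
  final i = v.round();
  return i < 0 ? 0 : (i > 255 ? 255 : i);
}

// Rough sketch: convert an Android YUV_420_888 CameraImage to packed RGBA
// bytes in Dart using BT.601 coefficients.
Uint8List yuv420ToRgba(CameraImage img) {
  final int width = img.width;
  final int height = img.height;
  final Plane yPlane = img.planes[0];
  final Plane uPlane = img.planes[1];
  final Plane vPlane = img.planes[2];
  final int uvRowStride = uPlane.bytesPerRow;
  final int uvPixelStride = uPlane.bytesPerPixel;

  final Uint8List rgba = Uint8List(width * height * 4);
  int o = 0;
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      final int yIndex = y * yPlane.bytesPerRow + x;
      final int uvIndex = (y >> 1) * uvRowStride + (x >> 1) * uvPixelStride;

      final int yv = yPlane.bytes[yIndex];
      final int u = uPlane.bytes[uvIndex] - 128;
      final int v = vPlane.bytes[uvIndex] - 128;

      rgba[o++] = _clamp255(yv + 1.402 * v);             // R
      rgba[o++] = _clamp255(yv - 0.344 * u - 0.714 * v); // G
      rgba[o++] = _clamp255(yv + 1.772 * u);             // B
      rgba[o++] = 255;                                   // A
    }
  }
  return rgba;
}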

iOS:

Since the format is already packed BGRA, we just need to extract the red, green and blue values from the bytes and feed them to the input tensor of a TensorFlow Lite interpreter.


// iOS: the single plane already holds packed BGRA pixels.
const FlutterStandardTypedData* typedData = args[@"bytesList"][0];
uint8_t* in = (uint8_t*)[[typedData data] bytes];
float* out = interpreter->typed_tensor<float>(input);
// Nearest-neighbor resize from (image_width, image_height) to the model
// input size (width, height), normalizing each channel on the fly.
for (int y = 0; y < height; ++y) {
  const int in_y = (y * image_height) / height;
  uint8_t* in_row = in + (in_y * image_width * image_channels);
  float* out_row = out + (y * width * input_channels);
  for (int x = 0; x < width; ++x) {
    const int in_x = (x * image_width) / width;
    uint8_t* in_pixel = in_row + (in_x * image_channels);
    float* out_pixel = out_row + (x * input_channels);
    for (int c = 0; c < input_channels; ++c) {
      // (value - mean) / std normalization into the input tensor.
      out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
    }
  }
}
Android:

We first have to create an RGBA bitmap from the bytes of the YUV planes. An easy way is to use a RenderScript to do the conversion.


// Repack the Y, V and U plane bytes into a single NV21-style byte array.
ByteBuffer Y = ByteBuffer.wrap(bytesList.get(0));
ByteBuffer U = ByteBuffer.wrap(bytesList.get(1));
ByteBuffer V = ByteBuffer.wrap(bytesList.get(2));

int Yb = Y.remaining();
int Ub = U.remaining();
int Vb = V.remaining();

byte[] data = new byte[Yb + Ub + Vb];

Y.get(data, 0, Yb);
V.get(data, Yb, Vb);
U.get(data, Yb + Vb, Ub);

// Convert the NV21 data to an RGBA bitmap via RenderScript.
Bitmap bitmapRaw = Bitmap.createBitmap(imageWidth, imageHeight, Bitmap.Config.ARGB_8888);
Allocation bmData = renderScriptNV21ToRGBA888(
    mRegistrar.context(),
    imageWidth,
    imageHeight,
    data);
bmData.copyTo(bitmapRaw);

The code that converts NV21 to RGBA is shown below. I took it from https://stackoverflow.com/a/36409748.


public Allocation renderScriptNV21ToRGBA888(Context context, int width, int height, byte[] nv21) {
  RenderScript rs = RenderScript.create(context);
  ScriptIntrinsicYuvToRGB yuvToRgbIntrinsic = ScriptIntrinsicYuvToRGB.create(rs, Element.U8_4(rs));

  Type.Builder yuvType = new Type.Builder(rs, Element.U8(rs)).setX(nv21.length);
  Allocation in = Allocation.createTyped(rs, yuvType.create(), Allocation.USAGE_SCRIPT);

  Type.Builder rgbaType = new Type.Builder(rs, Element.RGBA_8888(rs)).setX(width).setY(height);
  Allocation out = Allocation.createTyped(rs, rgbaType.create(), Allocation.USAGE_SCRIPT);

  in.copyFrom(nv21);

  yuvToRgbIntrinsic.setInput(in);
  yuvToRgbIntrinsic.forEach(out);
  return out;
}

Once we have the raw bitmap, we can resize it to the required input size and feed the RGB values to the input tensor.


// Buffer for the model input: 1 x inputSize x inputSize x inputChannels values.
ByteBuffer imgData = ByteBuffer.allocateDirect(1 * inputSize * inputSize * inputChannels * bytePerChannel);
imgData.order(ByteOrder.nativeOrder());

// Scale the raw bitmap into an inputSize x inputSize bitmap.
Matrix matrix = getTransformationMatrix(bitmapRaw.getWidth(), bitmapRaw.getHeight(),
    inputSize, inputSize, false);
Bitmap bitmap = Bitmap.createBitmap(inputSize, inputSize, Bitmap.Config.ARGB_8888);
final Canvas canvas = new Canvas(bitmap);
canvas.drawBitmap(bitmapRaw, matrix, null);
int[] intValues = new int[inputSize * inputSize];
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());

// Write normalized R, G, B floats for every pixel into the input buffer.
int pixel = 0;
for (int i = 0; i < inputSize; ++i) {
  for (int j = 0; j < inputSize; ++j) {
    int pixelValue = intValues[pixel++];
    imgData.putFloat((((pixelValue >> 16) & 0xFF) - mean) / std);
    imgData.putFloat((((pixelValue >> 8) & 0xFF) - mean) / std);
    imgData.putFloat(((pixelValue & 0xFF) - mean) / std);
  }
}

Detect objects using the tflite plugin

The tflite plugin wraps the TensorFlow Lite API for iOS and Android. It implements native code for feeding input to and extracting output from popular models. For object detection, it supports SSD MobileNet and YOLOv2.
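
Before running detection, the model and label files have to be loaded with Tflite.loadModel. A minimal sketch; the asset paths are placeholders for whichever SSD MobileNet or Tiny YOLOv2 files you bundle with the app.

import 'package:tflite/tflite.dart';

// Load the model and labels once, e.g. in initState.
Future<void> loadModel({bool useYolo = false}) async {
  await Tflite.loadModel(
    model: useYolo ? "assets/yolov2_tiny.tflite" : "assets/ssd_mobilenet.tflite",
    labels: useYolo ? "assets/yolov2_tiny.txt" : "assets/ssd_mobilenet.txt",
  );
}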


The plugin provides a detectObjectOnFrame method which can decode the image stream from the camera plugin (under the hood it uses the code described above), run inference and return the recognitions.

We can simply pass the bytes of the CameraImage planes to the method and get back the detected objects.


Tflite.detectObjectOnFrame(
  bytesList: img.planes.map((plane) {
    return plane.bytes;
  }).toList(),
  model: widget.model == yolo ? "YOLO" : "SSDMobileNet",
  imageHeight: img.height,
  imageWidth: img.width,
  imageMean: widget.model == yolo ? 0 : 127.5,
  imageStd: widget.model == yolo ? 255.0 : 127.5,
  numResultsPerClass: 1,
  threshold: widget.model == yolo ? 0.2 : 0.4,
).then((recognitions) {
  print(recognitions);
  isDetecting = false;
});
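
Inference usually takes longer than the interval between two frames, so it helps to drop frames while a detection is still running. Here is a sketch of how the call above can be wired into startImageStream, using the same isDetecting flag that the callback resets; runDetection is a hypothetical wrapper around the Tflite.detectObjectOnFrame call shown above.

bool isDetecting = false;

controller.startImageStream((CameraImage img) {
  // Skip this frame if the previous detection hasn't finished yet.
  if (isDetecting) return;
  isDetecting = true;

  // runDetection wraps the Tflite.detectObjectOnFrame call above and
  // resets isDetecting once the recognitions come back.
  runDetection(img);
});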

The output is a list of objects in the following format:

{
  detectedClass: "hot dog",
  confidenceInClass: 0.123,
  rect: {
    x: 0.15,
    y: 0.33,
    w: 0.80,
    h: 0.27
  }
}

x, y, w, h define the left, top, width and height of the box that contains the object. The values are normalized to [0, 1]. We can scale x and w by the width, and y and h by the height, of the image.
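
For example, to draw a result directly onto an image of size imageW x imageH (ignoring the preview scaling discussed in the next section), the pixel coordinates would simply be as below; imageW and imageH are assumed variables holding the image dimensions, and re is one recognition from the list.

// Convert a normalized rect to pixel coordinates.
final rect = re["rect"];
final double left = rect["x"] * imageW;
final double top = rect["y"] * imageH;
final double boxW = rect["w"] * imageW;
final double boxH = rect["h"] * imageH;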

Display the output

A small issue with the camera plugin is that the preview size doesn’t always match the screen size. It’s recommended to put the preview in an AspectRatio widget to avoid stretching it, but that leaves margins.

I personally prefer the camera preview to fill the whole screen, so I worked around this by putting the preview in an OverflowBox widget: first compare the height/width ratio of the preview with that of the screen, then scale the preview to cover the screen by fitting either the screen width or the screen height.


Widget build(BuildContext context) {
  var tmp = MediaQuery.of(context).size;
  var screenH = math.max(tmp.height, tmp.width);
  var screenW = math.min(tmp.height, tmp.width);
  tmp = controller.value.previewSize;
  var previewH = math.max(tmp.height, tmp.width);
  var previewW = math.min(tmp.height, tmp.width);
  var screenRatio = screenH / screenW;
  var previewRatio = previewH / previewW;

  return OverflowBox(
    maxHeight:
        screenRatio > previewRatio ? screenH : screenW / previewW * previewH,
    maxWidth:
        screenRatio > previewRatio ? screenH / previewH * previewW : screenW,
    child: CameraPreview(controller),
  );
}

Correspondingly, when drawing the boxes, scale x, y, w, h by the scaled width and height. Note that x (or y) must be offset by half the difference between the scaled width (height) and the screen width (height), since part of the preview is hidden behind the OverflowBox.


var _x = re["rect"]["x"];
var _w = re["rect"]["w"];
var _y = re["rect"]["y"];
var _h = re["rect"]["h"];
var scaleW, scaleH, x, y, w, h;

if (screenH / screenW > previewH / previewW) {
  scaleW = screenH / previewH * previewW;
  scaleH = screenH;
  var difW = (scaleW - screenW) / scaleW;
  x = (_x - difW / 2) * scaleW;
  w = _w * scaleW;
  if (_x < difW / 2) w -= (difW / 2 - _x) * scaleW;
  y = _y * scaleH;
  h = _h * scaleH;
} else {
  scaleH = screenW / previewW * previewH;
  scaleW = screenW;
  var difH = (scaleH - screenH) / scaleH;
  x = _x * scaleW;
  w = _w * scaleW;
  y = (_y - difH / 2) * scaleH;
  h = _h * scaleH;
  if (_y < difH / 2) h -= (difH / 2 - _y) * scaleH;
}
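
Once x, y, w, h are mapped to screen coordinates, each detection can be drawn over the preview, for example by stacking Positioned widgets on top of the CameraPreview. A rough sketch; buildBox is a hypothetical helper and re is one recognition from the list returned above.

import 'package:flutter/material.dart';

// Draw one detection box (already mapped to screen coordinates as above)
// with its class label and confidence.
Widget buildBox(dynamic re, double x, double y, double w, double h) {
  return Positioned(
    left: x,
    top: y,
    width: w,
    height: h,
    child: Container(
      decoration: BoxDecoration(
        border: Border.all(color: Colors.red, width: 2),
      ),
      child: Text(
        "${re["detectedClass"]} "
        "${(re["confidenceInClass"] * 100).toStringAsFixed(0)}%",
        style: TextStyle(color: Colors.red, fontSize: 12),
      ),
    ),
  );
}

The boxes can then be layered over the preview with a Stack, e.g. Stack(children: [preview, ...boxes]), where preview is the OverflowBox built in the previous snippet.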

Inference time per frame

I tested the app on my iPad and Android phone. SSD MobileNet works well on both platforms but Tiny YOLOv2 has quite a lag on Android.

  • iOS (A9):
    SSD MobileNet: ~100 ms
    Tiny YOLOv2: 200-300 ms
  • Android (Snapdragon 652):
    SSD MobileNet: 200-300 ms
    Tiny YOLOv2: ~1000 ms

Thank you for reading till the end. 🙂
