Ticket Name: TDA2X - TIDL model EVE execution

Query Text:
Hi everyone, I am working with a pedestrian detection model trained in the Caffe framework. I converted it to TIDL format, and the results for a sample image after conversion are as expected: the bounding box is located where the object really is. When running this model on the DSP core within the TIDL usecase, using the same input image, the results are identical to those from the conversion. However, when running it on the EVE core, I get the correct output format (confidence value, label, and coordinates are valid values), but the bounding box occupies the majority of the image and the results are nowhere near correct. To test further, I converted this model to run on EVE and DSP combined (layersGroupId parameter). As a comparison I used another object detection model, provided by TI as an example. That model worked well with the TIDL OD usecase, which features one EVE and one DSP core. But when I ran my pedestrian detection model with this usecase, it gave results like the ones from the EVE-only run. Does anybody know why my model is not working on EVE as expected?

P.S.
tidl_import_zf_od.txt:

# Default - 0
randParams       = 0
# 0: Caffe, 1: TensorFlow, Default - 0
modelType        = 0
# 0: Fixed quantization by training framework, 1: Dynamic quantization by TIDL, Default - 1
quantizationStyle = 1
# quantRoundAdd/100 will be added while rounding to integer, Default - 50
quantRoundAdd    = 50
numParamBits     = 8
inElementType    = 0
inQuantFactor    = 32512
inputNetFile     = "../../test/testvecs/config/tidl_models/zf/deploy.prototxt"
inputParamsFile  = "../../test/testvecs/config/tidl_models/zf/model.caffemodel"
outputNetFile    = "../../test/testvecs/output/zf/zf_net_od.bin"
outputParamsFile = "../../test/testvecs/output/zf/zf_param_od.bin"
preProcType      = 0
rawSampleInData  = 1
sampleInData     = "../../test/testvecs/input/zf/zf_110.y"
tidlStatsTool    = "../quantStatsTool/eve_test_dl_algo.out.exe"
layersGroupId    = 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0

Import config file is in attachment.

Responses:

Hi,
Mostly, if the import tool conversion output is correct, then it should run correctly in the TIDL usecase. By the way, how did you check that the TIDL format conversion output is correct?
Thanks, Praveen

Hi Praveen,
I checked the output with the default application, by setting the parameter tidlStatsTool = "../quantStatsTool/eve_test_dl_algo.out.exe" inside the import config file.
Regards, Sasa

Hi Sasa,
I meant that you can use the import tool output itself to check the detections. Please use the markBox.c file from the e2e thread below to visualize the import tool output; if you get correct detections there, then you can use the generated bin files in the VSDK usecase.
Thanks, Praveen

Hi Praveen,
Thanks for this. However, I was able to "read" the output file, and I recognized float values in it. As I have an application which uses the same model in a different format, I know which values I should get.
After comparing the results from the TIDL converter with the reference application, I concluded that the conversion is OK. It also runs as it should on the DSP core. I just wonder, is there anything specific which could make the EVE core work incorrectly for this particular model?
Regards, Sasa

Hi Sasa,
I would suggest always running the detection output layer on the DSP core and the rest of the layers on EVE, as the detection output layer uses floating point operations and so is not optimized for EVE. So it is always better to run this layer on DSP. Please check this configuration.
Thanks, Praveen

Hi Praveen,
I configured the conversion as you mentioned; you can check the import config file I sent in the initial question. To make sure I did it correctly, I printed the layersGroupId parameter for every layer after the network was populated. I can see that only the detection output layer has a layersGroupId value of 2, which means it runs on the DSP core. Now that you mention floating point operations: since this model works well when run on the DSP core only, could it be that the input model (Caffe format) uses floating point operations for all layers?
Regards, Sasa

>> can it be that input model (caffe format) is using floating point operations for all layers?
No, the import tool converts the floating point model to a fixed point model for inference.

Before I tested my model, I used the OD model provided by TI for use with the original TIDL OD usecase (4 EVEs + DSP). That model worked with my usecase (1 EVE + DSP): it gave me the correct coordinates of the object in the input image. From this I concluded that I can trust that my usecase is working as expected. Based on that, and the fact that my model works well when run on the DSP core only, my suspicion is that the problem could be somewhere in the conversion process. Can you give me some explanation of the inQuantFactor parameter? How does its value influence the conversion process? I set it to 32512, as the product 256 * 127.
I didn't find much in the documentation about this parameter, so I don't know if this value is the proper one to use.
Regards, Sasa

Hi Sasa,
Please search for the inQuantFactor parameter on e2e; you will find a thread which explains this parameter.
Thanks, Praveen

Hi Praveen,
I searched for inQuantFactor. As far as I can see, this parameter is used when input image pixel values need to be transformed from unsigned integer values to float values (e.g. [0-255] to [0-1]). Am I right? If so, I don't fully understand why those values are so high (32512, 65280)? I didn't find an answer to this in the forum threads. Maybe I went a bit off the topic of this thread with these questions; I just want to find the cause of the problem, and this parameter is new to me in the conversion process. Because of the correct behavior of the TI OD model with my usecase, I am still focusing on the conversion process of my pedestrian detection model as the potential problem. However, could the problem potentially be somewhere within the usecase itself? The input image dimensions for my model are 3x300x300.
Regards, Sasa

Hi Sasa,
Yes, you are right, and the values are high because the scale itself is in Q8 format, so the scale factor needs to be multiplied by 255. Regarding the actual problem, what is the behavior if you don't specify inQuantFactor in the import config file?
Thanks, Praveen

Hi Praveen,
thank you for your answer. Regarding running without inQuantFactor:
For the model running on EVE+DSP - no detection in the conversion process test, and similar results when running on the SoC (a detected large bounding box covering the complete image).
For the model running on DSP - a large bounding box when running the test in the conversion process, or no detected objects (for different images); on the SoC the results are OK (same as before).
Regards, Sasa

Hi Sasa,
Can you please confirm that the import tool output itself had no detected boxes? As the import tool runs on the PC, the EVE and DSP cores play no role there.
So first, are you able to see proper detections in the import tool output?
Thanks, Praveen

Hi Praveen,
maybe I chose my words badly when I wrote EVE+DSP in the previous post. I just wanted to emphasize that the config file is a little different: when the model is to be run on EVE + DSP there is the additional layersGroupId parameter. I hope to be clearer on this here. I will concentrate on the case without the layersGroupId parameter.

In the case where inQuantFactor is set to 32512, the detection in the import tool output is:
confidence: 0x3F432E00 - 0.762
x_min: 0x3EA7A6FD - 0.327
y_min: 0x3EAA0F6F - 0.332
x_max: 0x3EE2C46F - 0.442
y_max: 0x3F26130C - 0.648
When those values are converted into coordinates, it is approximately (98, 100) and (132, 194) as the top-left and bottom-right corners, and that is where the object is in the image. After running this model on the DSP core, the coordinates of the object are (98, 95) and (134, 192), so similar to those from the conversion process. When running this model on the EVE core, the coordinates are (33, 14) and (254, 286).

But when inQuantFactor is excluded from the config file, the results are different. First of all, two objects appear to be detected instead of one. The given values are:
confidence: 0x3F000000 - 0.5
x_min: 0x3D84D372 - 0.065
y_min: 0x3F0C010E - 0.547
x_max: 0x3E70C97B - 0.235
y_max: 0x3F939913 - 1.153
and
confidence: 0x3F000000
x_min: 0xBDC7A778 - -0.097
y_min: 0x3F340233 - 0.703
x_max: 0x3ECB8378 - 0.397
y_max: 0x3F7F3101 - 0.997
Clearly, there is a value higher than 1.0 and a negative value, so these results are not valid. Still, I tried to run this model on DSP and it gives me the same results as in the conversion process. Based on this, the configuration with inQuantFactor works properly, and when the model is run on DSP it also works properly. However, something is not OK when running this model on the EVE core.
Also, I ran the cifar10 model with the same usecase, both on DSP and on EVE, and it gave correct output; because of that, I excluded the possibility that something is wrong with the EVE parameter setup inside the usecase. I hope it is clearer now what I was trying to describe in the previous post.
Regards, Sasa

Hi Sasa,
Thanks for the detailed explanation. Could you please clarify two more questions:
1. When you say running on the EVE core, are you running the TIDL OD usecase with layersGroupId as 1 for all the layers? If not, how?
2. When you say running on the DSP core, are you running the TIDL OD usecase with layersGroupId as 2 for all the layers? This needs some changes to the usecase. If not, how?
Thanks, Praveen

Hi Praveen,
No. When I say running on the EVE core, I am using a usecase similar to the TIDL usecase used for image segmentation. The difference between my usecase and the original TIDL usecase is in the way the buffer is read in the dumpOutCb function. Instead of reading width*height bytes as in the original TIDL usecase, I have a predefined number of bytes expected to be output by the TIDL algorithm link for this model, and I just skip the padding in the buffer and read the number of bytes I need. That usecase has an option to run TIDL on either EVE or DSP, so when I say running on EVE or running on DSP it is the same usecase. I need to mention that when I converted this model, I didn't use the layersGroupId parameter in the config file, so all layers are executed on the same core.

However, when I saw that the EVE core was not giving correct results, and because this is an OD model, I decided to create a new TIDL OD usecase which utilizes one EVE and one DSP core. I am using a single image as input for the model, so I don't need 4 EVE cores for it. To make sure the usecase works as expected, I used TI's OD model with a single image as input, and the object was detected as expected (with correct coordinates). I used this as confirmation that my TIDL OD usecase is working as expected.
Now, I added layersGroupId to my config file and converted the model again to suit execution with the TIDL OD usecase. I put the value 0 for data layers and the value 1 for all other layers, except for the detection output layer, which has the value 2. I checked these layersGroupId values for every layer once I ran it on the SoC, and they appear correct, so only the detection output layer is executed on the DSP core. This execution behaves similarly to running the TIDL usecase on the EVE core only. In the previous post I mentioned that for every image running the TIDL usecase, I got coordinates (33, 14) and (254, 286). For the TIDL OD usecase the coordinates are, also the same for every image, (32, 13) and (255, 286). Very similar. As far as I can see, the only difference between the TIDL usecase running on EVE and the TIDL OD usecase is that the single detection output layer is not run on EVE in the TIDL OD case. So the similarity between the results brought me to the conclusion that the behavior of the EVE core is the same in both cases, and that execution on the EVE core is causing the trouble. Taking all of this into account, could the problem be somewhere in the model itself? I don't know exactly how the EVE core works, but could it be that the model is not suitable for use on the EVE core?
Regards, Sasa

Hi Sasa,
I am not sure about the usecase that you created, so let's check this issue in the original TIDL usecase.
1. Please import your model with inQuantFactor set to 32512, layersGroupId 2 for the last detection output layer (on DSP), and layersGroupId 1 for all the other layers (on EVE), to get the net and param bin files. Use these generated net and param bin files to run the TIDL OD usecase and let us know the results.
2. Also, just to confirm your finding that the DSP core works fine with your model, you can modify the TIDL usecase to run using only the DSP core (layersGroupId = 2 for all layers) as shown below.
UseCase: chains_capture_tidl_OD
Capture -> VPE -> Dup
Dup -> Merge
Dup -> Alg_tidlpreproc (A15)
Alg_tidlpreproc (A15) -> Alg_tidl_Dsp1 (DSP1)
Alg_tidl_Dsp1 (DSP1) -> Merge
Merge -> Sync -> Alg_ObjectDraw -> Display
GrpxSrc -> Display_Grpx

Also, in AlgorithmLink_tidlProcess(), after the pAlgObj->handle->ivision->algProcess() call: since the DSP is expecting the SYSTEM_BUFFER_TYPE_METADATA buffer type, change the if-condition check to SYSTEM_BUFFER_TYPE_VIDEO_FRAME in the pAlgObj->numOutputQueues loop after algProcess(). To get a display, we have to change the sync delta and sync threshold to higher values. Please try this and let us know the results.
Thanks, Praveen

Hi Praveen,
1. I ran it as you mentioned regarding layersGroupId, and the results are as I wrote in one of my previous posts - (32, 13) and (255, 286). The expected coordinates are around (98, 95) and (134, 192).
2. I tested with layersGroupId = 2 for all layers, and I can confirm that the results are the same as with the usecase I created before, so I can confirm that my usecase is working OK.
Regards, Sasa

Hi Sasa,
OK, thanks for the confirmation. For experiment 1 above, could you please set the quant parameters below and check the TIDL OD usecase?
quantHistoryParam1 = 0; //20;
quantHistoryParam2 = 0; //5;
quantMargin = 0;
If this does not help either, is it possible to share your model and prototxt with us for debugging the issue at our end?
Thanks, Praveen

Hi Praveen,
I tried the changes you suggested, but with no success. Can you share an email address so I can send the model to you?
Regards, Sasa

Hi Sasa,
Please share your email address so that I can send you my email address offline.
Thanks, Praveen

Hi Praveen,
my email address is Sasa.Mihajlica@rt-rk.com
Regards, Sasa

Hi Sasa,
I have sent you an email for sharing the model. I am closing this thread for now and will follow up further on this in the mail.
Thanks, Praveen